Email Provider Refactor — PR 1 Plan
Status: Draft for review
Author: Claude Code (with Alfred)
Date: 2026-05-01
Goal of this doc: Get sign-off on the refactor scope and safety strategy before any code is written.
1. Goal
Replace the ~16 places that currently call the Gmail API (or SMTP) directly with a single sendEmail() entry point backed by a swappable provider interface. After this PR, adding SMTP, Resend, or SES becomes a one-file change instead of a 16-file change.
This PR ships zero behavioural change. Every existing email path continues to send via Gmail OAuth using the exact same credentials read from training_provider. The point of the PR is to make the next PR (SMTP plugin) safe.
2. Non-goals (explicitly out of scope for PR 1)
- Adding SMTP, Resend, SES, or any new transport. That is PR 2.
- New email_provider column on training_provider. Added in PR 2.
- UI changes in Company Settings. Added in PR 2.
- Refactoring the Google Calendar / Drive / Slides paths. They share googleapis but are not email and stay untouched.
- Changing email templates, content, or DB schema for templates.
- Encrypting credentials at rest. Worthwhile but separate.
If someone reading PR 1 sees any of the above, the PR is wrong.
3. Inventory of current email sends
(Full table in agent output; summary here.)
| Transport | Sites | Notes |
|---|---|---|
| Gmail API (OAuth) | 15 | OTP login, password reset, certificates (3 paths), course confirmation, completion, courseware, proforma invoice, trainer invitation (3 entry points), feedback, two test endpoints |
| SMTP (nodemailer) | 1 | Support ticket notifications only — lib/services/emailService.ts consumed by pages/api/tickets/create.ts |
| Total | 16 | |
Key observations:
- No shared Gmail-send helper exists today: each endpoint manually builds MIME headers and calls gmail.users.messages.send(). The only shared wrapper is the private sendGmail() inside lib/trainerInvitationSender.ts. This means PR 1 must refactor every call site, not just one helper.
- Credentials are loaded inconsistently: most endpoints inline SELECT ... FROM training_provider, and only the trainer-invitation flows use a helper (loadTrainingProviderEmailConfig). PR 1 standardises on the helper.
- Two existing admin test endpoints (send-test-email.ts, send-test-certificate-email.ts) are perfect verification harnesses: they already exist in production, are accessible via the Company Settings UI, and let us validate the new abstraction without touching real-user flows.
4. Target architecture
lib/email/
  index.ts          ← export sendEmail(message): Promise<EmailResult>
  types.ts          ← EmailMessage, EmailResult, EmailProvider, EmailAttachment
  resolver.ts       ← getProvider(): always returns gmail-oauth in PR 1
  loadConfig.ts     ← single source for reading training_provider email columns
  providers/
    gmail-oauth.ts  ← extracted from existing inline code
  templates/
    (no change: templates stay in DB or existing locations)
Provider interface (final form, PR 1 implements only Gmail):
export interface EmailMessage {
  to: string | string[];
  cc?: string | string[];
  bcc?: string | string[];
  replyTo?: string;
  subject: string;
  html: string;
  text?: string;
  attachments?: Array<{
    filename: string;
    content: Buffer;
    contentType?: string;
  }>;
}

export interface EmailResult {
  messageId: string;
  provider: 'gmail_oauth'; // union grows in PR 2: 'gmail_oauth' | 'smtp' | 'resend'
  acceptedAt: Date;
}

export interface EmailProvider {
  send(message: EmailMessage): Promise<EmailResult>;
}
sendEmail() entry point:
// lib/email/index.ts
import { getProvider } from './resolver';
import type { EmailMessage, EmailResult } from './types';

export async function sendEmail(message: EmailMessage): Promise<EmailResult> {
  const provider = await getProvider(); // PR 1: always GmailOAuthProvider
  return provider.send(message);
}
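One way to see why this shape helps with testing is a sketch with a recording test double. Everything here beyond the interface names is hypothetical: InMemoryProvider and makeSendEmail are illustrative names, the message type is trimmed to three fields, and provider is widened to string so the double can name itself. It is not the planned implementation, only a demonstration that call sites never need to know which provider is behind sendEmail().

```typescript
// Trimmed copies of the plan's interfaces so the sketch stands alone.
interface EmailMessage {
  to: string | string[];
  subject: string;
  html: string;
}

interface EmailResult {
  messageId: string;
  provider: string; // assumption: widened from 'gmail_oauth' for the test double
  acceptedAt: Date;
}

interface EmailProvider {
  send(message: EmailMessage): Promise<EmailResult>;
}

// Hypothetical test double: records messages instead of sending them.
class InMemoryProvider implements EmailProvider {
  readonly sent: EmailMessage[] = [];

  async send(message: EmailMessage): Promise<EmailResult> {
    this.sent.push(message);
    return {
      messageId: `in-memory-${this.sent.length}`,
      provider: 'in_memory',
      acceptedAt: new Date(),
    };
  }
}

// Hypothetical factory: binds a sendEmail() to whichever resolver it is given.
function makeSendEmail(getProvider: () => Promise<EmailProvider>) {
  return async (message: EmailMessage): Promise<EmailResult> => {
    const provider = await getProvider();
    return provider.send(message);
  };
}
```

A test would build sendEmail with a resolver that returns the double, call it once, and then inspect the recorded messages rather than a real inbox.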
What every existing call site looks like after the refactor:
// before
const oauth2Client = new google.auth.OAuth2(...);
oauth2Client.setCredentials({ refresh_token });
const gmail = google.gmail({ version: 'v1', auth: oauth2Client });
const raw = Buffer.from(`Subject: ${subject}\n...${html}`).toString('base64url');
await gmail.users.messages.send({ userId: 'me', requestBody: { raw } });
// after
await sendEmail({ to, subject, html });
The gmail-oauth.ts provider absorbs all the OAuth/MIME plumbing once. No call site does its own MIME construction after PR 1.
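As a rough illustration of the plumbing the provider would own, the sketch below builds a single-part HTML message and encodes it for the raw field of gmail.users.messages.send, which expects the whole message base64url-encoded. The names SimpleMessage, headerLine, buildMimeMessage and toGmailRaw are hypothetical, the header set is an assumption rather than the current production format, and attachments and non-ASCII subjects are left out.

```typescript
// Hypothetical, trimmed message shape for this sketch only.
interface SimpleMessage {
  from: string;
  to: string | string[];
  replyTo?: string;
  subject: string; // assumed ASCII here
  html: string;
}

// Reject CR/LF in header values so user input cannot inject extra headers.
function headerLine(name: string, value: string): string {
  if (/[\r\n]/.test(value)) {
    throw new Error(`Header ${name} must not contain line breaks`);
  }
  return `${name}: ${value}`;
}

// Build a message with CRLF line endings and a base64 body wrapped at 76 chars.
function buildMimeMessage(message: SimpleMessage): string {
  const recipients = Array.isArray(message.to) ? message.to.join(', ') : message.to;
  const headers = [
    headerLine('From', message.from),
    headerLine('To', recipients),
    ...(message.replyTo ? [headerLine('Reply-To', message.replyTo)] : []),
    headerLine('Subject', message.subject),
    'MIME-Version: 1.0',
    'Content-Type: text/html; charset=UTF-8',
    'Content-Transfer-Encoding: base64',
  ];
  const base64Body = Buffer.from(message.html, 'utf8').toString('base64');
  const wrappedBody = (base64Body.match(/.{1,76}/g) ?? []).join('\r\n');
  return `${headers.join('\r\n')}\r\n\r\n${wrappedBody}`;
}

// Encode the full message for the `raw` request field (URL-safe, unpadded).
function toGmailRaw(mime: string): string {
  return Buffer.from(mime, 'utf8').toString('base64url');
}
```

Keeping header construction in one function is also where a line-break check like the one above can live, instead of being repeated (or forgotten) in 15 call sites.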
5. Migration strategy — how we keep main safe
This is the critical part. The codebase is in active production use; an OTP-send regression locks every user out. We mitigate with strangler-style incremental migration within a single PR:
Phase 1 — Land the abstraction unused (low risk)
- Add lib/email/* files.
- Add gmail-oauth.ts provider implementing the existing logic.
- Add unit tests for gmail-oauth.ts (mock googleapis, verify MIME construction matches existing format byte-for-byte for OTP, certificate w/ attachment, and trainer invitation cases).
- No call site changes yet. PR can be merged here without affecting any user flow.
If we wanted to be ultra-cautious, Phase 1 could even be its own PR. I’d argue it’s safe to combine with Phase 2 since the new code is unreachable.
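A byte-for-byte check is easier to act on when a failure says where the outputs diverge. The helper below is a hypothetical sketch (firstDifference and describeMismatch are illustrative names, not existing code) that a snapshot test could use to compare the new provider's MIME output against bytes captured from the current inline code.

```typescript
// Offset of the first differing byte, or -1 when the buffers are identical.
function firstDifference(expected: Buffer, actual: Buffer): number {
  const shared = Math.min(expected.length, actual.length);
  for (let i = 0; i < shared; i++) {
    if (expected[i] !== actual[i]) return i;
  }
  // Same prefix: either equal, or one buffer is a truncation of the other.
  return expected.length === actual.length ? -1 : shared;
}

// Human-readable mismatch report, or null when the buffers match.
// JSON.stringify makes invisible differences such as \r\n vs \n visible.
function describeMismatch(expected: Buffer, actual: Buffer, context = 20): string | null {
  const offset = firstDifference(expected, actual);
  if (offset === -1) return null;
  const start = Math.max(0, offset - context);
  const show = (b: Buffer): string =>
    JSON.stringify(b.subarray(start, offset + context).toString('utf8'));
  return `MIME output differs at byte ${offset}: expected ${show(expected)}, got ${show(actual)}`;
}
```

Line-ending drift is the kind of change this is meant to surface: "Subject: Hi\r\n" and "Subject: Hi\n" look identical in most diff views but differ at byte 11.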
Phase 2 — Migrate the two test endpoints first
- pages/api/training-provider/send-test-email.ts → sendEmail()
- pages/api/training-provider/send-test-certificate-email.ts → sendEmail()
- These are admin-triggered, manual, low-volume. Any regression is caught immediately by the operator clicking “Send Test Email” and not receiving it.
- Verification gate: before proceeding, manually send a test from Tertiary’s production via the Company Settings UI and confirm the email arrives, looks identical, has correct sender/reply-to, and (for the certificate test) attaches a PDF.
Phase 3 — Migrate user-facing flows in low-risk order
Order chosen by blast radius if regressed:
1. Feedback form (send-feedback.ts): low volume, only TP staff see it.
2. Trainer invitation follow-up (respond.ts sendFollowUpEmail): low volume, off the critical path.
3. Trainer invitation main send (refactor trainerInvitationSender.ts::sendGmail() to delegate to sendEmail()): automated but already wrapped, so the change is one file.
4. Course confirmation, completion, courseware emails (3 cron jobs): automated, run nightly. Worst case: one batch goes silent. Acceptable rollback window.
5. Proforma invoice email (send-proforma-email.ts): finance-triggered, batch.
6. Certificate emails (3 paths): admin-triggered + cron. Critical for compliance, so done after the cron-based ones above prove the pattern.
7. Forgot password (forgot-password.ts): user-triggered, but users have alternative paths (ask admin).
8. OTP login (send-otp.ts): last. Highest blast radius (broken OTP = total login outage). By the time we touch this, the abstraction has been shaken out by 14 other call sites.
Phase 4 — Migrate the SMTP-only path
pages/api/tickets/create.ts currently uses lib/services/emailService.ts (SMTP). Two options:
- 4a: route ticket emails through sendEmail() too. In PR 1 they would then go via Gmail OAuth, which changes the support-ticket transport from SMTP to Gmail OAuth. Verify deliverability matches before merging.
- 4b (recommended): leave emailService.ts alone in PR 1. It will migrate naturally when PR 2 introduces SMTP as a real provider. Lower risk for PR 1, slight tech debt deferred.
Recommendation: Phase 4b. PR 1 stays focused on Gmail OAuth consolidation. The single SMTP path becomes the first opt-in user of the SMTP provider in PR 2.
6. Verification strategy
| Layer | Approach |
|---|---|
| Unit tests (new) | lib/email/providers/gmail-oauth.test.ts — mock googleapis, assert MIME byte-equality with snapshots captured from current production code for OTP, certificate (with PDF attachment), trainer-invitation (with cc list), feedback (with reply-to). |
| Integration test (new) | Single test that hits sendEmail() against a Gmail sandbox account with MAIL_TEST_MODE=true, verifies a real message lands. Run manually before merge, not in CI (needs network + creds). |
| Manual smoke test | Walkthrough script: send OTP, send forgot-password, send certificate via UI test button, trigger one cron job manually. Capture sender, reply-to, attachment, and rendered HTML for each. Compare against pre-refactor screenshots. |
| Production canary | Deploy to Tertiary first (it’s our own client; we eat the dogfood). Monitor auto_create_certificates_log and the OTP table for 48h before declaring stable. |
| Rollback plan | The PR is one git revert away from restoring the inline calls. We tag the pre-PR commit so revert is mechanical. |
7. Risk register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| MIME format drift breaks Gmail’s parsing of attachments (PDF certificates) | Low | High (no certs sent) | Snapshot tests on MIME bytes; manual cert send via test endpoint before user-facing certs migrated |
| Subtle change in From / Reply-To header behaviour | Medium | Medium (replies go to wrong place) | Phase 3.1 on feedback form catches this early; explicit assertion in unit tests |
| OAuth token refresh logic regresses | Low | Critical (everything dies after 1h) | Existing loadTrainingProviderEmailConfig already handles refresh; we keep its behaviour, just call from one place |
| Cron job silently fails | Medium | Medium (delayed by 1 day) | All cron jobs have logging tables (auto_create_certificates_log etc.); add monitoring step in Phase 3 |
| Unicode / emoji in subjects double-encodes | Low | Low (cosmetic) | Snapshot tests include emoji case |
| Deployment partial-rollout (one container new, one old) | Low | Low | Coolify deploys atomically; not multi-replica |
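For the subject double-encoding risk in the table above, a minimal sketch of RFC 2047 encoding is shown here. encodeSubject is a hypothetical name and not the existing code. The sketch emits a single encoded-word, so it ignores the RFC's 75-character limit per encoded-word; long subjects would need to be split at character boundaries.

```typescript
// True when the value is printable ASCII only, so it can go on the wire as-is.
function isPlainAscii(value: string): boolean {
  return /^[\x20-\x7E]*$/.test(value);
}

// Encode a Subject value as a single RFC 2047 "B" (base64) encoded-word.
// An already-encoded value is pure ASCII, so it passes through unchanged,
// which is what keeps a second call from double-encoding it.
function encodeSubject(subject: string): string {
  if (isPlainAscii(subject)) return subject;
  const base64 = Buffer.from(subject, 'utf8').toString('base64');
  return `=?UTF-8?B?${base64}?=`;
}
```

A snapshot test for the emoji case could then assert both that the encoded subject decodes back to the original and that calling encodeSubject twice yields the same string as calling it once.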
8. Effort & timeline
| Phase | Work | Calendar |
|---|---|---|
| Phase 1 (abstraction + tests) | ~4 hours | Day 1 |
| Phase 2 (test endpoints + manual verify) | ~1 hour | Day 1 |
| Phase 3 (8 user-facing migrations + verify each) | ~4 hours | Day 2 |
| Phase 4b decision (skip in PR 1) | 0 | — |
| Manual smoke + canary on Tertiary | 48h soak | Days 3–4 |
| Total wall-clock | | ~4 days |
Pure coding time is ~1 dev-day; the rest is verification time we should not skip.
9. Acceptance criteria for PR 1 to merge
- All 15 Gmail OAuth send sites call sendEmail() and contain zero references to googleapis / gmail.users.messages.send directly.
- lib/services/emailService.ts (SMTP) and its single consumer pages/api/tickets/create.ts are unchanged (Phase 4b).
- No new env vars, no new DB columns, no schema migrations.
- Unit tests pass; manual smoke test signed off.
- 48h Tertiary canary shows no increase in failed sends, no user-reported OTP issues.
- Code-review approval focusing specifically on the OTP and certificate-attachment paths.
10. Open questions for review
- Phase 4 choice — 4a or 4b? I’m proposing 4b. Are we comfortable leaving the SMTP path alone for PR 1?
- Test infrastructure. Do we have Jest/Vitest set up already? If not, do we add it as part of PR 1, or skip unit tests in favour of the manual smoke test?
- Canary on Tertiary. Comfortable using Tertiary as the canary, or want to wait for Chariot to be the test client?
- Who owns the manual smoke test? Roughly 30 min of clicking through the admin UI and triggering each send.
11. What PR 2 looks like (preview, not part of this PR)
For context only — confirms the PR 1 abstraction is the right shape:
- Migration adds email_provider (default 'gmail_oauth') and smtp_* columns to training_provider.
- New lib/email/providers/smtp.ts using existing nodemailer dep.
- resolver.ts switches on email_provider.
- Company Settings UI gets an “Email Provider” dropdown; SMTP fields appear when SMTP is selected.
- pages/api/tickets/create.ts migrated to sendEmail() (closing Phase 4b).
- Existing clients keep gmail_oauth default → zero migration burden.
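The resolver switch in the list above might look roughly like this. It is a sketch under assumptions: resolveProvider and the placeholder provider objects are hypothetical, the providers carry only a name instead of a real send(), and the fallback for a missing value is an extra safety net on top of the column default.

```typescript
type EmailProviderName = 'gmail_oauth' | 'smtp';

// Placeholder shape; real providers would implement send().
interface NamedProvider {
  readonly name: EmailProviderName;
}

const gmailOAuthProvider: NamedProvider = { name: 'gmail_oauth' };
const smtpProvider: NamedProvider = { name: 'smtp' };

// Pick a provider from the training_provider.email_provider value.
function resolveProvider(emailProvider: EmailProviderName | null | undefined): NamedProvider {
  // Assumption: a missing value falls back to the current transport.
  const name = emailProvider ?? 'gmail_oauth';
  switch (name) {
    case 'gmail_oauth':
      return gmailOAuthProvider;
    case 'smtp':
      return smtpProvider;
    default: {
      // Compile-time exhaustiveness check: adding a new union member
      // without a case above makes this assignment a type error.
      const unreachable: never = name;
      throw new Error(`Unknown email_provider: ${String(unreachable)}`);
    }
  }
}
```

The never assignment means that widening the union (for example to add 'resend') fails the build until the switch handles it, while the throw covers unexpected values read from the database at runtime.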
If PR 1’s abstraction is right, PR 2 is a one-day change. If PR 2 turns out to need design changes to the provider interface, we revise PR 1 before merging it — which is why we’re getting alignment on this doc first.