Although the service is small, it still has to behave like a real backend system when external dependencies misbehave.
That matters here because the workflow sits on top of several systems, each of which can fail in a different way.
This section describes what the service does when those failures occur at runtime.
The service keeps a last_successful_run_at value and uses it to compute the next retrieval window with overlap.
That overlap reduces the chance of missing emails that arrive near run boundaries, while the processed-email ledger keeps emails retrieved twice because of the overlap from being processed twice.
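The window computation can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function name next_retrieval_window, the 10-minute OVERLAP, and the one-day INITIAL_LOOKBACK used when no run has succeeded yet are all assumptions made for the example. Only last_successful_run_at comes from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical values: the actual overlap and first-run lookback are not stated.
OVERLAP = timedelta(minutes=10)
INITIAL_LOOKBACK = timedelta(days=1)


def next_retrieval_window(
    last_successful_run_at: Optional[datetime],
    now: datetime,
) -> Tuple[datetime, datetime]:
    """Return the (start, end) window to fetch emails for.

    The start is pulled back by OVERLAP so that emails arriving close to
    the previous run boundary are fetched again rather than missed.
    """
    if last_successful_run_at is None:
        # Assumed behavior for the very first run: fall back to a fixed lookback.
        return now - INITIAL_LOOKBACK, now
    return last_successful_run_at - OVERLAP, now


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    last = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
    start, end = next_retrieval_window(last, now)
    print(start.isoformat(), end.isoformat())
```

Because the window deliberately reaches back before the last successful run, some emails will be retrieved more than once; the sketch relies on the ledger check described above to skip them.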
If processing fails after a new processed-email record has been created but before the attempt is complete, that record is deleted before the failure bubbles up.
If processing fails on an already-existing processed-email record, the service does not delete it.
The runtime rule is simple: only clean up state created by the failed attempt itself.
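One way to express that rule in code is to remember whether the current attempt created the record and to delete it only in that case. The sketch below uses an in-memory stand-in for the ledger; the class ProcessedEmailLedger, its methods, and the process_email function are hypothetical names chosen for illustration, not the service's real interfaces.

```python
from typing import Callable, Dict


class ProcessedEmailLedger:
    """In-memory stand-in for the processed-email ledger (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def get_or_create(self, email_id: str) -> bool:
        """Ensure a record exists; return True only if this call created it."""
        if email_id in self._records:
            return False
        self._records[email_id] = "in_progress"
        return True

    def mark_complete(self, email_id: str) -> None:
        self._records[email_id] = "complete"

    def delete(self, email_id: str) -> None:
        self._records.pop(email_id, None)

    def has(self, email_id: str) -> bool:
        return email_id in self._records


def process_email(
    ledger: ProcessedEmailLedger,
    email_id: str,
    handler: Callable[[str], None],
) -> None:
    """Run handler for one email, cleaning up only state this attempt created."""
    created_here = ledger.get_or_create(email_id)
    try:
        handler(email_id)
    except Exception:
        if created_here:
            # The record exists only because of this failed attempt: remove it.
            ledger.delete(email_id)
        # A pre-existing record is left untouched; the failure still propagates.
        raise
    ledger.mark_complete(email_id)
```

Tracking created_here at the point of creation avoids having to infer ownership later from timestamps or status fields, which is easy to get wrong when several attempts touch the same record.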
Retries are limited to failures that might plausibly succeed on another attempt.
Examples include: