The system is small, but it still has a few distinct moving parts.

1. Microsoft Graph ingestion

Microsoft Graph is the source of truth for mailbox data.

Each run fetches emails from a safe time window rather than relying on a vague “latest N messages” approach. That matters because a scheduled job can silently skip messages that arrive near a run boundary if consecutive retrieval windows do not line up.
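
As a rough illustration of window-based retrieval, the sketch below computes a window that reaches back slightly before the end of the previous run and turns it into Microsoft Graph query parameters. The function names, the 10-minute overlap, and the use of the `/me/messages` endpoint are assumptions made for this example, not details of the actual system. Messages fetched twice because of the overlap can be dropped later by message ID.

```python
from datetime import datetime, timedelta, timezone

# Assumed endpoint for illustration; a real deployment might target
# /users/{id}/messages or a specific mail folder instead.
GRAPH_MESSAGES_URL = "https://graph.microsoft.com/v1.0/me/messages"


def fetch_window(last_run_end, now, overlap=timedelta(minutes=10)):
    """Return (start, end) for this run.

    The start is pulled back by `overlap` (a hypothetical safety margin)
    so messages that landed near the previous run boundary are seen again.
    """
    return last_run_end - overlap, now


def _iso_utc(dt):
    """Format an aware datetime as a UTC timestamp for an OData filter."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_messages_query(start, end):
    """Build query parameters selecting messages received in [start, end)."""
    return {
        "$filter": (
            f"receivedDateTime ge {_iso_utc(start)} "
            f"and receivedDateTime lt {_iso_utc(end)}"
        ),
        "$orderby": "receivedDateTime",
        "$select": "id,subject,from,receivedDateTime",
    }


if __name__ == "__main__":
    last_end = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
    now = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
    start, end = fetch_window(last_end, now)
    print(GRAPH_MESSAGES_URL, build_messages_query(start, end))
```

The resulting dictionary can be passed as the query parameters of whatever HTTP client the job uses, together with a valid access token.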

2. Processed-email ledger

Every processed email is stored in Airtable with its message ID, core metadata, and classification result.

This serves two purposes:

That record became important once retry, rollback, and later downstream job actions were introduced.
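
A ledger keyed by message ID is also what lets a run be repeated safely. The sketch below is a minimal illustration, assuming an in-memory stand-in for the Airtable table; `LedgerEntry`, `InMemoryLedger`, and `process_once` are hypothetical names, and the field set is a guess at "core metadata" rather than the real schema.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    """One processed email (hypothetical field set)."""
    message_id: str
    subject: str
    received_at: str
    classification: str


@dataclass
class InMemoryLedger:
    """Stand-in for the Airtable table, keyed by message ID."""
    entries: dict = field(default_factory=dict)

    def has(self, message_id):
        return message_id in self.entries

    def record(self, entry):
        self.entries[entry.message_id] = entry


def process_once(ledger, message, classify):
    """Classify and record a message unless the ledger already has it.

    Returns True if the message was processed in this call, False if it
    was skipped because its ID was already recorded.
    """
    if ledger.has(message["id"]):
        return False
    ledger.record(
        LedgerEntry(
            message_id=message["id"],
            subject=message["subject"],
            received_at=message["receivedDateTime"],
            classification=classify(message),
        )
    )
    return True
```

With this shape, a retried run that sees the same message again leaves the ledger unchanged instead of producing a second record.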

3. Relevance and status classification

The classification pipeline is deliberately split into two stages.

First, the system decides whether an email is relevant to a real application workflow at all.

Then, only if it is relevant, it classifies the email into one of the operational states used by the system:

That two-step structure keeps irrelevant noise away from the more specific downstream logic.
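
The two-stage gate can be sketched as a small function that only calls the status classifier when the relevance check passes. Everything here is illustrative: `classify_email`, the keyword-based stubs, and the `"placeholder_state"` label are hypothetical and say nothing about how the real classifiers work or which states they produce.

```python
from typing import Callable, Optional


def classify_email(
    email: dict,
    is_relevant: Callable[[dict], bool],
    classify_status: Callable[[dict], str],
) -> Optional[str]:
    """Run stage one, then stage two only for relevant emails.

    Returns None for irrelevant emails, otherwise the status label
    produced by the second stage.
    """
    if not is_relevant(email):
        return None
    return classify_status(email)


# Hypothetical stand-ins for the two stages, for demonstration only.
def keyword_relevance(email: dict) -> bool:
    return "application" in email["subject"].lower()


def placeholder_status(email: dict) -> str:
    return "placeholder_state"


if __name__ == "__main__":
    print(classify_email({"subject": "Weekly newsletter"},
                         keyword_relevance, placeholder_status))
    print(classify_email({"subject": "Your application update"},
                         keyword_relevance, placeholder_status))
```

Because the second stage is never invoked for irrelevant mail, it can assume its input belongs to an application workflow and stay narrowly focused on choosing a state.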

4. Job record resolution