Core Components

The system is small, but it still has a few distinct moving parts.

1. Microsoft Graph ingestion

Microsoft Graph is the source of truth for mailbox data.

Each run fetches emails from a bounded time window that overlaps the previous run, rather than relying on a vague “latest N messages” approach. That matters because a scheduled job can miss messages that arrive close to a run boundary if the retrieval window is not handled carefully.
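As a rough illustration of the windowing idea, the sketch below computes a fetch window that reaches back past the end of the previous run and turns it into an OData filter on receivedDateTime, which Microsoft Graph accepts when listing messages. The function names and the 15-minute overlap are assumptions made for this example, not values taken from the system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default: how far each run reaches back past the previous run's end.
DEFAULT_OVERLAP = timedelta(minutes=15)


def fetch_window(last_run_end, now, overlap=DEFAULT_OVERLAP):
    """Return (start, end) for this run, overlapping the previous window."""
    return last_run_end - overlap, now


def graph_filter(start, end):
    """Build an OData $filter string covering the half-open window [start, end).

    Both arguments are expected to be timezone-aware datetimes.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"receivedDateTime ge {start_utc} and receivedDateTime lt {end_utc}"


if __name__ == "__main__":
    last_end = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    now = datetime(2024, 5, 1, 13, 0, tzinfo=timezone.utc)
    start, end = fetch_window(last_end, now)
    print(graph_filter(start, end))
```

Because consecutive windows overlap, the same message can be returned by two runs. That is acceptable only if something downstream deduplicates by message ID.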

2. Processed-email ledger

Every processed email is stored in Airtable with its message ID, core metadata, and classification result.

This serves two purposes:

deduplication across overlapping run windows
a durable ledger of what the system believed each email represented

That record became important once retry, rollback, and later downstream job actions were introduced.
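The deduplication role of the ledger can be sketched as follows. This is a minimal in-memory stand-in: a plain dictionary takes the place of the Airtable table, and the class name, field names, and helper function are hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class LedgerEntry:
    message_id: str                # message ID of the processed email
    subject: str                   # stand-in for "core metadata"
    classification: Optional[str]  # what the system believed the email was


class Ledger:
    """In-memory stand-in for the Airtable table (assumption for this sketch)."""

    def __init__(self) -> None:
        self._rows: Dict[str, LedgerEntry] = {}

    def seen(self, message_id: str) -> bool:
        return message_id in self._rows

    def record(self, entry: LedgerEntry) -> bool:
        """Store the entry once; return False if the message was already recorded."""
        if entry.message_id in self._rows:
            return False
        self._rows[entry.message_id] = entry
        return True

    def get(self, message_id: str) -> Optional[LedgerEntry]:
        return self._rows.get(message_id)


def unprocessed(ledger: Ledger, message_ids: Iterable[str]) -> List[str]:
    """Keep only the IDs that are not yet in the ledger, preserving order."""
    return [mid for mid in message_ids if not ledger.seen(mid)]
```

Keeping the classification next to the message ID is what lets a later retry or rollback check what the system decided at the time, rather than reclassifying the email from scratch.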

3. Relevance and status classification

The classification pipeline is deliberately split into two stages.

First, the system decides whether an email is actually relevant to a real application workflow.

Then, only if it is relevant, it classifies the email into one of the operational states used by the system:

generic update
assessment
interview invitation
rejection

That two-step structure keeps irrelevant noise away from the more specific downstream logic.
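The shape of that two-stage flow might look like the sketch below. The keyword rules are placeholders chosen only to make the example runnable; the source does not say how either stage makes its decision, so the function names and matching logic here are assumptions.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GENERIC_UPDATE = "generic update"
    ASSESSMENT = "assessment"
    INTERVIEW_INVITATION = "interview invitation"
    REJECTION = "rejection"


def is_relevant(text: str) -> bool:
    """Stage 1: decide whether the email belongs to an application workflow.

    Placeholder rule; the real relevance check is not described in the text.
    """
    lowered = text.lower()
    return "application" in lowered or "applied" in lowered


# Placeholder keyword rules, checked in order; the first match wins.
_RULES = [
    (Status.REJECTION, ("unfortunately", "not moving forward")),
    (Status.INTERVIEW_INVITATION, ("interview",)),
    (Status.ASSESSMENT, ("assessment", "coding test")),
]


def classify_status(text: str) -> Status:
    """Stage 2: map a relevant email to one of the four operational states."""
    lowered = text.lower()
    for status, keywords in _RULES:
        if any(keyword in lowered for keyword in keywords):
            return status
    return Status.GENERIC_UPDATE


def classify(text: str) -> Optional[Status]:
    """Run stage 2 only when stage 1 accepts the email; None means irrelevant."""
    if not is_relevant(text):
        return None
    return classify_status(text)
```

Returning None for irrelevant mail keeps "not part of any application" distinct from "generic update", so downstream logic never has to treat noise as a status.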

4. Job record resolution