
Python · PostgreSQL · pandas · Jenkins · Ansible

Accelerating SOX compliance reporting from 3+ hours to under 10 minutes

How I replaced a quarterly spreadsheet-wrangling exercise with an automated reporting pipeline that auditors actually liked — and why the hardest part wasn’t the code.

A sanitized write-up of a real production reporting pipeline. Internal control identifiers, auditor-specific formats, and confidential report structure details have been generalized. The process, trade-offs, and auditability lessons are real.

TL;DR

SOX compliance reports on our database platform used to take an engineer 3+ hours per quarter to produce: pulling data from five different systems, reconciling discrepancies by hand, formatting the output for the audit team, and fielding follow-up questions. I built an automated pipeline that produced the same report, reconciled against the same sources, and formatted it identically — in under 10 minutes, end to end. More importantly, the new pipeline left an audit trail that made the report itself easier for auditors to verify. The headline number is “3+ hours to under 10 minutes”, which is an 18× speedup. The real wins were elsewhere.

Context: what the report was

Our team ran database clusters for 200+ internal enterprise applications. Some of those applications were SOX-in-scope, meaning changes to them had to be audited under Sarbanes-Oxley controls. The audit team asked our team for a quarterly report answering questions about who had been granted access, whether each change had an approved ticket, whether backups had run and been verified, and who held privileged roles.

Each question required pulling data from a different system: IAM logs, ticketing system API, backup tooling, database audit logs, role catalog. The engineer doing the report would log into each source manually, export to CSV, dump into a spreadsheet, reconcile by hand, format it into the auditor’s template, and email it. Three hours on a good day, five on a bad one.

Why it was slow

It wasn’t slow because any individual query was slow. It was slow because:

  1. Context switching between five source systems — each with different auth, query languages, export formats.
  2. Reconciliation was manual — matching IAM grants to ticket approvals by hand, row by row.
  3. The auditor’s expected format was rigid — hand-formatting into a specific template ate half the time.
  4. Every follow-up question restarted the whole cycle — no notes kept, no reproducibility.
  5. Nothing was reproducible — two engineers running the report on the same data could produce slightly different results because they’d make different manual reconciliation choices.

Any of these five would have been a problem on its own. Together, they meant the whole process was slow, error-prone, and unpleasant — which meant it usually got assigned to the most junior engineer available, who had the least context for the judgment calls.

The thing I almost built first (and didn’t)

My first instinct was to write a big Python script that hit all five APIs, joined the data with pandas, and spat out a spreadsheet. In an afternoon I could have had something that produced the report in 10 minutes instead of 3 hours.

I didn’t ship that version. Here’s why.

A compliance report isn’t just a document. It’s a claim the team is making to auditors about the state of the system. The auditor’s job is to verify that claim. If the tool that produced the report is a pile of Python that nobody can easily inspect, the auditor has to verify the data all over again — which is what they were already doing, just with the report as a starting point. An automated report that the auditor can’t verify is not faster. It’s just faster for us.

So the first version I actually built was not the script. It was a design document that answered, for each row in the report, these three questions:

  1. What source system did this row come from?
  2. What was the exact query that produced it?
  3. At what moment was that query run?

And I added a requirement: the report’s output had to make all three of those answers visible.

The pipeline

  [1. Snapshot phase]  For each source, run the query, save raw output
                       to S3 with a timestamp, record the query + source
                       + hash in an "evidence" table in PostgreSQL.

  [2. Load phase]      Load each snapshot into a staging table in
                       PostgreSQL. One staging table per source.

  [3. Reconcile phase] Run SQL join queries across staging tables to
                       answer each audit question. Results go into a
                       "findings" table.

  [4. Render phase]    Jinja2 template turns findings into the auditor's
                       expected XLSX format. Every row links back to its
                       evidence ID.

  [5. Audit log phase] Emit a structured log bundle containing every
                       query, timestamp, hash, row count, and triggering
                       user.

The whole thing runs in a single Jenkins job, takes arguments for “which quarter” and “which clusters in scope”, and writes the final artifacts to a dated folder in S3.
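The five phases above can be sketched as a single driver, which is roughly the shape the Jenkins job invokes. Everything here (the phase function names, the run-record fields) is illustrative, not the production code:

```python
import uuid
from datetime import datetime, timezone

# Stub phases: each stands in for real work against the source
# systems, PostgreSQL staging tables, S3, and the renderer.
def snapshot_phase(run): pass    # query sources, save raw output, write evidence rows
def load_phase(run): pass        # load snapshots into staging tables
def reconcile_phase(run): pass   # SQL joins across staging -> findings table
def render_phase(run): pass      # findings -> auditor's XLSX via template
def audit_log_phase(run): pass   # emit queries, timestamps, hashes, row counts

def run_report(quarter: str, clusters: list[str]) -> dict:
    """Run all five phases in order for one quarter; return the run record."""
    run = {
        "run_id": str(uuid.uuid4()),
        "quarter": quarter,
        "clusters": clusters,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "phases": [],
    }
    for phase in (snapshot_phase, load_phase, reconcile_phase,
                  render_phase, audit_log_phase):
        phase(run)
        run["phases"].append(phase.__name__)
    return run

run = run_report("2024-Q3", ["cluster-a", "cluster-b"])
```

The point of the fixed phase order is that snapshotting always completes before any processing begins, which is what Lesson 1 below depends on.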

Why PostgreSQL in the middle

I considered doing the whole thing in pandas DataFrames in memory. I chose PostgreSQL staging tables instead because:

  1. The reconciliation queries became SQL — a language both engineering and the auditors’ consultants could read.
  2. I got a free audit trail — every query could be re-run later against the preserved staging data, exactly as it ran the first time.
  3. Follow-up questions became cheap — when an auditor asked “show me all the privileged access changes for cluster X in the last 30 days”, I could write and run a new query against the same staging data in minutes.
  4. Debugging was much easier — SELECT * FROM staging_access_changes WHERE ... beats df[df['...'] == ...] when you’re explaining to someone who doesn’t write Python.
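To make point 1 concrete, here is the shape of one reconciliation query, run against SQLite purely so the example is self-contained (the real pipeline ran this kind of join against PostgreSQL staging tables; the table and column names here are hypothetical):

```python
import sqlite3

# Two staging tables: access changes pulled from IAM logs, and
# approvals pulled from the ticketing system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_access_changes (
        grant_id TEXT, grantee TEXT, cluster TEXT, ticket_ref TEXT);
    CREATE TABLE staging_ticket_approvals (
        ticket_ref TEXT, approved_by TEXT);
""")
conn.executemany(
    "INSERT INTO staging_access_changes VALUES (?, ?, ?, ?)",
    [("g1", "alice", "cluster-x", "TCK-100"),
     ("g2", "bob",   "cluster-x", "TCK-999")])  # TCK-999 was never approved
conn.execute(
    "INSERT INTO staging_ticket_approvals VALUES ('TCK-100', 'carol')")

# The audit question: which access changes lack a matching approval?
unapproved = conn.execute("""
    SELECT c.grant_id, c.grantee, c.ticket_ref
    FROM staging_access_changes c
    LEFT JOIN staging_ticket_approvals a USING (ticket_ref)
    WHERE a.ticket_ref IS NULL
""").fetchall()
```

A query like this is the whole manual reconciliation step from the old process, in a form any SQL reader can review.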

The evidence table

The thing I’m proudest of in this project. One row per “query I ran during the pipeline”:

CREATE TABLE sox_evidence (
    id UUID PRIMARY KEY,
    run_id UUID NOT NULL REFERENCES sox_runs(id),
    source_system TEXT NOT NULL,
    query_name TEXT NOT NULL,
    query_text TEXT NOT NULL,
    query_params JSONB NOT NULL,
    executed_at TIMESTAMPTZ NOT NULL,
    executed_by TEXT NOT NULL,
    result_row_count INTEGER NOT NULL,
    result_hash TEXT NOT NULL,
    raw_output_s3_key TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
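A helper that builds one such row might look like the following. The hashing scheme (SHA-256 over the raw output bytes exactly as saved to S3) and the S3 key layout are assumptions for illustration:

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone

def make_evidence_row(run_id: str, source_system: str, query_name: str,
                      query_text: str, query_params: dict,
                      raw_output: bytes, row_count: int,
                      executed_by: str) -> dict:
    """Build one sox_evidence row for a query that just ran.

    The hash covers the raw bytes exactly as written to S3, so a
    later re-run can be compared byte-for-byte.
    """
    return {
        "id": str(uuid.uuid4()),
        "run_id": run_id,
        "source_system": source_system,
        "query_name": query_name,
        "query_text": query_text,
        "query_params": json.dumps(query_params, sort_keys=True),
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "executed_by": executed_by,
        "result_row_count": row_count,
        "result_hash": hashlib.sha256(raw_output).hexdigest(),
        "raw_output_s3_key": f"sox/{run_id}/{source_system}/{query_name}.csv",
    }

row = make_evidence_row(
    "run-1", "iam", "privileged_grants",
    "SELECT * FROM grants WHERE quarter = %(q)s", {"q": "2024-Q3"},
    b"grantee,cluster\nalice,cluster-x\n", 1, "jenkins")
```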

Every row in the final report has a column called evidence_id that points to the exact evidence row that produced it. An auditor who wants to verify any claim in the report can:

  1. Look up the evidence row
  2. See the exact query text and parameters
  3. See the exact timestamp
  4. See the row count and hash of the raw output
  5. Pull the raw output from S3 and re-run the query to verify it’s byte-identical
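Step 5 reduces to a couple of lines. A sketch, assuming the evidence row is available as a dict and SHA-256 was the recorded hash:

```python
import hashlib

def verify_evidence(evidence: dict, raw_output: bytes, row_count: int) -> bool:
    """Confirm raw output pulled from S3 matches the recorded
    hash and row count from the evidence table."""
    return (hashlib.sha256(raw_output).hexdigest() == evidence["result_hash"]
            and row_count == evidence["result_row_count"])
```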

This turned what used to be a “trust us, here’s a spreadsheet” deliverable into “here’s the spreadsheet, here’s the SQL that produced every row, and here’s the raw source data.” The first quarter after I shipped this, the auditor follow-up question volume dropped from roughly 15 per report to zero.

The three non-obvious lessons

Lesson 1: Freeze-drying source data is a feature, not a side effect

The snapshot phase felt like overkill while I was building it. Three things made it the most important part of the pipeline in retrospect:

  1. Source systems change. The ticketing system’s API returned different fields six months later. Without snapshots, I couldn’t have re-run old reports consistently.
  2. Auditors ask retroactive questions. “What did the access list look like on September 30th?” becomes trivial if you snapshotted it on September 30th.
  3. Incidents happen between snapshot and render. A source system can go down mid-run. If you snapshot first and process from the snapshot, a source outage after step 1 doesn’t break the run.

Snapshot first, process second. The cost is a few hundred megabytes in S3. The value is months of retroactive auditability.
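A minimal snapshot helper looks something like this, with a local directory standing in for the S3 bucket (the names and dated key layout are assumptions):

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def snapshot(raw_output: bytes, bucket: Path, source: str) -> Path:
    """Freeze-dry one source pull: write the raw bytes under a dated,
    timestamped key before any parsing or reconciliation touches them."""
    ts = datetime.now(timezone.utc)
    key = bucket / ts.strftime("%Y/%m/%d") / f"{source}-{ts:%H%M%S}.raw"
    key.parent.mkdir(parents=True, exist_ok=True)
    key.write_bytes(raw_output)
    return key

# Demo: a temp directory plays the role of the S3 bucket.
bucket = Path(tempfile.mkdtemp())
key = snapshot(b"grantee,cluster\nalice,cluster-x\n", bucket, "iam")
```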

Lesson 2: The auditor is a user

I used to think of the audit team as a constraint. After this project I started thinking of them as a user group in their own right, whose core task is verifying claims rather than reading reports.

Once I started treating auditors as users, design decisions got much easier. “What would make this easier to verify?” is a better question than “what does the template require?”

Lesson 3: Automation that creates audit trails is non-negotiable in compliance work

In regular backend work, “we could rebuild this from the source of truth” is a perfectly good answer to observability gaps. In compliance work, it’s not. You need the exact historical state, on demand, with a verifiable chain from the data to the claim.

This shaped every design decision. Every query writes an evidence row. Every report row references an evidence row. Every source pull is snapshotted. None of it is optional.

The upside: the engineering team started using the sox_evidence table for internal debugging too, because it was easier to query than the upstream source systems.

The numbers

  Engineer time to produce report: 3–5 hours → ~10 minutes
  Auditor follow-up questions: ~15 per report → ~0 per report
  Report variance between engineers: non-trivial → zero
  Retroactive “show me X on date Y” questions: multi-hour archaeology → a ~2-minute SQL query
  Time-to-first-sent after quarter close: ~3 business days → ~2 hours

What I’d do differently

  1. Build the snapshot phase first. It feels boring until it saves you.
  2. Involve the audit team in the design phase. I shipped v1 and iterated. I should have pulled auditors in before writing code.
  3. Write a “compare two runs” tool alongside the pipeline. Eventually I wanted to diff quarters. If the data model is good, a diff tool is ~200 lines.
  4. Treat every query as a named asset in version control. Named, versioned queries in source control are easier to review, diff, and trace.
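For point 3, the core of a run-comparison tool really is small once findings are keyed by a stable ID. A hypothetical sketch:

```python
def diff_runs(prev: dict[str, dict], curr: dict[str, dict]) -> dict:
    """Compare two runs' findings, keyed by a stable finding ID.

    Returns which findings appeared, disappeared, or changed
    between the two quarters.
    """
    return {
        "added":   sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        "changed": sorted(k for k in prev.keys() & curr.keys()
                          if prev[k] != curr[k]),
    }

delta = diff_runs(
    {"f1": {"status": "ok"}, "f2": {"status": "ok"}},
    {"f2": {"status": "flagged"}, "f3": {"status": "ok"}})
```

The rest of a real diff tool is mostly rendering the delta back into the auditor's format.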

What transfers

Anywhere you have multiple source systems with inconsistent data, manual reconciliation, a rigid output format, and a consumer whose job is to verify your output, the pattern applies. Snapshot → stage → reconcile → render → log, with an evidence table tying every output row back to its source.

The bigger idea: in compliance or audit work, the goal isn’t “produce the report faster.” It’s “produce the report in a way that is cheaper to verify.” Speed is a side effect of doing the latter well.

Contact

prateekaryyan@gmail.com · github.com/prateekaryann · linkedin.com/in/prateek-aryan