Open Lab, Workpaper Organization & Automation

⚡ For builders. The #1 most-requested unmet need in the Lab: getting the mountain of source documents, PDFs from TaxCaddy, bank statements, K-1s, brokerage 1099s, extracted, organized, and into clean workpapers without hand-keying. Here's how to build it on the code-computes, AI-drafts, you-decide architecture, with the PII handling that makes it safe.

🧪 Try it yourself: the Bookkeeping Pipeline lab gives you 12 months of (fake) bank statements and the skills to take them to a trial balance and bank rec, so you can run this pipeline, not just read about it.

Why this is hard (and why naïve AI fails at it)

Source docs are messy, high-volume, loaded with PII, and the output is numbers that feed a return. So the two failure modes from the core guides both bite at once: silent extraction errors (a transposed digit looks perfect) and client-data exposure (statements are wall-to-wall PII). A single "summarize these PDFs" prompt fails on both. The fix is an architecture, not a prompt.

The pipeline

1. INTAKE     collect source docs (TaxCaddy export, scans, statements)
2. REDACT     strip PII locally BEFORE any external LLM  ← non-negotiable
3. EXTRACT    (script + OCR) pull fields into structured data; keep row counts
4. CLASSIFY   (LLM) label each doc/line: which client, year, category, schedule
5. ORGANIZE   (script) file by your naming convention; build the index
6. VERIFY     foot every total back to the source document
7. REVIEW     a person signs off before it's a workpaper of record

Steps 3, 5, 6 are deterministic (they have one right answer). Step 4 is the only place the LLM earns its keep, deciding what a document is and where it belongs. That split is what keeps the numbers trustworthy.

Step-by-step, the way the Lab's builders do it

2 · Redact first. Workpapers can't be anonymized away, you need the real numbers, so this is the "real data in a controlled place" path, not the "scrub and use any tool" path. Run a local redactor (the Redactor tool, or a local PDF redactor that auto-masks SSNs and routing numbers) and process real client data only in a firm-approved tool inside your WISP. Caveat the room insists on: no scrubber is perfect, screenshots strip metadata but not visible PII, and a missed field is a disclosure. Validate.

3 · Extract with scripts, not vibes. Use OCR/parsing to pull fields into a table. Demand a row count and a column total out of every document so you can foot it. (Vision models can read a statement, but treat the output as unverified until step 6.)

4 · Let the LLM classify (it proposes; you confirm). This is where the LLM earns its keep, proposing which client, which tax year, which category/schedule, is this a duplicate, is anything obviously missing (e.g., a 1099 referenced but not attached). Have it flag ambiguities [REVIEW] instead of guessing. The classification is a proposal you confirm, not a decision it makes.

5 · Organize deterministically. A script applies your naming convention (e.g., Client_Year_Category_DocType) and builds the workpaper index. Consistency here is the whole value, don't let the LLM freelance file names.

6 · Verify, foot it to the source. Every extracted total ties to the document's stated total, row counts match, nothing dropped or duplicated. Unverified numbers do not become workpapers. (This is the Module 5 discipline, applied at scale.)

7 · Human review and sign-off. The professional owns the workpaper. If you write anything back to a document-management or ledger system, use the staging-table pattern, propose, review, then execute.

Completeness, the check that's unique to workpapers

Footing one document (step 6) proves that document was read correctly. Workpapers need a second, higher check: is the file complete, and does it cross-tie? This is where AI's "looks done" is most dangerous, a missing K-1 leaves no error, just an understated return.

Expected-vs-received. Build (or have the LLM draft from the prior year) a checklist of documents you expect for this client, and reconcile against what arrived. Flag every gap: "2 of 3 brokerage 1099s received," "prior-year Schedule E rental not seen this year, confirm."
Cross-document tie-outs. The sum of the 1099-INT/DIV should match the interest/dividend lines; K-1 figures should match the entries that use them. Mismatches are exactly what a reviewer needs surfaced.
Carryforwards. Prior-year items that must roll (NOLs, basis, depreciation, suspended losses), flag whether they're present and consistent.

Have the LLM flag these gaps; you decide what each means. (Code computes the tie-outs; AI drafts the exceptions list; you make the calls, same three-layer split as Skills & Architecture.)

What to build first (a concrete spec)

Don't build "automate all workpapers." Build one recurring, high-volume, rule-bound document type end-to-end. A good first target, bank statement → categorized workpaper:

In: a PDF/CSV statement (redacted: account number masked, name tokenized).
Out: a table, date · description · amount · category · [REVIEW?], plus the row count and the column total.
Checks that must pass before it's a workpaper: column total = statement ending activity; row count = statement line count; every [REVIEW] resolved by a human; category mapping matches your chart.
Then add document types one at a time (1099s, K-1s), reusing the same redact→extract→classify→ foot→complete→review loop. A narrow pipeline you trust beats a broad one you don't.

Guardrails

PII is unavoidable here, so the controlled-tool + WISP + §7216 path is mandatory, not optional (Regulatory Foundation). Redact locally first regardless.
No silent posting. Extracted numbers are drafts until footed and reviewed; nothing posts on its own.
Reviewer of record. A workpaper carries professional weight, the human signs.
Assume the scrubber missed something until you've validated; for attest work, AU-C evidence and documentation duties still apply.

Open Lab track · built on Skills & architecture; shares the verification discipline of Module 5.