Step 0: Redact (Mechanics · local, deterministic)

Layer: Mechanics → do this locally, not with cloud AI. A tool that removes PII must touch the raw PII; that can't happen in the cloud without defeating the purpose.

Goal: strip the client's identity from each statement before any of it goes to an AI tool.

What to redact vs. keep: bookkeeping is different from tax research. You can't tokenize everything, because the vendor descriptions are what drive the categorization in Step 2.

Redact (client identity)                         | Keep (drives the coding)
Account number, address                          | Merchant / expense vendor names (Home Depot, Gusto, Ally, Hiscox…)
The business name (the account holder)           | Transaction dates and amounts
The client's customer names (revenue depositors) | "RENT," "PAYROLL," "OWNER DISTRIBUTION" labels

A public merchant like "Home Depot" isn't the client's confidential information, and you need it to code the transaction. The client's name and its customer list are what you tokenize.

How: try the Redactor demo first to see how it works (it uses fake sample data). For anything real, download the offline copy and run it with Wi-Fi off, so nothing leaves your machine. Either way, it tokenizes names and account numbers and gives you a local key. Then eyeball the output: keep the merchant descriptions, and confirm that the holder and customer names became tokens. In this lab the data is synthetic, so it's safe either way, but practice the habit.
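The eyeball check can be backed up with a few lines of local code. This is a minimal sketch, not part of the Redactor tool: the function name `check_redaction`, the sample text, and the names in it are all hypothetical. It only does case-insensitive substring checks, so it supplements the manual review rather than replacing it.

```python
def check_redaction(redacted, must_be_gone, must_remain):
    """Return a list of problems; an empty list means the spot-check passed."""
    lowered = redacted.lower()
    problems = []
    # Holder and customer names should no longer appear anywhere.
    for name in must_be_gone:
        if name.lower() in lowered:
            problems.append(f"still visible: {name}")
    # Merchant descriptions should survive, since Step 2 needs them.
    for merchant in must_remain:
        if merchant.lower() not in lowered:
            problems.append(f"merchant missing: {merchant}")
    return problems


# Hypothetical redacted text in which one customer name slipped through.
redacted = (
    "[NAME_1] Acct [ACCT_1]\n"
    "01/05 HOME DEPOT -212.40\n"
    "01/07 DEPOSIT Jane Smith 1500.00\n"
)
problems = check_redaction(
    redacted,
    must_be_gone=["Acme Roofing LLC", "Jane Smith"],
    must_remain=["Home Depot", "Gusto"],
)
for problem in problems:
    print(problem)
```

Here the check reports that "Jane Smith" is still visible and that "Gusto" is missing. The second finding simply means this statement has no Gusto line, which is the kind of result you still have to read with judgment.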

⚠️ Redacting the name doesn't make the data anonymous. The amounts, dates, and patterns are still confidential client financial data. You can run this through AI, but only through a firm-approved tool (no-training terms, inside your WISP / data agreement), not a personal or free account. The governing duties for bookkeeping data are confidentiality (AICPA Code + state board) and the FTC Safeguards Rule; §7216 attaches specifically when this feeds a tax return. Redaction is defense-in-depth; the tool's data terms are what actually permit the use.

Real-world note: for a production pipeline you'd script this (a local PDF redactor that masks account/routing numbers) so it runs on every file automatically. Redaction is mechanics: give it to deterministic code, never to the model.
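A minimal sketch of what such a script might look like, under some loud assumptions: the statement has already been extracted to plain text (PDF extraction is a separate step, not shown), you supply the list of names to tokenize, and the 8-to-17-digit pattern for account/routing numbers is a guess you would tune to your bank's layout. The function `redact` and the sample data are hypothetical, not the Redactor tool's actual code.

```python
import re

# Assumed pattern: standalone runs of 8 to 17 digits, which covers 9-digit
# routing numbers and typical account numbers. A very large whole-dollar
# amount or an undelimited date could also match, so review the output.
ACCOUNT_RE = re.compile(r"\b\d{8,17}\b")


def redact(text, names):
    """Return (redacted_text, key). The key maps each token to its original."""
    key = {}      # token -> original value; keep this on your machine
    reverse = {}  # original value -> token, so repeats reuse one token

    def token_for(value, prefix):
        if value not in reverse:
            count = sum(1 for t in key if t.startswith(f"[{prefix}_")) + 1
            token = f"[{prefix}_{count}]"
            key[token] = value
            reverse[value] = token
        return reverse[value]

    # Longest names first, so a full name wins over a shorter one inside it.
    for name in sorted(names, key=len, reverse=True):
        pattern = re.compile(re.escape(name), re.IGNORECASE)
        text = pattern.sub(lambda m, n=name: token_for(n, "NAME"), text)

    text = ACCOUNT_RE.sub(lambda m: token_for(m.group(0), "ACCT"), text)
    return text, key


# Synthetic sample statement text.
sample = (
    "Acme Roofing LLC  Acct 000123456789\n"
    "01/05 HOME DEPOT #4412        -212.40\n"
    "01/07 DEPOSIT Jane Smith      1500.00\n"
    "01/09 GUSTO PAYROLL          -3200.00\n"
)
redacted, key = redact(sample, ["Acme Roofing LLC", "Jane Smith"])
print(redacted)
```

The same input always produces the same tokens, which is the point of keeping this step out of the model. Merchant names, dates, and amounts pass through untouched. Saving `key` to a local file is left out here; wherever it goes, it should stay off the cloud alongside the raw statement.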

You move on with: the redacted statement text (tokens where the identifiers were).

The AI Lab for Accountants · An educational resource, not legal or tax advice.