Open Lab, Local AI Models: The Strongest Safe Lane for Client Data

⚡ For practitioners who want maximum privacy. A model running entirely on your own hardware is the one setup where taxpayer data never leaves your walls. That changes the compliance math: there is no third-party disclosure, so the §7216 consent regime is not triggered for the AI step itself. This module covers why that is true, exactly where it stops being true, and how to actually run one on a typical firm Mac.

This is the privacy-maximal option we pointed to in the security essay's "by tool type" chart. It is also the most misunderstood, because "local" only buys you the §7216 benefit if the setup is genuinely local. One cloud component anywhere in the chain reintroduces every obligation you thought you had escaped. Read the guardrail section before you trust this with a real client file.

The legal "why" lives in Regulatory Foundation. This module applies that foundation to local models. Where it states a Code or regulation cite, that cite is the one verified there. Where it states a hardware spec, model size, or speed, treat it as current as of June 2026, verify against the source before relying on it (model libraries and chips move fast).

Why a local model changes the compliance analysis

§7216 regulates a preparer's disclosure and use of tax return information. The operative definition is the whole game:

Treas. Reg. §301.7216-1(b)(5) defines disclosure as "the act of making tax return information known to any person in any manner whatever."

A cloud AI workflow creates a disclosure question because your prompt, the uploaded document, the extracted text, the metadata, or the embeddings get transmitted to a third-party platform that is a separate person from you. That is the moment of disclosure (it is also the point the security essay makes: the disclosure happens when your firm transmits the data to the vendor).

A model running only on your own device is not a person. This is not just common sense, it is the statute: §7216 bars making TRI known to "any person," and "person" is defined in IRC §7701(a)(1) to mean "an individual, a trust, estate, partnership, association, company or corporation." Software is not on that list. The weights are software on your machine, the data stays on that machine, and no outside person is made aware of the taxpayer information through the AI interaction. So:

Obligation	Cloud AI on client data	Genuinely local AI
§7216 disclosure consent (Rev. Proc. 2013-14 form)	Needed unless a §301.7216-2 exception applies	Not triggered for the inference step,† no third-party recipient
AICPA Confidentiality Rule (ET §1.700.001), third-party provider	Consent or a confidentiality agreement with the vendor	No outside provider receives the data, third-party concern eliminated
FTC Safeguards / WISP vendor management	Vendor due diligence, DPA, oversight	No AI vendor to vet for the local tool itself
Circular 230 vendor supervision	Assess, contract, monitor the vendor	No external AI vendor to supervise

† This is a strong reading of the plain text, but it is not settled authority for AI specifically. The argument is textual and well-grounded: "disclosure" is making TRI known to a person (§301.7216-1(b)(5)), "person" is statutorily limited to humans and legal entities, not software (§7701(a)(1)), and processing data on equipment you control has never been a third-party disclosure (the same reason QuickBooks Desktop or Excel on client data isn't). But as of June 2026 there is no IRS guidance, ruling, or case addressing AI under §7216, the most recent §7216 guidance (Rev. Ruls. 2010-4 and 2010-5, plus the 2012 final regs) predates the technology. The gap cuts both ways: the IRS hasn't confirmed the local path and hasn't called local inference a disclosure. And "genuinely local" is doing heavy lifting (see the cloud-component section). So: use this reasoning internally with confidence, but before you rely on "no consent needed because it's local" in client-facing language or an engagement letter, run it past counsel and your malpractice carrier. This answers the disclosure question only; §7216 also governs use (see "What does NOT go away"), so fine-tuning a local model on client data is a separate, unguided question. The conservative posture, consent where you'd otherwise get it, de-identification, and reviewer-of-record, does not depend on this conclusion being right, which is the safer place to stand. Grounded in Regulatory Foundation.

That is a real, defensible reduction in obligations, and it is the reason local models are worth the setup effort for sensitive client material.

What does NOT go away

Local privacy does not touch substantive professional responsibility. All of this still applies, exactly as it would if you had done the work by hand:

Circular 230 due diligence as to accuracy (§10.22). You are responsible for the correctness of the return and every representation, whether or not AI helped.
Circular 230 competence (§10.35). You need the knowledge to judge the output.
SSTS No. 1, member responsibility. You determine whether a position meets reporting and disclosure standards. AI drafting does not move that line.
§7216 "use." §7216 regulates use, not just disclosure. Keep local-AI uses tied to return preparation, tax advice, accounting services, or another permitted engagement purpose. Do not repurpose one client's information for unrelated business uses.
The full WISP, on the endpoint. FileVault full-disk encryption, a strong login password, auto-lock, least-privilege file access, encrypted backups, patch management, and a written AI-use policy. The data is now concentrated on your device, so the endpoint controls matter more, not less. See the WISP AI addendum.

The one-line version: a local model can be perfectly private and still be wrong, stale, or hallucinated. Privacy is not accuracy. The verify-every-citation rule from Module 2 applies in full.

The line that ends the benefit: one cloud component reintroduces everything

This is the single most important practical point in the module. The §7216 conclusion depends on the setup being genuinely local. The moment any of these touch the client data, you may be back to a third-party disclosure and full vendor management:

A cloud OCR step (uploading a scanned W-2 to a hosted OCR API to get text out)
A remote embedding API or a hosted vector database for "chat with your documents"
A sync folder (the file lands in a cloud-synced directory on its way through the workflow)
A browser extension that sends page or document contents to a vendor
Telemetry or a cloud fallback in the AI app itself
A vendor support session with access to client files

Any one of these can re-trigger §7216, the AICPA Confidentiality Rule, and FTC Safeguards vendor management for that piece of the stack. Local-AI privacy is only as good as the most cloud-connected step in the chain. When you design a workflow, trace the data end to end and confirm nothing leaves the device. If something must (for example, you need commercial-grade OCR on bad scans), treat that step under the normal cloud rules: anonymize, or get the §7216 basis, or use an approved tool under a DPA.

What local models are good at (and what they are not)

Best-fit tasks (language generation with your judgment on top):

Client letter and email drafts, IRS notice response drafts (CP2000 and the like)
Summarizing client-provided documents and long instruments
Engagement-letter issue spotting
Plain-English explanations, meeting prep, internal checklists and SOPs
First-pass research outlines (not the authority itself)
Excel formulas, Python scripts, CSV cleanup, reconciliation helpers

Poor-fit tasks (do not rely on a local model for these):

An authoritative tax research database. It has a fixed training cutoff and no live access to current Code, regs, rulings, or IRS guidance. It will confidently apply law that changed after its cutoff and invent citations that look real.
Real-time operational facts: IRS processing times, transcript or CAF status, refund tracking, e-Services availability. It cannot know these.
Unsupervised source-document extraction. Vision and OCR quality is uneven; always compare output back to the original.
Final review of a complex position, or any substitute for professional judgment.

The honest framing: local AI drafts, summarizes, extracts, and proposes. You verify, decide, sign, file, and advise. That maps directly onto SSTS No. 1 and Circular 230 §10.22.

Hardware reality (as of June 2026, verify current specs)

Apple Silicon is unusually good for this because CPU, GPU, and Neural Engine share unified memory, so a Mac can run larger models than a discrete GPU with the same nominal VRAM. The main bottleneck for speed is memory bandwidth, not raw CPU.

Pick a model tier by how much unified memory you have. Sizes below are approximate Q4 (4-bit-quantized) download sizes; speeds are planning ranges, not guarantees:

Model tier	Example models	RAM to run comfortably	16GB Mac	24GB Mac (M4 Pro class)	48GB+ Mac
3B	Llama 3.2 3B, Phi-4-mini	8GB	70–100 t/s	100+ t/s	180+ t/s
7B / 8B	Qwen2.5 7B, Mistral 7B	16GB	33–50 t/s	30–65 t/s	100+ t/s
13B / 14B	Phi-4 14B, Qwen2.5 14B	24–32GB	12–25 t/s (tight)	18–40 t/s	70–100 t/s
30B / 32B	Qwen2.5 32B	48–64GB	No	Borderline	45–80 t/s
70B	Llama 3.3 70B, Qwen2.5 72B	64–96GB+	No	No	30–55 t/s

Reading the table:

16GB machine: Qwen2.5 7B is the workhorse, fast and capable. Phi-4-mini is excellent for code and Excel formulas. Good enough for drafting, summarizing, and structured extraction with strict verification. Not a tax-law authority.
24GB machine: opens the comfortable 12B–14B tier (Qwen2.5 14B, Phi-4 14B, Mistral NeMo 12B), noticeably better writing and reasoning.
48GB machine: opens the 30B–32B tier (Qwen2.5 32B, Mistral Small 24B, Command R 35B), the strong local professional tier.
64GB+ machine: runs 70B models (Llama 3.3 70B, Qwen2.5 72B) at usable speed, the best local quality available.

Buying guidance: if you want a dedicated local-AI workstation, 24GB unified memory is the practical minimum, 48–64GB is the strong recommendation, and 96GB+ is for running 70B models comfortably. A Mac mini M4 Pro at 48–64GB is the value sweet spot for a solo firm; an Ultra-class Mac Studio is justified only for frequent 70B inference, large local document libraries, or heavy vision/OCR work.

Quantization, in one line: "Q4" stores the weights at 4-bit instead of 16-bit. It roughly quarters the memory and speeds things up, at a small quality cost. Q4_K_M is the sensible default; Q5_K_M and Q8_0 trade more memory for slightly better quality.

How to actually run one

Ollama (the simplest start)

Ollama wraps model download, storage, and inference behind a few commands and a local API. Install it, then:

# Download a model (about 4.7GB)
ollama pull qwen2.5:7b

# Interactive chat
ollama run qwen2.5:7b

# One-shot prompt
ollama run qwen2.5:7b "Draft a concise client email explaining why we need basis schedules."

# Manage models
ollama list
ollama rm qwen2.5:7b

It also serves a local REST API at http://localhost:11434, which is what other tools talk to. The model library at ollama.com/library lists current sizes, context windows, and which models support images.

LM Studio is the GUI alternative: search, download, and chat with a graphical interface, plus an OpenAI-compatible local server (/v1/chat/completions) for tools that expect that endpoint. Use Ollama for reliability and simple operation; use LM Studio if you want a GUI and easy model search.

Lock in the behavior you want with a Modelfile

An Ollama Modelfile lets you bake a system prompt and settings into a custom model, so every session starts with your guardrails already in place. This is where you encode "don't invent citations" so you don't have to retype it:

FROM qwen2.5:14b

PARAMETER temperature 0.2
PARAMETER num_ctx 32768

SYSTEM """
You are a local AI drafting and research assistant for a CPA tax practice.
You run locally and must not suggest uploading taxpayer data to cloud AI tools.
You are not the authoritative source of tax law.
For every tax question, separate: facts, assumptions, issues, authorities to verify,
analysis, and open questions.
Do not invent IRC sections, regulation citations, revenue rulings, cases, forms, or
notice deadlines. If an authority is uncertain, label it UNVERIFIED.
Flag any issue that may depend on law or guidance after your training cutoff.
For client communications, write clearly and include a CPA-review note when appropriate.
For extraction tasks, never infer missing amounts; mark uncertain OCR as REVIEW ORIGINAL.
"""

ollama create firm-tax-assistant -f ./Modelfile
ollama run firm-tax-assistant

The OCR problem (and the private workaround)

Most text-only local models cannot read a scanned W-2, a photo of a 1099, or a scanned K-1. You have to get the text out first, and the way you do that decides whether you keep your privacy benefit:

Keep it local: use Apple's Vision framework (on-device text recognition built into macOS, the best Mac-native private option) or Tesseract (free, local, scriptable, sensitive to scan quality so it needs clean images). Then feed the extracted text to the local model. This "OCR-first, model-second" flow stays entirely on your device.
A local vision model like Qwen2.5-VL can read images directly and is the best local multimodal option, but test it on your actual W-2s and K-1s before trusting it, and use it for review and issue-spotting, not as the sole source of extracted numbers.
Commercial cloud OCR is often best on messy forms, but uploading the scan is a disclosure. That step falls back under the normal cloud rules (anonymize, consent, or an approved tool under a DPA). It does not get the local benefit.

Copy-paste prompts (use locally, verify before relying)

Tax research outline (a checklist to verify, never the authority itself):

You are a tax research drafting assistant for a CPA. I have a client who [facts].
Walk me through the potentially applicable IRC sections, Treasury regulations, IRS
guidance, and cases. Separate: (1) facts that matter, (2) legal issues, (3) authorities
to VERIFY, (4) likely analysis, (5) missing facts, (6) current-law checks.
Do not invent citations. If uncertain, mark the authority UNVERIFIED.
Flag any issue that may depend on law after your training cutoff.
I will verify all authority independently before relying on this.

K-1 extraction from pasted OCR text (then reconcile to the original):

Extract this Schedule K-1 data into a table. Fields: entity name, EIN, partner name,
tax year, ordinary business income, rental income, interest, dividends, capital gains,
Section 179, charitable contributions, credits, distributions, ending capital.
If a field is not present, write MISSING. If OCR looks uncertain, write REVIEW ORIGINAL.
Do not infer amounts that are not visible.
[Paste OCR text]

Client memo from verified notes (safer than asking the model to generate the law):

Turn these verified research notes into a client memo for a small business owner.
Question: [issue]. Use only the authorities and facts below. Do not add new citations.
Structure: Summary, Facts, Options, Risks, Recommendation, Action Items.
Authorities (verified by CPA): [paste verified excerpts]
Facts: [paste facts]

Notice the pattern: the model organizes and drafts; the authority comes from you, already verified. That is the safe shape for everything that touches tax law, local or not.

Guardrails (the local-models version)

"Local" only counts if it is genuinely local. No cloud OCR, no remote embeddings, no hosted vector DB, no sync folder, no telemetry. Trace the data end to end. One cloud step reintroduces §7216, AICPA confidentiality, and FTC vendor management.
Private is not accurate. Verify every Code section, reg, ruling, and threshold against primary source before it reaches a return, a memo, or a client. Local models hallucinate authority just like cloud ones.
Mind training cutoff. A local model is frozen at its training date. Ask it to flag anything possibly post-cutoff, and check current law yourself.
The endpoint is now the crown jewels. Client data concentrates on your device. FileVault, strong password, auto-lock, encrypted backups, least-privilege access, and the AI-use policy in your WISP are not optional.
Keep uses tied to the engagement. §7216 still governs use. Local processing for return prep, advice, or accounting is fine; repurposing client data for unrelated uses is not.
You are still the reviewer of record. AI drafts; you decide, sign, file, and advise. SSTS No. 1 and Circular 230 §10.22 do not move because the model ran on your laptop.

See Guardrails and the §7216 Decision Framework for the full rule set.

A 15-minute starter

Install Ollama from ollama.com. Confirm with ollama --version.
Pull a model sized to your Mac: ollama pull qwen2.5:7b (16GB) or ollama pull qwen2.5:14b (24GB+).
Run it: ollama run qwen2.5:7b. Ask it to draft a routine client email with no client identifiers the first time, just to feel the speed and quality.
Save the Modelfile above as Modelfile, run ollama create firm-tax-assistant -f ./Modelfile, and use that instead so your guardrails are always on.
Before you ever put a real client file through it: confirm FileVault is on, turn off any cloud-sync on the working folder, and add a line to your WISP describing the local-AI tool, the data it may process, and your output-verification step.

When that is in place, you have the strongest privacy posture available in this whole library: the data never leaves your firm, and the only obligations left are the ones that were always yours anyway, accuracy, competence, and your signature.

Part of the AI Lab for Accountants library. Hardware specs, model sizes, and speeds are current as of June 2026, verify against the source before relying. Legal references are grounded in Regulatory Foundation. Not legal advice.