Why 'we have a BAA' isn't the same as HIPAA-compliant AI

Key points

A BAA makes your AI provider a business associate — it does not prevent PHI from being transmitted
PHI can still leak through inference logs, embeddings, fine-tuning data, and your own application logs
The technical safeguards in 45 CFR § 164.312 still apply to your own systems; a BAA does not hand them off to your vendor
Prevention before transmission is the only reliable approach — pseudonymize before the API call

There’s a shorthand circulating in healthcare technology circles that needs to be corrected: the idea that having a Business Associate Agreement with your AI provider means your application is HIPAA-compliant. It doesn’t. Not even close.

The BAA is a legal prerequisite, one of several things you need to achieve real compliance. Treating it as sufficient creates a paper-thin compliance posture that can collapse entirely in a breach investigation.

What a BAA actually is (a contract, not a control)

A Business Associate Agreement is a contract. Under 45 CFR § 164.308(b), covered entities (healthcare providers, health plans, healthcare clearinghouses) are required to have a BAA with every business associate that creates, receives, maintains, or transmits PHI on their behalf.

The BAA contractually obliges the business associate to:

Use PHI only for the purposes specified in the contract
Implement appropriate safeguards
Report breaches to the covered entity
Return or destroy PHI at contract termination

What the BAA does not do:

Prevent PHI from being transmitted in the first place
Specify how the business associate’s inference systems actually handle data
Guarantee deletion of PHI from model contexts, logs, or training pipelines
Protect against unauthorized access within the business associate’s systems
Give you technical audit capability into the provider’s infrastructure

The BAA makes OpenAI (or any other AI provider you sign with) your business associate. It does not make the data transfer safe. It does not make the processing adequate. It creates a legal relationship — useful for liability and breach notification, insufficient for technical safeguards.

45 CFR § 164.312 — Technical safeguards required

The HIPAA Security Rule’s technical safeguard requirements are where compliance becomes concrete. Section 164.312 requires covered entities and business associates to implement:

Access controls (§ 164.312(a)(1)): Unique user identification, emergency access procedure, automatic logoff, encryption/decryption. These must be implemented in your systems, not delegated away through a BAA.

Audit controls (§ 164.312(b)): Hardware, software, and procedural mechanisms that record and examine activity in systems that contain or use ePHI. If your LLM application doesn’t generate audit logs with sufficient granularity to reconstruct what PHI was accessed and by whom, you’re out of compliance — regardless of your BAA.

Integrity controls (§ 164.312(c)(1)): Protection against improper alteration or destruction of ePHI. In an AI context, this means whatever you send in a prompt must reach the LLM without improper alteration, and you need a way to verify this.

Transmission security (§ 164.312(e)(1)): Guard against unauthorized access to ePHI being transmitted over electronic networks. TLS is necessary but not sufficient — it protects the transmission channel, not the content.

A BAA says nothing about any of these. They are your responsibilities as the covered entity, or as a business associate or subcontractor, building the application.
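To make the audit and integrity requirements concrete, here is a minimal sketch of an audit entry for an outbound LLM request. Everything in it is an assumption for illustration: the field names, the `AUDIT_KEY` secret, and the choice of HMAC-SHA256 are not taken from the regulation or from any vendor’s API. The idea is to record who sent what, for which patient and purpose, without storing the prompt itself, and to sign the entry so later tampering is detectable.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical key for illustration; in practice it would come from a secrets manager.
AUDIT_KEY = b"replace-with-a-managed-secret"


def audit_record(user_id: str, patient_ref: str, purpose: str, prompt: str) -> dict:
    """Build one audit entry for an outbound LLM request.

    The prompt itself is not stored, only its SHA-256 digest, so the
    audit log does not become another copy of the PHI.
    """
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "patient_ref": patient_ref,  # internal reference, not a name or MRN
        "purpose": purpose,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    # Sign the canonical JSON form so later changes to the entry are detectable.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["hmac_sha256"] = hmac.new(AUDIT_KEY, payload, hashlib.sha256).hexdigest()
    return entry


def verify_record(entry: dict) -> bool:
    """Recompute the HMAC over every field except the signature itself."""
    unsigned = {k: v for k, v in entry.items() if k != "hmac_sha256"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(AUDIT_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("hmac_sha256", ""))
```

A record like this answers “who accessed which patient’s data, when, and why” and lets you check that a logged prompt digest matches what was actually sent. It is one piece of an audit trail, not a complete § 164.312(b) implementation.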

Where BAA-covered providers still leak PHI

Here’s the uncomfortable reality: you can have a valid BAA with every provider in your stack and still have PHI leaking in multiple places. Here’s how:

Inference logs: Many LLM providers log requests and responses by default, even with a BAA. The BAA may restrict how long they retain these logs, but the logs exist. Who at the provider has access? Under what circumstances? These questions matter under HIPAA’s minimum necessary standard, and your BAA should answer them in detail. Most do not.

Context windows and in-flight processing: When a prompt containing PHI is processed, it lives in the model’s context window. The BAA covers the provider’s systems, but the minimum necessary principle still requires you to think about whether all that PHI needs to be in the context. A patient’s full medical history in a system prompt is different from the specific clinical question being asked.

Embeddings: If you’re building a RAG system that embeds patient records, the embedding API call transmits the raw text. The resulting vector is not human-readable, but the API call itself contained PHI. Your BAA covers this. The minimum necessary standard still asks whether you sent more than necessary.

Subprocessors: Does your AI provider use subprocessors? Most do — GPU cloud providers, monitoring services, distributed inference networks. Your BAA with OpenAI doesn’t automatically create a BAA between you and every company in their supply chain. Ask for their subprocessor list and their BAA chain.

Fine-tuning: If you fine-tune on clinical data, you’re providing PHI to the provider for model training. BAAs rarely address fine-tuning data with the specificity this deserves. What happens to that data after the fine-tuning job completes? Is it used for general model improvement? Stored indefinitely?
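The minimum necessary question raised for context windows and embeddings can be enforced in code rather than left to habit. A minimal sketch, assuming a hypothetical per-task allow-list (the task names and record fields below are invented for illustration):

```python
# Hypothetical mapping from a task to the record fields that task needs.
FIELDS_BY_TASK = {
    "medication_review": {"medications", "allergies", "renal_function"},
    "discharge_summary": {"diagnoses", "procedures", "medications"},
}


def minimum_necessary(record: dict, task: str) -> dict:
    """Return only the fields allow-listed for this task.

    Unknown tasks raise instead of falling back to the full record, so a
    new feature cannot silently send everything.
    """
    if task not in FIELDS_BY_TASK:
        raise ValueError(f"no field allow-list defined for task {task!r}")
    allowed = FIELDS_BY_TASK[task]
    return {k: v for k, v in record.items() if k in allowed}
```

Running every record through a filter like this before it reaches a prompt, an embedding call, or a fine-tuning file means the default is to send less, and sending more requires an explicit decision.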

This is why Privedge operates on a different principle: the goal isn’t a better BAA, it’s ensuring PHI never reaches the provider in the first place.

When your application routes through Privedge, names become [PERSON_1], medical record numbers become [MRN_1], diagnoses become [CONDITION_1]. The LLM provider receives pseudonymized tokens. Your RAG system embeds documents that contain no personal information. Your fine-tuning data contains no PHI. The BAA matters less when there’s nothing to protect against.

Real breach scenarios

Scenario 1 — Fine-tuning on PHI

A health tech startup fine-tunes a GPT model on clinical notes to improve its documentation AI. They have a BAA. Six months later, researchers demonstrate that model output can be used to reconstruct training data. The fine-tuning PHI is recoverable. The BAA covered the training process but didn’t prevent the exposure, and the minimum necessary standard wasn’t applied because entire notes were sent.

Scenario 2 — RAG with unfiltered patient records

A hospital deploys a clinical decision support tool that embeds every patient record in a vector database. The embedding API calls transmit full records. A security researcher with legitimate access to the embedding service discovers that responses to crafted queries can be used to reconstruct patient data. The records never should have been embedded verbatim — only relevant clinical facts, with identifying information removed.

Scenario 3 — Prompt injection via patient-controlled data

A patient submits a support message that includes carefully crafted text designed to manipulate the AI’s behavior. The support AI reads patient data from the EHR as context. The injected instructions cause the AI to include data from other patients in its response. No BAA clause covers prompt injection — it’s an architectural problem that paperwork can’t solve.
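One architectural response to this scenario is to scope retrieval to the authenticated patient before the prompt is ever built. The sketch below assumes a hypothetical `Record` type and a `session_patient_id` supplied by your own authentication layer. It addresses cross-patient leakage only and is not a general defense against prompt injection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str
    text: str


def build_context(records: list[Record], session_patient_id: str) -> str:
    """Assemble prompt context from records belonging to one patient only.

    The filter runs in application code before the prompt is built, so
    injected instructions cannot widen what the model is able to see.
    """
    own = [r for r in records if r.patient_id == session_patient_id]
    return "\n".join(r.text for r in own)
```

Because the model never receives other patients’ records, no instruction hidden in a support message can cause it to repeat them.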

The technical fix: prevention before transmission

The correct architectural approach is to treat PHI interception as an infrastructure problem, not a legal one.

Before any data reaches an external AI provider, it should pass through a layer that:

Identifies all PHI/PII using pattern matching and entity recognition — names, dates, IDs, locations, medical terms tied to individuals
Replaces identified entities with reversible tokens: tokens that contain no PHI but allow the model to reason about relationships (“the patient in [PERSON_1]’s records has the same condition as [PERSON_2]”)
Maintains the mapping securely — in your infrastructure, never transmitted to the provider
Rehydrates the response — when the model returns [PERSON_1], your system substitutes the actual name before showing it to the user

This is the privacy-by-architecture model. The BAA is still required — but it’s a secondary safeguard over a system where leakage is already architecturally prevented.
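The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Privedge’s implementation: the function names `pseudonymize` and `rehydrate`, the two regular expressions, and the idea of passing in a list of known names are all hypothetical stand-ins, and real detection would need entity recognition on top of patterns like these.

```python
import re

# Illustrative patterns only; they are assumptions, not a complete PHI detector.
PATTERNS = {
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def pseudonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace detected identifiers with numbered tokens.

    Returns the redacted text and a token -> original mapping that must
    stay inside your own infrastructure.
    """
    mapping: dict[str, str] = {}
    reverse: dict[str, str] = {}
    counters: dict[str, int] = {}

    def token_for(kind: str, value: str) -> str:
        # Reuse the same token for a repeated value so the model can
        # still tell which mentions refer to the same entity.
        if value not in reverse:
            counters[kind] = counters.get(kind, 0) + 1
            token = f"[{kind}_{counters[kind]}]"
            reverse[value] = token
            mapping[token] = value
        return reverse[value]

    # Longest names first, so a full name is replaced before any shorter overlap.
    for name in sorted(known_names, key=len, reverse=True):
        text = re.sub(re.escape(name), lambda m: token_for("PERSON", m.group(0)), text)
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: token_for(k, m.group(0)), text)
    return text, mapping


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Substitute original values back into the model's response.

    Tokens that are not in the mapping are left untouched.
    """
    return re.sub(r"\[[A-Z]+_\d+\]", lambda m: mapping.get(m.group(0), m.group(0)), text)
```

In use, only the redacted text returned by `pseudonymize` goes into the API call, the mapping stays in your own storage, and `rehydrate` runs on the model’s reply before anything is shown to the user.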

Conclusion: architectural controls, not paperwork

A BAA is table stakes. It’s required by law, and you should absolutely have one. But in 2026, “we have a BAA” as a complete answer to “how do you protect PHI in your AI system?” is not acceptable — not to regulators, not to enterprise buyers conducting security reviews, and not to patients whose data you’re responsible for.

The healthcare organizations building AI responsibly are thinking in terms of architectural controls: what PHI needs to reach the AI provider, and how do we technically prevent the rest from ever getting there?

If you’re ready to move from paperwork compliance to architectural compliance, start with Privedge.