Self-host or managed cloud?
Pick what fits your team.
Four steps.
Your worker, live.
Edge inference config.
Two variables.
Set PII_STRATEGY to control where inference runs. Add EDGE_MODEL to choose the Workers AI model.
High-volume tasks, low latency. Best choice for classification and extraction.
General-purpose Q&A and summarization. Good quality-cost trade-off.
Complex reasoning, equivalent to GPT-3.5. Highest quality on the edge.
[vars] PII_STRATEGY = "edge" # "anonymize" | "edge" EDGE_MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast" # Workers AI model
Three facts.
Not promises.
The PII substitution map is created at request time inside the isolate heap and destroyed with it. It is never written to any storage — KV, R2, or disk.
When PII_STRATEGY=edge, Cloudflare Workers AI runs on the same node that received your request. No cloud API call, no external network hop, no third-party model.
The worker code is MIT open source. You can audit, fork, modify, and redistribute it without restriction. No black box, no trust required.