Data Residency for AI: Where Did the Prompt Go?
Data residency commitments used to be a database question. The data sits in a region. Backups are in the region. Replicas are in the region. The control plane is in the region. The commitment is kept by confirming the deployment diagram matches the policy.
AI systems transmit slices of that regional data to model providers every time they run an inference. Most organizations cannot say with confidence where the provider processes the request, where logs of the request are stored, or which sub-processors are involved. The residency question has gotten significantly harder, and most policies have not been updated.
What actually happens to a prompt
A prompt assembled inside a region is transmitted to the model provider's API endpoint. The endpoint resolves to infrastructure in some region — possibly but not necessarily the region closest to the calling system. The request is processed by the provider's inference infrastructure, which may be multi-regional for availability reasons. Logs of the request are retained by the provider, for durations and in regions the provider specifies in their documentation. Sub-processors involved in the provider's operation may touch the data.
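A first diagnostic is simply to look at where the endpoint resolves. The sketch below (standard-library only; the hostname is whatever your integration actually calls) shows the IPs a request will reach. This is a coarse signal: it tells you where the TCP connection terminates, not where the provider processes or logs the request behind that address.

```python
import socket

def resolve_ips(hostname: str) -> set:
    """Return the set of IPs the hostname currently resolves to.

    A coarse residency diagnostic: it shows where the connection will go,
    not where inference runs or logs are stored behind that address.
    """
    return {
        info[4][0]
        for info in socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    }
```

Running this periodically from inside the regional deployment, and alerting when the answer changes, catches one class of silent re-routing. It does not answer the logging or sub-processor questions.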
Each of these points is a residency question that did not exist for the calling system before AI entered the architecture. Each is now part of the data's journey, and each needs to be understood if the commitment the calling system made to its own customers is to be kept.
The providers have, to their credit, started publishing residency-relevant information: region selection for inference, data retention policies, sub-processor lists. The information is available. The work of reading it, mapping it to the customer commitments, and confirming the match is customer-side work that most customer-side compliance programs have not done yet.
The specific questions
For every AI system in scope of a residency commitment, answer these questions concretely:
- Which endpoint of the model provider does this system call?
- What region does that endpoint's infrastructure operate in?
- Is inference always performed in that region, or can it be routed to a different region for availability or capacity reasons?
- What logs does the provider retain of the request, where are they stored, and for how long?
- Are sub-processors involved, and if so, in which regions do they operate?
- If the provider breaches the commitment, what notification do you receive and in what timeframe?
For most organizations, answering this list for the first AI system in scope is a multi-day exercise. For the second and subsequent systems it gets faster, but the work is not zero. The list has to be answered for each provider, each model, and potentially each endpoint.
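The answers are worth capturing in a structured record per system-provider pair, so the gaps are visible rather than scattered across documents. A minimal sketch (all field names are illustrative, not any provider's schema):

```python
from dataclasses import dataclass, field

@dataclass
class ResidencyRecord:
    """Residency answers for one AI system / provider pair.

    Field names are illustrative assumptions, not a provider schema.
    """
    system: str
    provider: str
    endpoint: str                 # which endpoint the system calls
    endpoint_region: str          # region the endpoint operates in
    inference_pinned: bool        # can inference leave endpoint_region?
    log_regions: list = field(default_factory=list)
    log_retention_days: int = None
    subprocessors: dict = field(default_factory=dict)  # name -> region
    breach_notice_hours: int = None

    def open_questions(self) -> list:
        """Fields still unanswered — the remaining diligence work."""
        gaps = []
        if not self.log_regions:
            gaps.append("log storage regions")
        if self.log_retention_days is None:
            gaps.append("log retention period")
        if self.breach_notice_hours is None:
            gaps.append("breach notification timeframe")
        return gaps
```

A record with a non-empty `open_questions()` is a system whose residency posture is still unknown, which is the honest state most inventories start in.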
The control you probably need to add
The control that tends to be missing is a routing layer that binds system-side region to provider-side region explicitly. Without it, a regional deployment can silently call a global API endpoint that routes wherever the provider's infrastructure decides, and the residency commitment is kept or broken by factors outside the system's control.
With the routing layer — region-specific endpoints, region-specific model deployments, fail-closed on region mismatch — the residency commitment is an engineering property of the system, not a hopeful attribute. The control can be audited. The auditor can trace a prompt from application code to provider endpoint and confirm the region chain holds.
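The fail-closed behavior is the essential part. A minimal sketch, assuming a static mapping of deployment regions to pinned endpoints (the URLs and the hostname convention are illustrative, not any provider's actual API):

```python
# Illustrative mapping: each deployment region gets exactly one pinned
# endpoint. The URLs are hypothetical, not a real provider's.
REGIONAL_ENDPOINTS = {
    "eu-west-1": "https://eu-west-1.api.example-provider.com/v1/infer",
    "us-east-1": "https://us-east-1.api.example-provider.com/v1/infer",
}

class RegionMismatchError(Exception):
    """Raised instead of sending the prompt when the region chain breaks."""

def resolve_endpoint(deployment_region: str) -> str:
    """Return the provider endpoint pinned to this deployment's region.

    Fail closed: if no pinned endpoint is configured, refuse rather than
    fall back to a global endpoint that routes wherever capacity exists.
    """
    endpoint = REGIONAL_ENDPOINTS.get(deployment_region)
    if endpoint is None:
        raise RegionMismatchError(
            f"no pinned endpoint for {deployment_region!r}; "
            "refusing to fall back to a global endpoint"
        )
    # Confirm the endpoint hostname encodes the region, so a
    # misconfigured mapping is caught before any data leaves.
    if deployment_region not in endpoint:
        raise RegionMismatchError(
            f"endpoint {endpoint!r} is not pinned to {deployment_region!r}"
        )
    return endpoint
```

The design choice that matters is that the error path drops the request instead of rerouting it: an outage in the pinned region surfaces as downtime, not as a silent residency breach.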
This is infrastructure work. It is the kind of work that gets skipped when an AI feature is shipped quickly and cleaned up later. The cleanup can be costly — rearchitecting the integration, negotiating new endpoints with the provider, validating that no existing production traffic is at risk of violating the commitment the policy says is being kept.
The broader point
Data residency is one of several compliance commitments that look straightforward until AI enters the picture and reveals the commitment as an aggregate of assumptions about where data travels. AI does not break the commitment in principle. It forces the commitment to be described with a precision that the pre-AI architecture never required.
The precision is achievable. The work to get there is the work of actually understanding what your AI system is doing, which is work worth doing regardless of the residency question.