As AI adoption accelerates, organisations face a critical challenge: how do you harness the power of large language models without compromising data security, regulatory compliance, or customer trust? The answer isn't one-size-fits-all. Depending on your industry, data sensitivity, and regulatory obligations, the right architecture might range from a managed cloud platform to a fully isolated private cloud. This article walks through the regulatory landscape, practical deployment patterns, and the security considerations every organisation should address before putting AI into production.
The regulatory landscape for AI
Organisations building AI systems don't operate in a regulatory vacuum. A growing number of frameworks govern how data — particularly personal data — can be collected, processed, stored, and shared. Understanding these frameworks is the starting point for any secure AI deployment.
GDPR (General Data Protection Regulation)
The EU's GDPR remains the gold standard for data protection. For AI systems, GDPR imposes specific obligations: personal data used for training or inference must have a lawful basis; individuals have rights around automated decision-making, including meaningful information about the logic involved; and data minimisation applies — you shouldn't feed an LLM more personal data than is strictly necessary. Cross-border data transfers are tightly controlled, which has direct implications for where your AI infrastructure lives.
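Data minimisation can be enforced in code before a prompt ever leaves your systems. A minimal sketch, assuming regex-based detection (the patterns and the `redactPII` name are illustrative — production systems typically use a dedicated PII-detection service rather than hand-rolled patterns):

```typescript
// Illustrative patterns only: real PII detection needs far broader coverage
// (names, addresses, national IDs) and is usually a dedicated service.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"], // email addresses
  [/\+?\d[\d\s-]{8,}\d/g, "[PHONE]"],      // international phone numbers
];

// Replace obvious personal identifiers with placeholders before the
// prompt is sent to any LLM endpoint.
function redactPII(prompt: string): string {
  return PII_PATTERNS.reduce(
    (text, [pattern, label]) => text.replace(pattern, label),
    prompt,
  );
}
```

Running user input through a step like this means the model still gets the context it needs ("a customer emailed about a refund") without receiving the identifier itself.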
EU AI Act
The EU AI Act, which came into force in 2024, classifies AI systems by risk level. High-risk systems (including those used in employment, credit scoring, and critical infrastructure) face mandatory requirements: risk assessments, human oversight, transparency obligations, and technical documentation. Even general-purpose AI models must comply with transparency and copyright requirements. Organisations deploying AI in the EU need to understand where their use cases fall in this risk taxonomy.
ISO 27001 and ISO 42001
ISO 27001 is the international standard for information security management systems (ISMS). It provides a systematic approach to managing sensitive information, including risk assessment, access controls, and incident response. For AI deployments, ISO 27001 certification demonstrates that your organisation has robust security controls around the data flowing into and out of AI systems.
ISO 42001, published in 2023, is the first international standard specifically for AI management systems. It covers the entire AI lifecycle — from design through deployment and monitoring — and addresses bias, transparency, and accountability. Together, these standards provide a comprehensive framework for secure, responsible AI.
SOC 2
SOC 2 (System and Organization Controls 2) is particularly relevant when working with third-party AI services. It evaluates security, availability, processing integrity, confidentiality, and privacy controls. When selecting AI vendors — whether cloud LLM providers or platform services — SOC 2 Type II reports are essential due diligence, confirming that the vendor's controls have been independently audited over time.
HIPAA and sector-specific regulations
Healthcare organisations must comply with HIPAA when using AI on patient data. Financial services have their own frameworks (FCA, PRA in the UK; SEC, FINRA in the US). Legal and accounting firms face professional conduct rules that intersect with AI use. The common thread: sensitive data requires additional controls, and the consequences of non-compliance are severe.
Data residency: where your data lives matters
Data residency refers to the physical and legal location where data is stored and processed. For AI systems, this has specific implications: when you send a prompt to an LLM, where does that data go? Is it stored? Is it used for training? Does it cross jurisdictional boundaries?
Under GDPR, transferring personal data outside the EEA requires specific legal mechanisms — adequacy decisions, Standard Contractual Clauses (SCCs), or Binding Corporate Rules. The Schrems II ruling tightened these requirements further, making it critical to understand the full data flow of any AI system you deploy.
Data residency isn't just a European concern. Australia's Privacy Act, Canada's PIPEDA, Japan's APPI, and Brazil's LGPD all impose restrictions on cross-border data transfers. Even within the US, state-level regulations (California's CCPA/CPRA, Virginia's VCDPA) are creating a patchwork of requirements.
For organisations handling sensitive data, the question isn't just "which LLM is best?" — it's "which LLM can we legally and safely use, given where our data must reside?"
LLM security: threats and mitigations
Large language models introduce a novel attack surface. The OWASP Top 10 for LLM Applications catalogues the most critical risks; those most relevant here include:
- Prompt injection — Malicious inputs that cause the model to ignore its instructions and perform unintended actions. This includes direct injection (in user input) and indirect injection (in retrieved documents or data).
- Data leakage — Models may inadvertently reveal sensitive information from their training data or from conversation context shared across sessions.
- Insecure output handling — LLM outputs that are used in downstream systems (SQL queries, API calls, rendered HTML) without proper sanitisation can create injection vulnerabilities.
- Training data poisoning — Attackers who can influence training data may embed backdoors or biases in the model.
- Model theft — Proprietary models or fine-tuned weights may be extracted through systematic querying.
- Excessive agency — Agents with too many permissions and insufficient oversight can take harmful actions autonomously.
Mitigating these risks requires a layered approach: input validation, output filtering, rate limiting, access controls, audit logging, and regular red-teaming. No single measure is sufficient — defence in depth is essential.
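Two of those layers — input screening for direct injection markers, and output escaping before rendering — can be sketched as follows. The marker list and function names are illustrative assumptions, not a library API; real deployments combine heuristics like these with model-based classifiers, allow-lists, and rate limiting:

```typescript
// Illustrative injection markers only: pattern-matching alone is easy to
// evade, so this is one layer among several, never the sole defence.
const INJECTION_MARKERS: RegExp[] = [
  /ignore (all )?previous instructions/i,
  /you are now/i,
  /reveal (the |your )?system prompt/i,
];

function screenInput(userInput: string): { allowed: boolean; reason?: string } {
  for (const marker of INJECTION_MARKERS) {
    if (marker.test(userInput)) {
      return { allowed: false, reason: `matched ${marker.source}` };
    }
  }
  return { allowed: true };
}

// Escape model output before rendering it as HTML, so a response can never
// smuggle markup into the page (mitigates insecure output handling).
function escapeForHtml(modelOutput: string): string {
  return modelOutput
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

The same principle applies to any downstream sink: parameterise SQL rather than interpolating model output, and validate tool-call arguments against a schema before executing them.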
Choosing the right architecture in practice
Tier 1: managed cloud platform
Modern managed platforms provide significantly more security than many organisations realise. A typical production stack might include:
- Vercel for hosting the application — built on AWS infrastructure, SOC 2 Type II certified, with automatic TLS encryption, DDoS protection, and edge deployment across global regions. Vercel encrypts data in transit (TLS 1.3) and at rest, provides role-based access control, and supports integration with enterprise identity providers via SAML SSO.
- Supabase for the database and authentication layer — runs on AWS with PostgreSQL, providing encryption at rest (AES-256), encryption in transit, row-level security policies, built-in auth with MFA, and audit logging. Supabase is SOC 2 Type II compliant and offers regional deployments across AWS regions for data residency needs. Database backups are encrypted and access is controlled via API keys and JWT tokens.
- OpenAI or Anthropic API for the LLM — pay-per-token, no infrastructure to manage, models updated automatically. Both providers offer data processing agreements and commit to not training on business API data.
This stack delivers enterprise-grade security with a fraction of the operational overhead. The underlying AWS infrastructure provides the same physical security, network isolation, and compliance certifications (ISO 27001, SOC 2, GDPR-ready) that organisations rely on when using AWS directly. Both Vercel and Supabase support regional deployments — you can pin your database to a specific AWS region (e.g. eu-west-2 for London) and configure Vercel's edge functions to run in specific geographies, giving you meaningful data residency control without managing infrastructure.
Multitenancy at this tier is typically handled through logical isolation: Supabase's row-level security (RLS) policies enforce tenant boundaries at the database level, API keys scope access per client, and application-level middleware enforces tenant context. For many SaaS products this provides robust isolation — each tenant's data is separated by policy, even though it shares underlying infrastructure. When clients require stronger guarantees, you can provision separate Supabase projects per tenant, each in its own isolated PostgreSQL instance.
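The application-level half of that logical isolation can be sketched as a query guard that refuses to run without an explicit tenant context. The types and the `scopedQuery` name are illustrative; in Supabase itself the equivalent boundary is enforced independently by RLS policies at the database level:

```typescript
interface QueryContext {
  tenantId: string;
  userId: string;
}

type Row = { tenantId: string; [key: string]: unknown };

// Every read passes through this guard: rows belonging to other tenants are
// filtered out even if a caller forgets a WHERE clause. A database-level RLS
// policy enforces the same boundary a second time (defence in depth).
function scopedQuery(rows: Row[], ctx: QueryContext): Row[] {
  if (!ctx.tenantId) {
    throw new Error("query attempted without tenant context");
  }
  return rows.filter((row) => row.tenantId === ctx.tenantId);
}
```

The design point is that tenant scoping is structural, not a convention each developer has to remember: forgetting the filter fails closed rather than leaking another tenant's rows.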
Tier 2: dedicated cloud with controls
When you need full control over data residency, want LLM inference within your own cloud tenant, or have strict requirements about which infrastructure processes your data, dedicated cloud services provide that additional layer of ownership:
- Azure OpenAI Service — Run OpenAI models within your Azure tenant, in a specific region (e.g. West Europe). Your data stays within Azure's infrastructure and is not sent to OpenAI. Microsoft's data processing agreement covers GDPR compliance.
- Amazon Bedrock — Access Claude, Llama, and other models within your AWS account. Data is processed in your chosen region and encrypted with your keys. No data is used for model training.
- Google Cloud Vertex AI — Similar regional deployment with data residency guarantees.
At this tier, you also implement: VPC peering or private endpoints so LLM traffic never traverses the public internet, customer-managed encryption keys (CMK), comprehensive IAM policies, and audit logging via CloudTrail or Azure Monitor.
Multitenancy at this tier moves to infrastructure-level isolation. Each tenant (or tenant group) can have its own AWS account or Azure subscription, with dedicated networking, encryption keys, and IAM boundaries. This is the model enterprise clients in financial services and professional services typically require — their data doesn't just have separate database rows, it lives in entirely separate cloud accounts with independent audit trails.
Tier 3: private VPC / fully isolated
For the most sensitive use cases — healthcare with patient data, defence applications, or organisations with strict data sovereignty requirements — the solution is a fully isolated environment:
- AWS VPC or Azure VNet with no internet gateway. All traffic stays within the private network.
- Self-hosted open-source models — Llama 3, Mistral, or Mixtral running on dedicated GPU instances (e.g. AWS p4d/p5 instances, Azure ND series).
- Private container registries — Model weights and application code are deployed from internal registries, never pulled from public sources in production.
- Air-gapped options — For classified environments, the entire stack can run on-premises with no cloud connectivity.
Multitenancy at this tier means fully isolated environments per tenant — separate VPCs, separate compute, separate storage, and often separate LLM instances. There are no shared resources between tenants. This is the standard for healthcare platforms handling PHI, defence contractors, and organisations processing classified data. Each tenant environment is effectively a standalone deployment.
The trade-off here is cost and complexity. Self-hosted models require GPU infrastructure, model management expertise, and ongoing maintenance. Per-tenant isolation multiplies that overhead. But for organisations that cannot — by law or by policy — share any infrastructure between clients, this is the only viable path.
LLM provider data policies: what you need to know
When evaluating LLM providers, the critical questions are: Does the provider retain your prompts? Are they used for training? Where is data processed? Here's a summary of the major providers' enterprise policies:
- OpenAI (API) — API data is not used for training by default. Data retained for 30 days for abuse monitoring, then deleted. Zero-retention available on request. Enterprise plans offer additional controls.
- Anthropic (API) — API data is not used for training. Prompts retained for safety monitoring (typically 30 days). Enterprise agreements offer custom retention policies.
- Azure OpenAI — Data stays within your Azure tenant. Not shared with OpenAI. Not used for training. Processed in your chosen Azure region. Customer-managed keys available.
- Amazon Bedrock — Data processed in your AWS account and region. Not used for model training. Encrypted with your keys. Private VPC endpoints available.
- Self-hosted (Llama, Mistral) — Complete control. No data leaves your infrastructure. You manage everything.
The key insight: the same model (e.g. GPT-4) can have very different data handling characteristics depending on how it's deployed. Using GPT-4 via Azure OpenAI in a European region is fundamentally different from using it via the public OpenAI API, even though the model capability is the same.
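That distinction can be made explicit in application code: route each request to a deployment based on the data classification of the workload, rather than hard-coding a single API. The tiers, provider names, and regions below are illustrative placeholders, not real endpoints:

```typescript
type DataTier = "public" | "confidential" | "restricted";

interface Deployment {
  provider: string;
  region: string;
}

// Illustrative routing table: comparable model capability, but very
// different data-handling guarantees depending on the deployment.
const DEPLOYMENT_BY_TIER: Record<DataTier, Deployment> = {
  public:       { provider: "openai-api",       region: "global" },
  confidential: { provider: "azure-openai",     region: "westeurope" },
  restricted:   { provider: "self-hosted-llama", region: "on-prem" },
};

function resolveDeployment(tier: DataTier): Deployment {
  return DEPLOYMENT_BY_TIER[tier];
}
```

Centralising this decision in one routing table also gives you a single place to audit: every workload's classification and its corresponding deployment are declared, not scattered through the codebase.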
Practical recommendations
Based on our experience deploying secure AI systems across industries, here are our recommendations:
- Classify your data first — Before choosing an architecture, understand what data will flow through the AI system. Not all data has the same sensitivity. A customer FAQ chatbot has different requirements from an HR assistant processing employee records.
- Match architecture to risk — Don't over-engineer for low-risk use cases (it wastes time and money), and don't under-engineer for high-risk ones (it creates legal and reputational exposure). The tier model above provides a useful framework.
- Implement defence in depth — No single control is sufficient. Combine network isolation, encryption, access controls, input/output filtering, rate limiting, and audit logging.
- Plan for data residency from day one — Retrofitting data residency after deployment is expensive and disruptive. Choose your regions and providers with regulatory requirements in mind from the start.
- Audit your LLM supply chain — Understand where your model weights come from, who hosts them, and what data they were trained on. For regulated industries, this provenance matters.
- Build human oversight into agentic systems — As AI agents gain more autonomy, ensure there are checkpoints where humans review and approve high-impact actions.
- Document everything — Regulators increasingly expect AI governance documentation. Maintain records of your risk assessments, data processing activities, model evaluations, and security controls.
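The human-oversight recommendation above can be sketched as an approval gate: agent actions are classified by impact, and high-impact ones are queued for review instead of executing immediately. The types and threshold are illustrative assumptions:

```typescript
type Impact = "low" | "high";

interface AgentAction {
  description: string;
  impact: Impact;
}

interface GateResult {
  executed: AgentAction[];
  pendingApproval: AgentAction[];
}

// Low-impact actions proceed automatically; high-impact actions are held
// for a human reviewer, creating the oversight checkpoint in the loop.
function gateActions(actions: AgentAction[]): GateResult {
  const result: GateResult = { executed: [], pendingApproval: [] };
  for (const action of actions) {
    (action.impact === "high" ? result.pendingApproval : result.executed).push(action);
  }
  return result;
}
```

In practice the classification itself would be policy-driven (by action type, monetary value, or data sensitivity), and every decision — automatic or human — would land in the audit log described above.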
Getting started
Building secure AI isn't about choosing between innovation and compliance — it's about finding the right architecture for your specific context. Whether that's a managed cloud platform with strong built-in security for most production workloads, a dedicated cloud deployment for data sovereignty and additional control, or a fully isolated VPC for the most sensitive use cases, the options are there.
The most important step is the first one: understanding your data, your obligations, and your risk tolerance. From there, the architecture follows naturally.
We help organisations navigate this landscape — from initial assessment through architecture design, implementation, and ongoing compliance. If you're exploring AI adoption and want to get the security right from the start, get in touch.