Does SR 11-7 apply to foundation models accessed via API rather than internally built?

Yes. A 2021 joint statement from the OCC, Federal Reserve and FDIC reaffirmed that SR 11-7 applies to vendor-sourced models accessed via API. The bank using the model bears responsibility for validation and ongoing monitoring regardless of whether the model weights are hosted externally. Reliance on vendor-supplied documentation does not satisfy the independence requirement.

What counts as a material model change when a foundation model provider updates its weights without notifying the bank?

Under SR 11-7 principles, any change to model components that could affect outputs in a regulated use case is potentially material and should trigger a change management review. Banks should establish contractual notification rights with AI vendors and maintain baseline output benchmarks that allow automated detection of behavioral drift when undisclosed weight updates occur.

How should banks handle conceptual soundness validation when transformer model internals are not interpretable in the traditional sense?

Regulators are accepting a decomposed validation approach that assesses soundness separately at the base model layer, the fine-tuning or retrieval layer and the prompt engineering layer. Red-team adversarial testing documented against NIST AI RMF criteria supplements traditional validation procedures and supports conditional soundness findings scoped to specific operational boundaries.

Are prompt templates considered model artifacts that require version control and change management under SR 11-7?

Examiners are treating material prompt changes as model changes when they govern outputs in regulated use cases. Best practice in 2026 is to store production prompt templates in a configuration management system with versioning and to route prompt modifications through the same change management workflow applied to model parameter changes.

What expertise do bank model risk validators need in order to validate foundation model deployments?

SR 11-7 requires validators with expertise appropriate to the model under review. For foundation models, that includes working knowledge of transformer architectures, embedding-based evaluation metrics, prompt injection and adversarial robustness testing and differential fairness analysis on non-numeric outputs. Most traditional bank model risk teams require targeted upskilling or specialist augmentation to meet this requirement.

SR 11-7 Model Risk Management for Foundation Models

The Federal Reserve's SR 11-7 guidance on model risk management was finalized in 2011. The largest language model that existed publicly at that time had fewer parameters than a modern embedding layer. SR 11-7 was written for logistic regression credit scorecards, stress-testing models and loan loss reserve calculations. It was not written for GPT-class foundation models generating loan denial letters, flagging suspicious transaction narratives or powering customer-facing chat interfaces at federally supervised institutions.

That gap is now a live compliance problem. In 2026, banks deploying foundation models are doing so under a regulatory framework that was architected for a fundamentally different class of artifact. Examiners from the OCC, FDIC and Federal Reserve are walking into model risk management reviews expecting SR 11-7 compliant documentation for systems that resist the framework at nearly every technical layer. Understanding exactly where that friction lives is the first step toward resolving it.

What SR 11-7 Actually Says and Why It Still Governs

SR 11-7 defines a model as a quantitative method, system or approach that applies statistical, economic, financial or mathematical theories, techniques and assumptions to process input data into quantitative estimates. The guidance is organized around three areas: model development, implementation and use; model validation; and governance, policies and controls.

The definition is broad enough that regulators have consistently held it applies to machine learning systems, including neural networks. SR 11-7 itself directs banks to bring vendor and other third-party models into their broader model risk management framework. In 2021, the OCC, Federal Reserve and FDIC issued a joint statement on model risk management for third-party model use that reaffirmed SR 11-7's applicability to vendor-sourced models. That statement explicitly covered models accessed via API, which captures every foundation model deployment where a bank calls OpenAI, Anthropic, Google or a similar provider.

The principle driving SR 11-7 is simple: banks must understand what their models do, why they do it and when they are wrong. Within validation, the guidance names three core elements: evaluation of conceptual soundness, ongoing monitoring and outcomes analysis. None of them has been repealed. What has happened is that the technical assumptions embedded in those elements have become untenable for transformer-based foundation models.

The Validation Gap Foundation Models Create

Traditional SR 11-7 validation works by examining model assumptions, testing outputs against out-of-sample data, calculating performance and stability metrics such as the Gini coefficient, ROC-AUC and the population stability index (PSI), and verifying that the model's theoretical basis is sound. A credit scorecard validator can inspect every coefficient. A stress-testing model validator can trace every equation in the documentation to a published economic theory.
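To make the contrast concrete, the scorecard metrics named above fit in a few lines. The following is a minimal sketch in plain Python, not a production validation library: the function names are illustrative, PSI is computed over score bins the validator has already chosen, and the small `eps` floor for empty bins is an assumption rather than a regulatory convention.

```python
import math


def roc_auc(labels, scores):
    """ROC-AUC by pairwise comparison of positives and negatives; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient (accuracy ratio) derived from ROC-AUC: Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index over pre-binned counts.

    expected_counts: bin counts from the development sample.
    actual_counts:   bin counts from the current population, same bin order.
    eps floors empty bins so the logarithm stays defined (an assumption).
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc, gini_from_auc(auc), psi([50, 50], [60, 40]))
```

Every input to these functions is a number the validator can observe directly, which is precisely the property that text-generating foundation models lack.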

Foundation models break this workflow in at least four distinct ways.

First, opacity at scale. A model with hundreds of billions of parameters has no interpretable coefficient table. Attention weights exist but do not map cleanly to the causal reasoning regulators expect validators to describe. Techniques like integrated gradients and SHAP have been adapted for transformer architectures but they produce local approximations, not global interpretability. A validator signing off on conceptual soundness for an LLM is making a fundamentally different attestation than one signing off on a scorecard.

Second, training data opacity. When a bank builds a credit model internally, it knows exactly what data went into training. When a bank calls a foundation model API, it typically has no contractual or technical visibility into what the pre-training corpus contained. SR 11-7 requires documentation of data used in model development. For externally hosted foundation models, that documentation often does not exist and cannot be compelled.

Third, emergent behavior. Foundation models exhibit capabilities that were not explicitly trained and that emerge unpredictably as scale increases. A bank cannot validate a model against a fixed behavioral specification when the model's behavior is not fully bounded by its specification. SR 11-7 validation assumes the model does what its documentation says. Foundation models routinely do things their documentation does not anticipate.

Fourth, prompt sensitivity. The same underlying model weights produce radically different outputs depending on how a prompt is constructed. This means the unit of validation is arguably not the model but the prompt-plus-model combination. Most bank model inventories are not designed to version-control prompt templates as model artifacts.
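One way to treat the prompt-plus-model combination as the unit of validation is to give each combination a stable fingerprint that the model inventory can record. The sketch below is an illustration under stated assumptions: the function name, the field names and the example model identifier are hypothetical, and a real inventory would likely track more than these three inputs.

```python
import hashlib
import json


def deployment_fingerprint(model_id, system_prompt, params):
    """Hash the model identifier, prompt template and generation parameters together.

    Any change to one of the three inputs yields a different fingerprint,
    so the inventory entry no longer matches what is running in production.
    """
    payload = json.dumps(
        {"model_id": model_id, "system_prompt": system_prompt, "params": params},
        sort_keys=True,       # key order must not affect the hash
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical values for illustration only.
validated = deployment_fingerprint(
    "vendor-model-2026-01",
    "You summarize fraud alert narratives for analysts.",
    {"temperature": 0.0, "max_tokens": 512},
)
running = deployment_fingerprint(
    "vendor-model-2026-01",
    "You summarize fraud alert narratives for analysts. Be brief.",
    {"temperature": 0.0, "max_tokens": 512},
)
print(validated == running)  # False: the prompt edit produced a new artifact
```

A mismatch between the validated fingerprint and the running one does not say whether the change is material; it only guarantees that the change is visible to the review process.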

What OCC and FDIC Examiners Are Actually Asking

Examination teams at the OCC and FDIC have been trained to look for SR 11-7 compliance artifacts: model inventory entries, validation reports, conceptual soundness documentation and ongoing monitoring evidence. For foundation models, those artifacts either do not exist or exist in a form that does not satisfy the traditional checklist.

Examiners are now asking a specific set of questions that banks should expect in any 2026 examination where AI is in scope.

Is the foundation model in the model inventory? Many banks are deploying foundation models as tools or workflows rather than classifying them as models, which creates an immediate SR 11-7 exposure. If the system produces quantitative estimates that influence a credit decision, a compliance flag or a customer communication in a regulated context, it is a model under SR 11-7 regardless of what the internal team calls it.

Who performed the independent validation? SR 11-7 requires that validation be conducted by individuals with appropriate expertise who are independent from the model development team. For foundation models sourced from hyperscalers, examiners are asking whether the bank performed its own validation or whether it is relying on the vendor's documentation. Vendor documentation does not satisfy the independence requirement.

What is the scope of use? Examiners distinguish between foundation models used in low-stakes internal workflows and those touching credit decisions, fraud disposition or consumer-facing outputs that fall under the Equal Credit Opportunity Act, the Fair Housing Act or the Consumer Financial Protection Act. Use-case scoping determines examination depth but does not eliminate SR 11-7 requirements for any use case.

How are prompt changes governed? This is the newest line of questioning and it catches banks off guard. If a prompt engineer modifies the system prompt governing how the model handles fraud alert narratives, has that change triggered a model change management review? Most banks have not yet built prompt versioning into their model change governance processes.

Redefining Conceptual Soundness for LLM Deployments

SR 11-7 requires that model validators assess conceptual soundness, which traditionally means verifying that the theoretical framework underlying the model is appropriate for the problem it is solving. For a survival analysis credit model, that means confirming the proportional hazards assumption holds. For an LLM, conceptual soundness requires a different analytic posture.

Banks that are handling this well are decomposing the foundation model deployment into layers and assessing soundness at each layer separately. The pre-trained base model is assessed for architectural fitness and known behavioral risks including hallucination rates and factual accuracy benchmarks. The fine-tuning or retrieval-augmented generation layer is assessed for domain alignment and data provenance. The prompt engineering layer is assessed for instruction clarity, adversarial robustness and output consistency.

Red-teaming has become a core validation technique. The NIST AI Risk Management Framework provides a structured vocabulary for adversarial testing that bank validators are adopting to supplement traditional SR 11-7 procedures. Red-team exercises that document jailbreak attempts, prompt injection scenarios and out-of-distribution query responses generate the kind of empirical evidence validators need to support a conditional soundness finding.

The conditional soundness finding is itself an emerging practice. Rather than a binary pass/fail, validators are issuing findings that say the model is conceptually sound within a defined operational boundary, with specific conditions attached to use scope, input filtering and output review requirements. Examiners have accepted this framing when the conditions are specific and the monitoring regime enforces them.
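A conditional finding is only as good as the controls that enforce it, so it can help to express the conditions in a form that software can check at request time. The following sketch assumes a deliberately simple boundary (approved use cases, a maximum input length and a set of use cases requiring human review); the class, its fields and the example values are hypothetical and are not drawn from any regulatory template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalFinding:
    """Machine-readable form of a conditional soundness finding (illustrative)."""
    model_id: str
    approved_use_cases: frozenset
    max_input_chars: int
    human_review_required: frozenset  # use cases whose outputs need human review

    def check(self, use_case, input_text):
        """Return the list of violated conditions; an empty list means in bounds."""
        violations = []
        if use_case not in self.approved_use_cases:
            violations.append("use case outside approved scope")
        if len(input_text) > self.max_input_chars:
            violations.append("input exceeds validated length")
        return violations

    def needs_human_review(self, use_case):
        """True when the finding requires a human to review outputs for this use case."""
        return use_case in self.human_review_required


# Hypothetical finding for illustration only.
finding = ConditionalFinding(
    model_id="vendor-model-2026-01",
    approved_use_cases=frozenset({"fraud_narrative_summary"}),
    max_input_chars=4000,
    human_review_required=frozenset({"fraud_narrative_summary"}),
)
print(finding.check("adverse_action_notice", "Draft a denial letter."))
```

Logging every non-empty result from `check` gives the monitoring regime a direct record of attempts to use the model outside the boundary the validator signed off on.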

Ongoing Monitoring When the Model Can Change Its Own Behavior

SR 11-7's ongoing monitoring requirement assumes a relatively stable model. You deploy a scorecard, you track its Gini coefficient on live data each quarter and you trigger a review if performance degrades. Foundation models introduce three monitoring complications that do not exist for traditional models.

Model drift without retraining. Foundation model providers update base model weights on schedules that are not always disclosed to API customers. A bank calling a GPT-class model via API may be running on a different underlying model than it was six months ago without having received a formal notification. SR 11-7 change management requirements apply to material model changes. Banks need contractual provisions requiring provider notification of weight updates and internal processes for evaluating whether such updates constitute material changes triggering re-validation.

Output monitoring for non-numeric outputs. Traditional model monitoring tracks numeric performance metrics. Foundation model outputs are often text. Monitoring text for compliance drift, demographic disparity and factual accuracy requires a different instrumentation stack. Banks are building automated output sampling pipelines using embedding similarity metrics and classifier-based compliance probes to generate the quantitative performance signals SR 11-7 monitoring reports require.

Human-in-the-loop scoring. Where foundation model outputs drive human decisions rather than automated decisions directly, monitoring must capture the decision quality of the human-model pair, not just the model in isolation. This is new territory for bank model risk functions and the methodology is still being developed across the industry.
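The first two complications can be addressed together with a replay check: a fixed set of benchmark prompts is sent to the model on a schedule, and the embeddings of the new responses are compared with embeddings stored at validation time. The sketch below assumes the embeddings have already been produced by whatever embedding model the bank uses; the function names, the dictionary layout and the 0.9 threshold are illustrative assumptions, not established standards.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drift_report(baseline, current, threshold=0.9):
    """Compare response embeddings per benchmark prompt.

    baseline, current: dict mapping prompt id -> embedding of the model's response.
    threshold: similarity below which a prompt is flagged (assumed value).
    """
    sims = {
        pid: cosine_similarity(baseline[pid], current[pid])
        for pid in baseline
        if pid in current
    }
    return {
        "mean_similarity": sum(sims.values()) / len(sims) if sims else None,
        "flagged": sorted(pid for pid, s in sims.items() if s < threshold),
        "missing": sorted(set(baseline) - set(current)),
    }


# Toy two-dimensional vectors stand in for real embeddings.
baseline = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
current = {"p1": [1.0, 0.0], "p2": [1.0, 0.0]}
print(drift_report(baseline, current))
```

A flagged prompt is a trigger for review rather than a finding in itself, since sampling randomness alone can move embeddings; the mean similarity over time is the kind of quantitative signal a monitoring report can trend.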

Building a Compliant Governance Framework for Foundation Models

Banks that have navigated SR 11-7 examinations successfully with foundation models in their inventory share a set of structural practices. None of these practices require waiting for new regulatory guidance. They apply SR 11-7's existing principles to the technical realities of foundation model deployments.

Model inventory scoping policy. The bank maintains a written policy defining what constitutes a model under SR 11-7 that explicitly addresses foundation models, AI APIs and prompt-driven systems. The policy designates a responsible officer for scope determinations and creates an intake process for AI use cases before production deployment.

Tiered validation requirements. Not every foundation model deployment carries the same risk. A model generating internal research summaries carries different risk than one producing adverse action notices. The governance framework establishes validation tiers based on use case risk, with proportionally scaled validation requirements. This is consistent with SR 11-7's own acknowledgment that validation rigor should match model risk.

Prompt governance protocol. Prompt templates used in production model deployments are versioned, stored in a configuration management system and subject to change management review before modification. Material prompt changes trigger the same review workflow as model parameter changes. This directly addresses the examiner question about prompt governance and creates an audit trail.

Third-party AI vendor due diligence. The bank's vendor management program incorporates AI-specific due diligence requirements covering training data provenance disclosure, model card documentation, weight update notification obligations and audit rights. Where hyperscalers cannot satisfy these requirements contractually, the bank documents that gap as a model risk finding and implements compensating controls.

Validator training and capability building. SR 11-7 requires independent validators with appropriate expertise. Validating foundation models requires expertise in transformer architectures, prompt engineering, embedding-based evaluation and adversarial testing that most traditional bank model risk teams do not have. Building or acquiring that expertise is a prerequisite for compliant validation, not an optional enhancement.
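The inventory intake and tiered validation practices above lend themselves to a simple, auditable rule set. The sketch below is one hypothetical way to encode such a rule: the domain names, the three tier levels and the assignment logic are assumptions for illustration, and each bank would define its own criteria in policy.

```python
from dataclasses import dataclass

# Hypothetical list of domains a policy might treat as highest risk.
HIGH_RISK_DOMAINS = {"credit_decision", "adverse_action", "fraud_disposition"}


@dataclass
class AIUseCase:
    """Intake record captured before production deployment (illustrative fields)."""
    name: str
    domain: str
    customer_facing: bool
    automated_decision: bool


def validation_tier(use_case):
    """Assign a validation tier: 1 is the most rigorous, 3 the least."""
    if use_case.domain in HIGH_RISK_DOMAINS or use_case.automated_decision:
        return 1
    if use_case.customer_facing:
        return 2
    return 3


intake = [
    AIUseCase("adverse action notices", "adverse_action", True, False),
    AIUseCase("branch FAQ assistant", "customer_service", True, False),
    AIUseCase("internal research summaries", "internal_research", False, False),
]
for uc in intake:
    print(uc.name, "-> tier", validation_tier(uc))
```

Keeping the rule in code rather than in a spreadsheet means a tier assignment can be reproduced on demand when an examiner asks why a given deployment received a lighter validation.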

The regulatory posture toward foundation models in banks is firming rapidly. The OCC's 2026 examination guidance for AI in banking makes explicit reference to SR 11-7 applicability and examiner teams are being trained accordingly. Banks that treat SR 11-7 compliance as a documentation exercise rather than a genuine risk management discipline will find examination exposure growing as examiner sophistication catches up to deployment velocity.

The institutions navigating this well are not waiting for a new SR letter. They are applying the principles SR 11-7 established with technical precision appropriate to the artifact they are actually deploying. That is exactly what the guidance has always required.

SR 11-7 Model Risk Management in the Age of Foundation Models: Validation Gaps and Examiner Expectations in 2026