A Framework for Risk Assessment of Advanced AI Systems
The rapid advancement of artificial intelligence is producing increasingly capable and versatile systems that handle an ever-broadening range of tasks and environments. As these general-purpose AI systems evolve, they introduce a wide array of potential harms to individuals, society, and the biosphere, whether from misuse or unintended behavior. This complexity, combined with the profusion of possibilities, often escapes traditional methods of evaluation, making a more comprehensive risk assessment framework crucial for analyzing how a given AI model or system interacts with, and enables, threats through which harms can be realized.
The Probabilistic Risk Assessment (PRA) for AI framework adapts established PRA techniques from high-reliability industries (e.g., aerospace, chemical manufacturing and nuclear power) to meet the unique challenges of quantifying risk from advanced AI systems. It provides structured guidance for threat modeling and risk assessment, pairing systematic scanning of potential hazards with time-tested probabilistic methods.
The framework builds on traditional PRA methodology and extends it with the following key methodological advances:
- Aspect-oriented hazard analysis. Provides a top-down approach for assessors to index and systematically iterate over the landscape of hazards, covering the characteristic aspect categories of an AI system: the types of capabilities, domain knowledge, affordances, and impact domains salient to it. This taxonomy-driven method guides systematic analysis of how emerging AI capabilities could enable or amplify potential harms through critical bottlenecks.
- Risk pathway modeling. Guides modeling of the step-by-step progressions of risk from a system’s source aspects (specific types or intersections of capability, domain knowledge, or affordance) to terminal aspects (traceable downstream impact domains where harms to individuals, society, or the biosphere are manifested). It helps analyze how risks transmit and amplify.
- Prospective risk analysis. Offers analytical methods to ask the right questions and evaluate potential harms. It facilitates exploring novel failure modes, extrapolating capability trajectories, and integrating theoretical and empirical evidence.
- Uncertainty management. Provides structured approaches for addressing multiple layers of uncertainty through explicit methodological guidance, calibration points, comparative analysis tools, documentation requirements, mechanisms for incorporating diverse evidence sources, Delphi-inspired methods, and review procedures.
This integrated approach provides tooling to examine the question of how an advanced general-purpose or otherwise highly capable AI system could cause harm to society, whether through misuse, loss of control, negative side-effects, systemic degradations, or misaligned actions. With it, teams can more systematically identify and analyze potential hazard pathways that might otherwise be overlooked or underestimated in traditional assessment.

The PRA for AI framework pairs systematic hazard identification with probabilistic methods. The framework includes reference materials for assessor calibration, classification heuristics for scenario development, and standardized yet adaptable protocols for estimating risk levels. The accompanying workbook tool provides practical guidance for conducting thorough assessments. The research paper details the theoretical foundations and methodology.
Purpose and Scope
The increasing complexity and influence of AI systems necessitates a portfolio of systematic approaches to assess and manage their risks. The PRA for AI framework aims to contribute lessons from other domain areas to that portfolio, providing structured methods for analyzing both the known and potential sources of harm stemming from general-purpose AI systems. Its goals are to enable assessors to:
- Identify and assess risks posed by cutting-edge AI systems.
- Analyze both direct hazards from AI systems and downstream interactions that could lead to harm.
- Estimate likelihood and severity levels across a broad spectrum of potential hazards, with explicit consideration of risk pathways, propagation mechanisms, and interaction effects.
The framework guides assessors in producing unified, quantified estimates of absolute risk, with documented rationales, to support risk-informed decisions throughout the AI system lifecycle—including development, deployment, and governance. It enables assessors to evaluate system safety and acceptability against established thresholds.
Applicability
The framework provides a versatile approach to assessing risks across a wide spectrum of architectures, capabilities, and phases of the AI lifecycle. It also facilitates analysis of a broad range of threats from current and future advanced AI to individuals, society, and the biosphere.
Assessment Process
The framework offers a systematic, scalable process for identifying, analyzing, and assessing potential risks. Supported by a structured taxonomy and calibration tools, assessors complete the following steps:
- Select Complexity of Assessment
- Execute Assessment:
  - Choose Next Aspect
  - Generate Risk Scenarios
  - Analyze Harm Severity Levels
  - Determine Likelihood Levels
  - Risk Level Estimates Generated
- Review Report Card
Based on organizational requirements, stakeholder priorities, and system characteristics, assessors first select the appropriate complexity of their assessment. This can range from quick, focused discovery to extensive analyses of risk propagation and amplification across society.
During execution, assessors use the framework's Aspect-Oriented Taxonomy of AI Hazards to examine different aspects that could enable or amplify harm - whether through capabilities that overcome existing safety bottlenecks, knowledge that enables dangerous actions, or affordances that create new vectors for misuse. For each aspect, assessors document their analysis in the risk assessment entry log.
The framework’s tools and guidance support assessors as they move from aspect-level analysis to threat model development, and then to specific risk scenarios. Using bottleneck-based reasoning, assessors identify where AI systems could differentially enable harmful outcomes. Scenarios are then evaluated by determining harm severity and likelihood levels, from which risk level estimates are derived. The framework also provides guidance on integrating data sources and techniques for generating risk scenarios. Detailed instructions and supporting tools are available on the accompanying workbook page.
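To make the step from severity and likelihood levels to a risk level estimate concrete, here is a minimal sketch using a simple lookup matrix. The level names and matrix cells are illustrative assumptions for this example only and do not reproduce the framework's calibrated scales.

```python
# Illustrative sketch only: level names and matrix cells are assumed for
# demonstration and do not reproduce the framework's calibrated tables.
SEVERITY = ["negligible", "marginal", "critical", "catastrophic"]
LIKELIHOOD = ["rare", "unlikely", "possible", "likely"]

# Rows: likelihood (rare -> likely); columns: severity (negligible -> catastrophic).
RISK_MATRIX = [
    ["low",    "low",    "medium",    "high"],
    ["low",    "medium", "high",      "high"],
    ["medium", "high",   "high",      "very high"],
    ["medium", "high",   "very high", "very high"],
]

def risk_level(severity: str, likelihood: str) -> str:
    """Map a scenario's severity and likelihood levels to a risk level."""
    return RISK_MATRIX[LIKELIHOOD.index(likelihood)][SEVERITY.index(severity)]

# Example: a scenario judged "critical" in severity and "possible" in likelihood.
print(risk_level("critical", "possible"))  # -> "high"
```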
Depending on organizational needs and context, assessors may conduct additional analyses. They can examine how different aspects of the system might interact to create new risks (second order assessment), or analyze how risks could propagate through interconnected social and technical systems (propagation operator enhanced assessment). Figure 2 provides an overview of the process.

The assessment process generates risk level estimates and rationales, which are captured in a standardized report card. The risk report card shows the highest level of risk for each of the aspect groups, and optionally allows for the mapping of individual risk scenarios into focused aggregation dimensions, to display custom insights about assessments. The report card results are always reviewed alongside the tallied risk matrix and the documented scenarios in the risk assessment output log, which provide essential context for interpreting the aggregated risks and overall findings.
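As a hedged illustration of the report-card aggregation described above, the sketch below takes the maximum risk level observed within each aspect group across a set of hypothetical scenario records; the group names, levels, and their ordering are assumptions for the example.

```python
# Hypothetical scenario records: (aspect group, risk level). Group names and
# the ordering of risk levels are illustrative assumptions.
ORDER = {"low": 0, "medium": 1, "high": 2, "very high": 3}

scenarios = [
    ("Capabilities", "medium"),
    ("Capabilities", "high"),
    ("Domain Knowledge", "low"),
    ("Affordances", "very high"),
]

# Report card: highest risk level observed within each aspect group.
report_card: dict[str, str] = {}
for group, level in scenarios:
    if group not in report_card or ORDER[level] > ORDER[report_card[group]]:
        report_card[group] = level

print(report_card)
# {'Capabilities': 'high', 'Domain Knowledge': 'low', 'Affordances': 'very high'}
```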
For detailed instructions on conducting assessments, consult the accompanying workbook (v0.9.1-alpha), which includes a detailed user guide. The paper detailing the PRA for AI framework is now available as a preprint.
Foundational Background
Traditional probabilistic risk assessment (PRA) often begins by assuming an initiating event. In manufacturing, this event might be the failure of a specific piece of equipment or component. In AI, this event might be “jailbreaking” and subsequent use of the AI by a hostile actor, fine-tuning that changes its behavior, a mistake or hallucination of important information, or a sudden jump in capabilities.
Impacts caused by such an event can be of several types:
- Direct results of the failure or event - e.g. a manufacturing shutdown; a deepfake campaign; an AI becoming harder to control, understand, or steer.
- Indirect results from a series of related events - e.g. increased wear and tear on nearby equipment; erosion of trust; structurally deleterious job loss; intentional subversion and manipulation by AI systems or by hostile actors prompting them.
- Cascading results involving secondary or simultaneous failures - e.g. a breaker fails to close, turning a short circuit into a much larger grid shutdown; a safety system fails, causing injury; an AI system gains multiple dangerous capabilities at once; an AI system gains the capacity to improve itself or design successor systems without human intervention.
Understanding and modeling how these causal pathways may lead to harmful results—and estimating the likelihood of each intermediate possibility—is an important part of risk assessment. In AI, these pathways are particularly complex due to the system’s evolving and often unpredictable nature. To help assessors track the various possible outcomes, event trees or decision trees are often used.
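As a toy illustration of event-tree reasoning, the sketch below multiplies an initiating-event probability by the conditional probabilities along one branch of the tree; all numbers and event labels are invented for the example and carry no empirical weight.

```python
# Toy event-tree sketch: probabilities and labels are invented for illustration.
# Path probability = P(initiating event) * product of conditional branch probabilities.
p_initiating = 0.01  # e.g., a successful jailbreak attempt in a given period
branches = {
    "misuse attempted given jailbreak": 0.5,
    "safeguards fail given attempt":    0.2,
    "harm realized given failure":      0.4,
}

p_path = p_initiating
for p in branches.values():
    p_path *= p

print(f"Estimated path probability: {p_path:.5f}")  # 0.01 * 0.5 * 0.2 * 0.4 = 0.00040
```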

Risk assessment at its core involves identifying potential risks and the critical bottlenecks that could allow these risks to escalate into harmful outcomes. Bottlenecks may emerge as key failure points, vulnerabilities, or pathways through which misuse, unintended consequences, or emergent behaviors might proliferate. Traditional approaches such as fault tree analysis work backwards from potential harms to identify contributing factors, while forward-looking methods like event trees help map how initial events might progress through various paths. These complementary perspectives on risk pathways, drawn from established fields such as system safety, can provide useful mental models when thinking about AI risk scenarios—particularly given the complex, interconnected nature of AI system behaviors and their effects. Together, these techniques lead to more comprehensive scenario generation, addressing the unique challenges posed by AI systems.
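For contrast with the forward-looking event-tree sketch above, here is a minimal fault-tree-style sketch that combines independent contributing events through OR and AND gates; the events and probabilities are again illustrative assumptions, not outputs of the framework.

```python
# Minimal fault-tree-style sketch with independent events (illustrative numbers).
def or_gate(*probs: float) -> float:
    """P(at least one of several independent events occurs)."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def and_gate(*probs: float) -> float:
    """P(all of several independent events occur)."""
    p_all = 1.0
    for p in probs:
        p_all *= p
    return p_all

# Hypothetical top event "harmful output reaches users": requires a hazard
# (either a jailbreak OR a capability jump) AND a safeguard failure.
p_hazard = or_gate(0.02, 0.01)   # jailbreak OR capability jump
p_top = and_gate(p_hazard, 0.1)  # hazard AND safeguard failure
print(f"P(top event) = {p_top:.4f}")  # about 0.0030
```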
A Framework for AI Probabilistic Risk Assessment
The PRA for AI framework uses aspect-oriented analysis to give structure to AI hazard modeling. It prompts assessors to focus first on two key endpoints of the societal risk pathway: source aspects and terminal aspects, both in isolation and together. While full-fidelity causal pathways between these endpoints often remain uncertain or unknowable, examining forward or backward causal propagation of vulnerabilities and risks from both ends of potential harm chains, along with modular mechanisms of systemic and societal risk propagation, provides valuable insight into potential risks.
Risk Pathway Modeling
Risk pathways connect these source aspects to terminal aspects. They consist of six fundamental elements:
- Source aspects. Capabilities, domain knowledge, or affordances of the AI system that could initiate a risk pathway and have the potential to cause harm.
- Source aspect-adjacent hazards. The specific hazards that emerge directly from, or causally soon after, the source aspects of the AI system; these are the initial points where system characteristics could enable or trigger harm pathways.
- Intermediate steps. States through which risks propagate, defining the sequence of transitions from source to impact.
- Propagation operators. Mechanisms that characterize how risks transmit and transform (e.g., through amplification) between or during pathway steps as they percolate through societal systems, helping map how risks cascade into broader impacts.
- Terminal aspect-adjacent hazards. Vulnerabilities through which risks manifest as concrete harms to societal systems.
- Terminal aspects. Domains that are negatively impacted or impinged upon, where harms ultimately manifest, and the endpoints of risk pathways.
While terminal aspects represent the final element of the risk pathway, harms describe the specific negative outcomes actually realized within these domains when the pathway completes.
While an exhaustive mapping of the exact pathways between aspects and harms is not practical, understanding these pathway elements helps build a more complete picture of potential risks. This structured approach, while acknowledging inherent uncertainties, enables systematic scanning of both capability-adjacent and harm-adjacent hazards.
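One way to keep these six pathway elements organized during an assessment is to record them in a simple structured form. The sketch below is a minimal, hypothetical data structure; the field names and example values are assumptions for illustration, not a schema prescribed by the framework.

```python
from dataclasses import dataclass, field

# Illustrative sketch: field names and example values are assumptions, not a
# schema prescribed by the PRA for AI framework.
@dataclass
class RiskPathway:
    source_aspect: str                 # capability, domain knowledge, or affordance
    source_adjacent_hazard: str        # hazard emerging directly from the source aspect
    intermediate_steps: list[str] = field(default_factory=list)
    propagation_operators: list[str] = field(default_factory=list)  # e.g., amplification
    terminal_adjacent_hazard: str = "" # vulnerability through which harm manifests
    terminal_aspect: str = ""          # impacted domain where harm is realized

example = RiskPathway(
    source_aspect="persuasive text generation (capability)",
    source_adjacent_hazard="scalable production of targeted disinformation",
    intermediate_steps=["automated campaign deployment", "erosion of shared epistemics"],
    propagation_operators=["amplification via recommender systems"],
    terminal_adjacent_hazard="degraded trust in electoral information",
    terminal_aspect="democratic institutions",
)
```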
The framework guides assessors in this modeling through two complementary processes:
- Threat modeling within the framework involves identifying possible harm pathways for each AI system aspect—capabilities, knowledge domains, and operational affordances—against the societal threat landscape. This process considers where each aspect could 'unbottle' a threat or harm potential, and thus forms the basis of the threat model.
- Risk scenario generation translates these pathways into detailed scenarios that assessors can evaluate, illustrating how an AI capability or affordance might lead to real-world impacts on individuals, society, or the biosphere.
By examining hazards through this aspect-oriented lens, assessors can systematically map the connections between system characteristics and potential harms. This structured approach enables more comprehensive analysis of how AI system aspects could enable or amplify societal impacts. For a detailed breakdown of the taxonomy structure and its application in hazard identification, see the Taxonomy page.
Societal Threat Assessment
Traditional security assessments focus on protecting AI systems from external threats. In contrast, societal threat assessment examines how AI systems themselves could exploit or amplify vulnerabilities within society's critical structures and systems. This reorientation is essential given AI systems' potential to impact societal domains through both direct effects and cascade effects across interconnected structures.
The assessment considers a broad range of societal vulnerabilities—from the erosion of public trust and economic instability to the disruption of ecological balance. Each vulnerability represents a potential pathway through which AI systems could generate harm by:
- Harming societal structures directly
- Triggering cascading effects through interconnected systems
- Amplifying existing vulnerabilities
- Creating new vulnerabilities
Societal threat assessment enables the systematic identification of potential harm pathways that might be overlooked by traditional security approaches focused on enterprise threats. A forthcoming paper, defining the societal threat landscape, will provide a deeper exploration of societal vulnerabilities and interconnected risks.
Prospective Risk Analysis
The analysis of potential future harms from AI systems requires structured methods suited to evaluating systemic vulnerabilities, unprecedented impacts, and complex societal interactions. As AI development accelerates, relying solely on hindsight—like focusing on a rearview mirror while driving—becomes inadequate. Predictive assessments are necessary to anticipate and address risks before they materialize.
Prospective risk analysis comprises three key principles:
- Systematic exploration: Structured approaches for identifying novel failure modes and interaction effects, and systematically searching for what might have been missed.
- Extrapolative analysis: Using available information to forecast capability trajectories and identify potential threshold effects.
- Evidence integration: Combining multiple sources of theoretical and empirical evidence to form prospective assessments.
When combined with other risk assessment methods, prospective analysis enables early identification and potential prevention of severe risks in increasingly capable AI systems. This approach provides crucial information to support development and deployment decisions, including evaluation of proposed safeguards, identification of areas requiring further testing, and go/no-go decisions.
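As a hedged illustration of extrapolative analysis, the sketch below fits a naive linear trend to hypothetical benchmark scores and projects when an assumed capability threshold would be crossed; both the data and the threshold are invented, and a real assessment would account for uncertainty and non-linear dynamics.

```python
# Hypothetical extrapolation sketch: benchmark scores and threshold are invented.
# Fits a least-squares line to past scores and projects the threshold crossing.
years = [2021, 2022, 2023, 2024]
scores = [31.0, 42.0, 55.0, 68.0]  # hypothetical benchmark scores (0-100)
threshold = 90.0                   # assumed capability threshold of concern

n = len(years)
mean_x = sum(years) / n
mean_y = sum(scores) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, scores))
sxx = sum((x - mean_x) ** 2 for x in years)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Naive linear projection; real assessments should model uncertainty and
# the possibility of sudden, non-linear capability jumps.
crossing_year = (threshold - intercept) / slope
print(f"Trend: {slope:.1f} points/year; projected threshold crossing ~{crossing_year:.1f}")
```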
Framework Status
Workbook (v0.9.1-alpha): An alpha version with functional components, released in November 2024. For more information on the workbook and release schedule, see the workbook page.
Research Paper: The preprint, released in April 2025, can be found on the paper page.
Contact Us
The PRA for AI framework is continuously evolving to address the rapidly advancing field of AI risk assessment. We value your insights and feedback to help refine the framework and ensure it remains as useful as it can be in AI safety practice. Whether you have questions, suggestions, or want to discuss customizing the framework for your specific needs, we encourage you to reach out to us via email:
PRA@carma.org
We appreciate your interest in the PRA for AI framework and workbook tool.
About CARMA
The Center for AI Risk Management & Alignment (CARMA) is a project of Social & Environmental Entrepreneurs, Inc., a 501(c)(3) nonprofit public charity, and is dedicated to managing risks associated with transformative artificial intelligence. Our mission is to lower the risks to humanity and the biosphere from transformative AI. CARMA's research addresses critical challenges in AI risk assessment and governance. We focus on methods for mapping and sizing outsized risks from transformative AI, and conduct research on technical AI governance and public security measures.