PRA for AI Framework

Adapting Probabilistic Risk Assessment for AI

A Framework for Risk Assessment of Advanced AI Systems

The rapid advancement of artificial intelligence introduces increasingly capable and versatile systems that handle an ever-broadening range of tasks and environments. As these general-purpose AI systems evolve, they introduce a wide array of potential harms to individuals, society, and our biosphere, whether from misuse or unintended behavior. This complexity, combined with the profusion of possibilities, often escapes traditional methods of evaluation, making a more comprehensive risk assessment framework crucial for analyzing how a given AI model or system interacts with, and enables, threats through which harms can be realized.

The Probabilistic Risk Assessment (PRA) for AI framework adapts established PRA techniques from high-reliability industries (e.g., aerospace, chemical manufacturing, and nuclear power) to meet the unique challenges of quantifying risk from advanced AI systems. It provides structured guidance for threat modeling and risk assessment, pairing systematic scanning of potential hazards with time-tested probabilistic methods.

The framework builds on traditional PRA methodology and extends it with the following key methodological advances:

  1. Aspect-oriented hazard analysis. Provides a top-down approach for assessors to index and iterate over the landscape of hazards, organized around the characteristic aspect categories of an AI system: the capabilities, domain knowledge, affordances, and impact domains salient to it. This taxonomy-driven method guides systematic analysis of how emerging AI capabilities could enable or amplify potential harms through critical bottlenecks.
  2. Risk pathway modeling. Guides modeling of the step-by-step progression of risk from a system’s source aspects (specific types or intersections of capability, domain knowledge, or affordance) to terminal aspects (traceable downstream impact domains where harms to individuals, society, or the biosphere are manifested), helping assessors analyze how risks transmit and amplify along the way. A minimal data-model sketch of these elements follows this list.
  3. Prospective risk analysis. Offers analytical methods to ask the right questions and evaluate potential harms. It facilitates exploring novel failure modes, extrapolating capability trajectories, and integrating theoretical and empirical evidence.
  4. Uncertainty management. Provides structured approaches for addressing multiple layers of uncertainty through explicit methodological guidance, calibration points, comparative analysis tools, documentation requirements, mechanisms for incorporating diverse evidence sources, Delphi-inspired methods, and review procedures.
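To make these elements concrete, here is a purely illustrative sketch of how source aspects, terminal aspects, and a risk scenario connecting them might be represented; the class and field names are hypothetical and are not defined by the framework itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class AspectKind(Enum):
    """Hypothetical aspect categories mirroring the taxonomy's top-level groups."""
    CAPABILITY = "capability"
    DOMAIN_KNOWLEDGE = "domain_knowledge"
    AFFORDANCE = "affordance"
    IMPACT_DOMAIN = "impact_domain"


@dataclass
class Aspect:
    """A single aspect of the AI system under assessment."""
    name: str
    kind: AspectKind
    notes: str = ""


@dataclass
class RiskScenario:
    """A risk pathway sketched from source aspects to a terminal aspect."""
    source_aspects: List[Aspect]   # capabilities, domain knowledge, affordances
    terminal_aspect: Aspect        # downstream impact domain where harm manifests
    description: str
    severity_level: int = 0        # placeholder ordinal scales; the workbook
    likelihood_level: int = 0      # defines the actual levels


# Invented example entry, for illustration only.
scenario = RiskScenario(
    source_aspects=[
        Aspect("autonomous code execution", AspectKind.CAPABILITY),
        Aspect("internet access", AspectKind.AFFORDANCE),
    ],
    terminal_aspect=Aspect("critical infrastructure", AspectKind.IMPACT_DOMAIN),
    description="Model-assisted intrusion into industrial control systems.",
)
print(scenario.terminal_aspect.name)
```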

This integrated approach provides tooling to examine the question of how an advanced general-purpose or otherwise highly capable AI system could cause harm to society, whether through misuse, loss of control, negative side effects, systemic degradations, or misaligned actions. With it, teams can more systematically identify and analyze potential hazard pathways that might otherwise be overlooked or underestimated in traditional assessments.

Figure 1: An overview of the Probabilistic Risk Assessment (PRA) for AI framework in its operational context.

The PRA for AI framework pairs systematic hazard identification with probabilistic methods. The framework includes reference materials for assessor calibration, classification heuristics for scenario development, and standardized yet adaptable protocols for estimating risk levels. The accompanying workbook tool provides practical guidance for conducting thorough assessments. The research paper details the theoretical foundations and methodology.

Purpose and Scope

The increasing complexity and influence of AI systems necessitate a portfolio of systematic approaches to assess and manage their risks. The PRA for AI framework aims to contribute lessons from other domains to that portfolio, providing structured methods for analyzing both known and potential sources of harm stemming from general-purpose AI systems. Its goals are to enable assessors to:

  1. Identify and assess risks posed by cutting-edge AI systems.
  2. Analyze both direct hazards from AI systems and downstream interactions that could lead to harm.
  3. Estimate likelihood and severity levels across a broad spectrum of potential hazards, with explicit consideration of risk pathways, propagation mechanisms, and interaction effects.

The framework guides assessors in producing unified, quantified absolute risk estimates and documented rationales to support risk-informed decisions throughout the AI system lifecycle—including development, deployment, and governance. It enables assessors to evaluate system safety and acceptability against established thresholds.

Applicability

The framework provides a versatile approach to assessing risks across a wide spectrum of architectures, capabilities, and phases of the AI lifecycle. It also facilitates analysis of a broad range of threats posed to the world by current and future advanced AI.

Assessment Process

The framework offers a systematic, scalable process for identifying, analyzing, and assessing potential risks. Supported by a structured taxonomy and calibration tools, assessors complete the following steps:

  1. Select Complexity of Assessment
  2. Execute Assessment:
    1. Choose Next Aspect
    2. Generate Risk Scenarios
    3. Analyze Harm Severity Levels
    4. Determine Likelihood Levels
    5. Derive Risk Level Estimates
  3. Review Report Card

Based on organizational requirements, stakeholder priorities, and system characteristics, assessors first select the appropriate complexity of their assessment. This can range from quick, focused discovery to extensive analyses of risk propagation and amplification across society.

During execution, assessors use the framework's Aspect-Oriented Taxonomy of AI Hazards to examine different aspects that could enable or amplify harm, whether through capabilities that overcome existing safety bottlenecks, knowledge that enables dangerous actions, or affordances that create new vectors for misuse. For each aspect, assessors document their analysis in the risk assessment entry log.

The framework’s tools and guidance support assessors as they move from aspect-level analysis to threat model development, and then to specific risk scenarios. Using bottleneck-based reasoning, assessors identify where AI systems could differentially enable harmful outcomes. Scenarios are then evaluated by determining harm severity and likelihood levels, from which risk level estimates are derived. The framework also provides guidance on the integration of data sources and on techniques for generating risk scenarios. Detailed instructions and supporting tools are available on the accompanying workbook page.
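The actual severity, likelihood, and risk scales are defined in the framework's workbook; purely to illustrate the shape of this step, the sketch below combines invented ordinal severity and likelihood levels into an ordinal risk level using a simple matrix-style rule.

```python
# Illustrative only: the real level scales and cutoffs are defined by the
# PRA for AI workbook; the values below are invented for this example.
SEVERITY_LEVELS = ["negligible", "marginal", "critical", "catastrophic"]
LIKELIHOOD_LEVELS = ["remote", "unlikely", "possible", "likely"]
RISK_LEVELS = ["low", "moderate", "high", "severe"]


def risk_level(severity: str, likelihood: str) -> str:
    """Combine ordinal severity and likelihood levels into an ordinal risk level."""
    s = SEVERITY_LEVELS.index(severity)
    li = LIKELIHOOD_LEVELS.index(likelihood)
    combined = -(-(s + li) // 2)  # matrix-style rule: ceiling of the average
    return RISK_LEVELS[min(combined, len(RISK_LEVELS) - 1)]


print(risk_level("critical", "possible"))  # -> high
```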

Depending on organizational needs and context, assessors may conduct additional analyses. They can examine how different aspects of the system might interact to create new risks (second-order assessment), or analyze how risks could propagate through interconnected social and technical systems (propagation operator enhanced assessment). Figure 2 provides an overview of the process.

Figure 2: An overview of the risk assessment process flow in the PRA for AI workbook tool.
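Returning to the second-order assessment mentioned above: as a purely illustrative sketch of its combinatorial nature, the snippet below enumerates the pairwise aspect combinations an assessor might screen for interaction effects (the aspect names are invented).

```python
from itertools import combinations

# Hypothetical aspects that have already been assessed individually.
aspects = [
    "autonomous code execution",
    "persuasive text generation",
    "internet access",
    "biological design knowledge",
]

# A second-order assessment screens combinations of aspects for interaction
# effects that neither aspect would produce on its own.
for a, b in combinations(aspects, 2):
    print(f"screen interaction: {a} + {b}")
```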

The assessment process generates risk level estimates and rationales, which are captured in a standardized report card. The risk report card shows the highest level of risk for each aspect group, and optionally allows individual risk scenarios to be mapped into focused aggregation dimensions that surface custom insights about an assessment. The report card results should always be reviewed alongside the tallied risk matrix and the documented scenarios in the risk assessment output log, which provide essential context for interpreting the aggregated risks and overall findings.
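As a rough illustration of this aggregation (group names and level ordering are hypothetical, not the framework's), the sketch below reduces a set of logged scenario risk levels to the highest level per aspect group:

```python
# Hypothetical ordering of risk levels, lowest to highest.
RISK_ORDER = {"low": 0, "moderate": 1, "high": 2, "severe": 3}

# (aspect_group, risk_level) pairs from an invented assessment entry log.
entries = [
    ("capabilities", "moderate"),
    ("capabilities", "high"),
    ("domain_knowledge", "low"),
    ("affordances", "severe"),
    ("impact_domains", "moderate"),
]

# The report card keeps the highest risk level recorded for each aspect group.
report_card = {}
for group, level in entries:
    current = report_card.get(group)
    if current is None or RISK_ORDER[level] > RISK_ORDER[current]:
        report_card[group] = level

for group, level in sorted(report_card.items()):
    print(f"{group}: {level}")
```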

For detailed instructions on conducting assessments, consult the accompanying workbook (v0.9.1-alpha), which includes a detailed user guide. The paper detailing the PRA for AI framework is now available as a preprint.

Foundational Background

Traditional probabilistic risk assessment (PRA) often begins by assuming an initiating event. In manufacturing, this event might be the failure of a specific piece of equipment or component. In AI, this event might be “jailbreaking” and subsequent use of the AI by a hostile actor, fine-tuning that changes its behavior, a mistake or hallucination of important information, or a sudden jump in capabilities.

Impacts caused by such an event can be of several types:

Understanding and modeling how these causal pathways may lead to harmful results—and estimating the likelihood of each intermediate possibility—is an important part of risk assessment. In AI, these pathways are particularly complex due to the system’s evolving and often unpredictable nature. To help assessors track the various possible outcomes, event trees or decision trees are often used.

Figure 3: An event tree of an example jailbreak attack.

Risk assessment at its core involves identifying potential risks and the critical bottlenecks that could allow these risks to escalate into harmful outcomes. Bottlenecks may emerge as key failure points, vulnerabilities, or pathways through which misuse, unintended consequences, or emergent behaviors might proliferate. Traditional approaches such as fault tree analysis work backwards from potential harms to identify contributing factors, while forward-looking methods like event trees help map how initial events might progress through various paths. These complementary perspectives on risk pathways, drawn from established fields such as system safety, can provide useful mental models when thinking about AI risk scenarios—particularly given the complex, interconnected nature of AI system behaviors and their effects. Together, these techniques lead to more comprehensive scenario generation, addressing the unique challenges posed by AI systems.
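To make the event-tree idea concrete, here is a toy model in the spirit of the jailbreak example in Figure 3; the branch structure and probabilities are invented purely for illustration and carry no empirical meaning.

```python
# Toy event tree: each node maps a branch label to a pair of
# (conditional probability, subtree or terminal outcome).
event_tree = {
    "jailbreak attempted": (0.10, {
        "safeguards bypassed": (0.20, {
            "harmful output acted on": (0.30, "harm realized"),
            "output not acted on": (0.70, "near miss"),
        }),
        "safeguards hold": (0.80, "attempt blocked"),
    }),
    "no jailbreak attempt": (0.90, "normal operation"),
}


def enumerate_paths(node, prob=1.0, path=()):
    """Walk the tree, multiplying conditional probabilities along each path."""
    for label, (p, child) in node.items():
        if isinstance(child, dict):
            yield from enumerate_paths(child, prob * p, path + (label,))
        else:
            yield path + (label,), child, prob * p


# The path probabilities over all leaves sum to 1.0.
for path, outcome, p in enumerate_paths(event_tree):
    print(" -> ".join(path), f"[{outcome}] p = {p:.3f}")
```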

A Framework for AI Probabilistic Risk Assessment

The PRA for AI framework uses aspect-oriented analysis to give structure to AI hazard modeling. It prompts assessors to focus first on two key endpoints of the societal risk pathway: source aspects and terminal aspects, both in isolation and together. While full-fidelity causal pathways between these endpoints often remain uncertain or unknowable, examining forward or backward causal propagation of vulnerabilities and risks from both ends of potential harm chains, along with modular mechanisms of systemic and societal risk propagation, provides valuable insight into potential risks.

Risk Pathway Modeling

Risk pathways connect these source aspects to terminal aspects. They consist of six fundamental elements:

While terminal aspects represent the final element of the risk pathway, harms describe the specific negative outcomes actually realized within these domains when the pathway completes.

While an exhaustive mapping of the exact pathways between aspects and harms is not practical, understanding these pathway elements helps build a more complete picture of potential risks. This structured approach, while acknowledging inherent uncertainties, enables systematic scanning of both capability-adjacent and harm-adjacent hazards.

The framework guides assessors in this modeling through two complementary processes: scanning forward from source aspects toward the harms they could enable (capability-adjacent hazards), and scanning backward from terminal aspects toward the system characteristics that could produce them (harm-adjacent hazards).

By examining hazards through this aspect-oriented lens, assessors can systematically map the connections between system characteristics and potential harms. This structured approach enables more comprehensive analysis of how AI system aspects could enable or amplify societal impacts. For a detailed breakdown of the taxonomy structure and its application in hazard identification, see the Taxonomy page.
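One way to picture these complementary scans is as lookups over a small, necessarily incomplete adjacency map between source aspects and terminal aspects; the entries below are invented placeholders rather than framework content.

```python
# Invented adjacency map from source aspects to terminal (impact) aspects.
# Real assessments reason over far richer, more uncertain pathways; this only
# illustrates scanning in both directions.
pathways = {
    "autonomous code execution": ["critical infrastructure", "financial systems"],
    "persuasive text generation": ["political processes", "public trust"],
    "biological design knowledge": ["public health"],
}


def forward_scan(source_aspect):
    """From a source aspect, list terminal aspects it could plausibly reach."""
    return pathways.get(source_aspect, [])


def backward_scan(terminal_aspect):
    """From a terminal aspect, list source aspects that could plausibly reach it."""
    return [s for s, targets in pathways.items() if terminal_aspect in targets]


print(forward_scan("persuasive text generation"))
print(backward_scan("public trust"))
```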

Societal Threat Assessment

Traditional security assessments focus on protecting AI systems from external threats. In contrast, societal threat assessment examines how AI systems themselves could exploit or amplify vulnerabilities within society's critical structures and systems. This reorientation is essential given AI systems' potential to impact societal domains through both direct effects and cascade effects across interconnected structures.

The assessment considers a broad range of societal vulnerabilities—from the erosion of public trust and economic instability to disruptions of ecological balance. Each vulnerability represents a potential pathway through which AI systems could generate harm:

Societal threat assessment enables the systematic identification of potential harm pathways that might be overlooked by traditional security approaches focused on enterprise threats. A forthcoming paper defining the societal threat landscape will provide a deeper exploration of societal vulnerabilities and interconnected risks.

Prospective Risk Analysis

The analysis of potential future harms from AI systems requires structured methods suited to evaluating systemic vulnerabilities, unprecedented impacts, and complex societal interactions. As AI development accelerates, relying solely on hindsight—like focusing on a rearview mirror while driving—becomes inadequate. Predictive assessments are necessary to anticipate and address risks before they materialize.

Prospective risk analysis comprises three key principles:

  1. Systematic exploration: Structured approaches for identifying novel failure modes and interaction effects, and systematically searching for what might have been missed.
  2. Extrapolative analysis: Using available information to forecast capability trajectories and identify potential threshold effects.
  3. Evidence integration: Combining multiple sources of theoretical and empirical evidence to form prospective assessments.
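The framework does not prescribe a particular formula for the evidence-integration step; as one simple, purely illustrative way of combining several probability judgments about the same event (for example, from different evidence sources or assessors), the sketch below pools estimates by averaging their log-odds.

```python
import math


def pool_probabilities(estimates):
    """Pool probability estimates by averaging their log-odds (equivalent to a
    geometric mean of odds). Illustrative only; the PRA for AI framework does
    not mandate this particular aggregation rule."""
    log_odds = [math.log(p / (1 - p)) for p in estimates]
    mean = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean))


# Hypothetical estimates of the same event's probability from three sources.
print(round(pool_probabilities([0.02, 0.10, 0.05]), 3))
```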

When combined with other risk assessment methods, prospective analysis enables early identification and potential prevention of severe risks in increasingly capable AI systems. This approach provides crucial information to support development and deployment decisions, including evaluation of proposed safeguards, identification of areas requiring further testing, and go/no-go decisions.

Framework Status

Workbook (v0.9.1-alpha): An alpha version with functional components, released in November 2024. For more information on the workbook and release schedule, see the workbook page.

Research Paper: The preprint, released in April 2025, can be found on the paper page.

Contact Us

The PRA for AI framework is continuously evolving to address the rapidly advancing field of AI risk assessment. We value your insights and feedback to help refine the framework and ensure it remains as useful as possible in AI safety practice. Whether you have questions, suggestions, or want to discuss customizing the framework for your specific needs, we encourage you to reach out to us via email:

PRA--carma-org

We appreciate your interest in the PRA for AI framework and workbook tool.

About CARMA

The Center for AI Risk Management & Alignment (CARMA) is a project of Social & Environmental Entrepreneurs, Inc., a 501(c)(3) nonprofit public charity, and is dedicated to managing risks associated with transformative artificial intelligence. Our mission is to lower the risks to humanity and the biosphere from transformative AI. CARMA's research addresses critical challenges in AI risk assessment and governance. We focus on methods for mapping and sizing the outsized risks from transformative AI, and we conduct research on technical AI governance and public security measures.