PRA Framework

Probabilistic Risk Assessment for AI

Introduction to the PRA Framework

The rapid advancement of artificial intelligence is producing increasingly capable and versatile systems that can handle an ever-broadening range of tasks and environments. As these general-purpose AI systems evolve, their complex influence pathways, from misuse to immediate misaligned actions, introduce a wide array of potential harms to individuals, society, and the biosphere. This complexity, combined with the sheer profusion of possibilities, often eludes traditional methods of evaluation, making a more comprehensive, prospective risk assessment framework crucial for covering the entire threat surface of a given AI model or system.

The Probabilistic Risk Assessment (PRA) framework addresses this need by combining:

  1. Traditional PRA methodology: Draws from established quantitative risk assessment approaches used in high-reliability industries.
  2. Prospective risk analysis: Supplements traditional PRA with threat modeling of low-incidence risks and with analytical methods for evaluating potential harms before they manifest, recognizing that the magnitude of AI risks requires systematic analysis of future threats from the AI system.
  3. Aspect-oriented hazard modeling: Uses a structured approach to examine the aspect-adjacent hazards (stemming from capabilities, domain knowledge, and affordances) and the harm-adjacent hazards at either end of the risk pathway, with systemic risk propagation operators covering the crucial middle ground that connects them.
  4. Societal threat assessment: Analyzes how threats from AI systems could exploit or amplify societal vulnerabilities, distinct from traditional enterprise security concerns about protecting AI systems themselves.

This integrated approach examines how an increasingly capable AI system could cause harm to society, whether through misuse, loss of control, negative side-effects, systemic degradations, or misaligned actions. The framework specifically addresses threats from any given AI system to society, focusing on pathways of harm that exploit societal vulnerabilities. Teams can systematically identify and analyze potential hazard pathways that might otherwise be overlooked in traditional assessment approaches, whether they are direct threats (like misaligned AI actions) or indirect threats (like societal destabilization through economic disruption).

Figure 1: An overview of the Probabilistic Risk Assessment (PRA) framework.

The PRA framework provides structured guidance for threat modeling and risk assessment of general-purpose AI systems. It pairs a structured taxonomy for systematic threat scanning with probabilistic risk assessment methods adapted from high-reliability industries. To enable methodical assessments, it provides reference materials for calibration, as well as guidance for scenario development and for estimating risk levels. The accompanying workbook tool provides both conceptual foundations and practical guidance for conducting thorough assessments, and a forthcoming research paper will detail the theoretical foundations in more depth.

Purpose and Scope

AI systems' increasing complexity and influence necessitate a portfolio of systematic approaches to assess and manage their risks. The PRA framework aims to contribute lateral lessons from other domains to that portfolio, providing structured methods for analyzing the potential and realized sources of harm associated with general-purpose AI systems. Its primary goals are to:

  1. Guide the identification and assessment of risks posed by cutting-edge AI systems to prevent societal harm.
  2. Offer a structured approach for analysis of both direct hazards from AI systems and their complex interactions that could lead to harm.
  3. Help assessors estimate likelihood levels across a broad spectrum of potential hazards, with accommodation for potential risk pathways and second-order effects.

The framework provides gradings and insights to support decision-makers in their evaluation of AI system safety and acceptability.

Applicability

The PRA framework provides a versatile approach to assessing risks from advanced and general-purpose AI systems across a wide spectrum of architectures, capabilities, and phases of the AI lifecycle. This framework also facilitates analysis of a broad range of threats to the world from current and future advanced AI.

Assessment Process

The framework offers a systematic, scalable process for identifying, analyzing, and assessing potential risks. Supported by a structured taxonomy and calibration tools, assessors complete the following steps:

  1. Select Complexity of Assessment
  2. Execute Assessment:
    1. Choose Next Aspect
    2. Generate Risk Scenarios
    3. Analyze Harm Severity Levels
    4. Determine Likelihood Levels
    5. Generate Risk Level Estimates
  3. Review Report Card

Based on organizational requirements, stakeholder priorities, and system characteristics, assessors first select the appropriate complexity of their assessment. This can range from quick, focused discovery to extensive analyses of risk propagation and amplification across society.

During execution, assessors use the framework's taxonomy of AI hazards to examine different aspects that could enable or amplify harm - whether through capabilities that overcome existing safety bottlenecks, knowledge that enables dangerous actions, or affordances that create new vectors for misuse.

The framework's tools and guidance support assessors as they move from aspect-level analysis to threat model development, and then to specific risk scenarios. Using bottleneck-based reasoning, assessors identify where AI systems could differentially enable harmful outcomes. Scenarios are then evaluated by determining harm severity and likelihood levels, from which risk level estimates are derived. The framework also provides guidance on integrating data sources and techniques for generating risk scenarios. Detailed instructions and supporting tools are available on the accompanying Workbook page.

For each scenario, assessors detail the most plausible paths to potential harms. The framework's components guide them in evaluating both the severity of potential harms and how likely they are to occur.
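As a concrete illustration of that evaluation step, the sketch below combines a severity level and a likelihood level into a qualitative risk level. The level names and the scoring rule are hypothetical placeholders chosen for illustration; the framework's calibrated scales and level definitions are given in the workbook.

```python
# Illustrative sketch only: these scales and the scoring rule are hypothetical
# placeholders, not the calibrated levels defined in the PRA workbook.
from enum import IntEnum


class Severity(IntEnum):
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


class Likelihood(IntEnum):
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4


def risk_level(severity: Severity, likelihood: Likelihood) -> str:
    """Map a (severity, likelihood) pair onto a qualitative risk level."""
    score = int(severity) * int(likelihood)  # simple product; a real matrix may be non-linear
    if score >= 12:
        return "intolerable"
    if score >= 6:
        return "high"
    if score >= 3:
        return "moderate"
    return "low"


# Example: a scenario judged critical in severity and possible in likelihood.
print(risk_level(Severity.CRITICAL, Likelihood.POSSIBLE))  # -> high
```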

Depending on organizational needs and context, assessors may conduct additional analyses. They can examine how different aspects of the system might interact to create new risks (second order assessment), or analyze how risks could propagate through interconnected social and technical systems (pathway operator enhanced assessment). Figure 2 provides an overview of the process.

Figure 2: An overview of the risk assessment process flow in PRA.

The assessment process generates risk level estimates and rationales, which are captured in a standardized report card. The report card shows the highest risk level for each aspect group and optionally maps individual risk scenarios onto focused aggregation dimensions to surface custom insights from the assessment. Report card results should always be reviewed alongside the risk levels table and the documented scenarios, which provide essential context for interpreting the aggregated risks and overall findings.
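A minimal sketch of the aggregation step is shown below: each assessed scenario carries an aspect group and an ordinal risk level, and the report card keeps the highest level observed per group. The group names and level labels are illustrative placeholders, not the framework's taxonomy entries.

```python
# Hypothetical sketch of the report-card aggregation step; group names and
# level labels are placeholders.
from collections import defaultdict

RISK_ORDER = ["low", "moderate", "high", "intolerable"]

scenarios = [
    {"aspect_group": "Capabilities", "risk_level": "moderate"},
    {"aspect_group": "Capabilities", "risk_level": "high"},
    {"aspect_group": "Domain Knowledge", "risk_level": "low"},
    {"aspect_group": "Affordances", "risk_level": "intolerable"},
]

# The report card keeps only the highest risk level seen for each aspect group.
report_card = defaultdict(lambda: RISK_ORDER[0])
for s in scenarios:
    if RISK_ORDER.index(s["risk_level"]) > RISK_ORDER.index(report_card[s["aspect_group"]]):
        report_card[s["aspect_group"]] = s["risk_level"]

print(dict(report_card))
# {'Capabilities': 'high', 'Domain Knowledge': 'low', 'Affordances': 'intolerable'}
```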

For detailed instructions on conducting assessments, consult the accompanying workbook (v0.9.1-alpha), which includes a detailed user guide. During Q1 2025, three forthcoming papers will further explore applying PRA for AI, defining the societal threat surface, and comparing gaps in available AI risk assessment methods.

Foundational Background

Traditional probabilistic risk assessment (PRA) often begins by assuming an initiating event. In manufacturing, this event might be the failure of a specific piece of equipment or component. In AI, this event might be "jailbreaking" and subsequent use of the AI by a hostile actor, fine-tuning that changes its behavior, a mistake or hallucination of important information, or a sudden jump in capabilities.

Impacts caused by such an event can be of several types:

Understanding and modeling how these causal pathways may lead to harmful results - and estimating the likelihood of each intermediate possibility - is an important part of risk assessment. In AI, these pathways are particularly complex due to the system's evolving and often unpredictable nature. To help assessors track the various possible outcomes, event trees or decision trees are often used.

Figure 3: An event tree of an example jailbreak attack.

Risk assessment at its core involves identifying potential risks and the critical bottlenecks that could allow these risks to escalate into harmful outcomes. Bottlenecks may emerge as key failure points, vulnerabilities, or pathways through which misuse, unintended consequences, or emergent behaviors might proliferate. Traditional approaches like fault tree analysis work backwards from potential harms to identify contributing factors, while forward-looking methods like event trees help map how initial events might progress through various paths. These complementary perspectives on risk pathways, drawn from established fields like system safety, can provide useful mental models when thinking about AI risk scenarios - particularly given the complex, interconnected nature of AI system behaviors and their effects. Together, these techniques lead to more comprehensive scenario generation, addressing the unique challenges posed by AI systems.
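To make the event-tree idea concrete, the sketch below enumerates the terminal paths of a small sequential tree loosely modeled on the jailbreak example and multiplies branch probabilities along each path. The stages and probabilities are invented for illustration and are not drawn from Figure 3 or from any calibrated data.

```python
# Minimal event-tree sketch; the stages and probabilities below are invented for
# illustration, not taken from Figure 3 or from calibrated data.
def event_tree_paths(stages, prefix=(), p=1.0):
    """Enumerate terminal paths of a sequential event tree with their probabilities.

    At each stage the event either fails to occur (the path terminates there) or
    occurs with the given probability (the path continues to the next stage).
    """
    if not stages:
        yield prefix + ("harm realised",), p
        return
    (name, p_event), rest = stages[0], stages[1:]
    yield prefix + (f"{name}: no",), p * (1.0 - p_event)  # barrier holds, path ends
    yield from event_tree_paths(rest, prefix + (f"{name}: yes",), p * p_event)


stages = [
    ("jailbreak succeeds", 0.30),
    ("harmful output produced", 0.50),
    ("output acted upon", 0.20),
]

for path, prob in event_tree_paths(stages):
    print(f"P = {prob:.3f}  " + " -> ".join(path))
# The fully escalated path has probability 0.30 * 0.50 * 0.20 = 0.03, and the
# four terminal probabilities sum to 1.
```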

Prospective Risk Analysis

The analysis of potential future harms from AI systems requires structured methods suited to evaluating systemic vulnerabilities, unprecedented impacts, and complex societal interactions. As AI development accelerates, relying solely on hindsight—like focusing on a rearview mirror while driving—becomes inadequate. Predictive assessments are necessary to anticipate and address risks before they materialize.

Key components of prospective risk analysis include:

  1. Bottleneck analysis: Identifying critical points where system failures, emergent behaviors, or misuse could transition from manageable issues to cascading risks. This analysis helps assessors prioritize areas that could unlock or amplify harm potential by mapping key barriers that, if compromised, could enable rapid escalation of harm through societal systems.
  2. Whitebox property observation: Systematic study of a system's internal structure and dynamics across different scales, from its fundamental computational building blocks to its emergent behaviors during development, including the use of simplified analogs to understand behavioral patterns and risks.
  3. Combinatorial analysis: Systematic examination of how different aspects interact and amplify each other across scales to identify emergent risks not visible when aspects are evaluated independently (see the sketch after this list).

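A minimal sketch of the combinatorial step, assuming a pairwise screen is sufficient for a first pass: it enumerates every pair of aspects so that each combination can be screened for interaction effects that neither aspect exhibits alone. The aspect names are hypothetical placeholders.

```python
# Sketch of a pairwise combinatorial screen; the aspect names are hypothetical
# placeholders, not entries from the framework's taxonomy.
from itertools import combinations

aspects = [
    "persuasion capability",
    "biosecurity domain knowledge",
    "autonomous tool use",
    "access to financial systems",
]

for a, b in combinations(aspects, 2):
    # In an actual assessment, each pair would be screened (by an assessor or a
    # heuristic filter) for interaction effects neither aspect shows alone.
    print(f"screen interaction: {a} x {b}")
```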
When combined with other risk assessment methods, prospective analysis enables early identification and potential prevention of severe risks in increasingly capable AI systems. This approach provides crucial information to support development and deployment decisions, including evaluation of proposed safeguards, identification of areas requiring further testing, and go/no-go decisions.

Aspect-Oriented Hazard Modeling

The PRA framework structures AI hazard modeling through aspect-oriented analysis, focusing on two key endpoints of the societal risk pathway: system aspects and potential impact domains. While complete causal pathways between these endpoints often remain uncertain or unknowable, examining both ends of potential harm chains - along with the mechanisms of systemic risk propagation - provides valuable insight into potential risks.

At one end, the framework examines system aspects: the capabilities, domain knowledge, and affordances of the AI system that could enable or amplify harm.

At the other end, it considers potential impact domains such as individual wellbeing, societal structures, and the biosphere.

Between these endpoints lie complex systemic risk propagation operators - the mechanisms through which AI capabilities, domain knowledge, and affordances could spread and amplify harm throughout interconnected social and technical systems. While we cannot fully map the exact pathways between aspects and harms, understanding these three components (aspects, operators, and impacts) helps build a more complete picture of potential risks. This structured approach, while acknowledging inherent uncertainties, enables systematic scanning of both aspect-adjacent and harm-adjacent hazards.
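One way to picture this three-part structure is as a simple record linking an aspect to an impact domain via a propagation operator, as in the hypothetical sketch below; the field values are illustrative placeholders rather than entries from the framework's taxonomy.

```python
# Hypothetical sketch of a single analysed pathway:
# (system aspect) -> (systemic risk propagation operator) -> (impact domain).
# All field values are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class RiskPathway:
    aspect: str                 # a capability, area of domain knowledge, or affordance
    propagation_operator: str   # mechanism that spreads or amplifies harm
    impact_domain: str          # where the harm ultimately lands


example = RiskPathway(
    aspect="automated code generation",
    propagation_operator="amplification through widely deployed software dependencies",
    impact_domain="critical infrastructure reliability",
)
print(example)
```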

The framework implements this modeling through two complementary processes:

By examining hazards through this aspect-oriented lens, assessors can systematically map the connections between system characteristics and potential harms. This structured approach enables more comprehensive analysis of how AI system aspects could enable or amplify societal impacts through various propagation mechanisms. For a detailed breakdown of the taxonomy structure and its application in hazard identification, see the Taxonomy page.

Societal Threat Assessment

Traditional security assessments focus on protecting AI systems from external threats. In contrast, societal threat assessment examines how AI systems themselves could exploit or amplify vulnerabilities within society's critical structures and systems. This reorientation is essential given AI systems' potential to impact societal domains through both direct effects and cascade effects across interconnected structures.

The assessment considers a broad range of societal vulnerabilities, from the erosion of public trust and economic instability to disruptions of ecological balance. Each vulnerability represents a potential pathway through which AI systems could generate harm. These pathways can manifest through direct effects on societal domains or through cascade effects that propagate across interconnected social and technical systems.

Societal threat assessment enables the systematic identification of potential harm pathways that might be overlooked by traditional security approaches focused on enterprise threats. A forthcoming paper, defining the societal threat surface, will provide a deeper exploration of societal vulnerabilities and interconnected risks.

Current State of the Framework

Workbook (v0.9.1-alpha): An alpha version with functional components, released in November 2024. For more information on the workbook and release schedule, see the workbook page.

Research Paper: Scheduled for release in Q1 2025.

Contact Us

The PRA framework is continuously evolving to address the rapidly advancing field of AI risk assessment. We value your insights and feedback to help refine our framework and ensure it remains as useful as it can be in AI safety practice. Whether you have questions, suggestions, or want to discuss customizing the framework for your specific needs, we encourage you to reach out to us via email:

PRA@carma.org

We appreciate your interest in the PRA for AI framework.

About CARMA

The Center for AI Risk Management & Alignment (CARMA) is a project of Social & Environmental Entrepreneurs, Inc., a 501(c)(3) nonprofit public charity, and is dedicated to managing risks associated with transformative artificial intelligence. Our mission is to lower the risks to humanity and the biosphere from transformative AI. CARMA's research addresses critical challenges in AI risk assessment and governance. We focus on methods for mapping and sizing outsized risks from transformative AI, and we conduct research on technical AI governance and public security measures.