PRA Framework

The Need for PRA in AI

Characteristics of Increasingly General-Purpose AI Risk

Advanced AI systems possess unprecedented leverage and unique characteristics that set them apart from traditional technologies:

  1. Global impact: Given their widespread direct adoption, broad proliferation, and unprecedented leverage, AI systems have the potential for widespread, even global, effects.
  2. Emergent behaviors: AI systems can develop behaviors and pursue objectives that emerge from, but were not explicitly encoded in, their training. These behaviors often lack historical precedent, making risk assessment through traditional pattern-matching difficult.
  3. Dual-use threats: Malicious actors may attempt to exploit or manipulate AI models, or simply assign them deliberately harmful goals.
  4. Vulnerability to systemic failures: Many AI safety measures rely on centralized controls or specific constraints. If these are compromised—for instance, through unauthorized access to model weights or unrestricted internet access—the consequences could be severe and far-reaching.
  5. Rapid evolution: The field of AI is advancing at an unprecedented pace, requiring risk assessment methods that can keep up with and anticipate new developments. AI agents can show unexpected capability jumps and may gain the ability to replicate and improve themselves at rapid speeds.
  6. Loss of control: The speed at which unintended side effects develop, combined with the complexity of AI systems, often surpasses our capability to intervene effectively once a harmful chain of events is set in motion.
  7. Complex interactions: AI systems can interact with their environment, other AI systems, and humans in adaptive, self-reinforcing ways, often producing emergent behaviors and cascading risks that are difficult to foresee or contain.

As AI models advance in size and capability, they demonstrate an increasingly general breadth of competencies, within which they can operate with superhuman speed and skill. Empirical evidence sets a lower bound on the capabilities and dangers that AI models can pose: deepfake creation, specific vectors of cyber-offense uplift (i.e., enhancement of cyber-attack strategies and effectiveness), and hallucination are examples of readily reproducible risks. Benchmarks and traditional evaluations are imperfect instruments for measuring these known risks, but they cover much of this empirically established territory.

But the risks posed by today's and tomorrow's models enter territory for which no precedent yet exists. Large problems lurk beyond the horizon of what can be tested in a few weeks. AI risk assessment must therefore draw more heavily on analytical models, theory, and expert testimony than most established methods do.

Need for AI Probabilistic Risk Assessment

The unique characteristics of AI systems demand specialized risk assessment methodologies.

Yet the application of well-established risk management, system safety engineering, and probabilistic risk assessment methodologies to general-purpose AI systems has been conspicuously absent, despite their proven effectiveness in other fields. This gap makes the development of a structured risk assessment method for AI both urgent and crucial.
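To make the contrast with current practice concrete, classical PRA quantifies risk by enumerating failure scenarios and combining each scenario's estimated frequency with its consequence severity. The sketch below illustrates that calculation style only; the scenario names and all numeric estimates are hypothetical placeholders, not outputs of the framework.

```python
# Toy PRA-style calculation: total risk as the frequency-weighted sum of
# consequence severities over an enumerated set of failure scenarios.
# All scenario names and numeric estimates are illustrative placeholders.

scenarios = [
    # (scenario, estimated frequency per year, consequence severity score)
    ("safeguard bypass via unauthorized weight access", 0.01, 90.0),
    ("emergent deceptive behavior in deployment",       0.05, 60.0),
    ("dual-use misuse by a malicious actor",            0.20, 40.0),
]

def total_risk(scenarios):
    """Expected severity per year, summed across independent scenarios."""
    return sum(freq * severity for _, freq, severity in scenarios)

def dominant_scenario(scenarios):
    """Scenario contributing the most expected severity (the risk driver)."""
    return max(scenarios, key=lambda s: s[1] * s[2])[0]

print(f"total risk: {total_risk(scenarios):.1f} severity-points/year")
print(f"dominant contributor: {dominant_scenario(scenarios)}")
```

Beyond a single total, identifying the dominant contributor is what makes this style of analysis actionable: mitigation effort can be directed at the scenarios that drive the expected harm.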

Comparison of Risk Assessment Methods

Current AI risk assessment methods vary in their approach and effectiveness. In particular, they differ across a set of criteria for comprehensive threat surface analysis.

Figure 1 compares current evaluation methods across these criteria. Ratings were assigned based on expert analysis of each method's capabilities and limitations. High (H) indicates strong performance or comprehensive coverage of the criterion, Medium (M) indicates partial fulfillment, and Low (L) indicates minimal or no attention to the criterion.

The criteria are grouped under three headings: Coverage & Depth, Methodological Robustness, and Assessment Structure.

| Method | Fine-grain | Threat Surface Coverage | Threat Surface Guidance | Guidance by System Property | Good Proxy to Safety | Robust to Mitigation Failure | Enforces Objectivity | Supports Prospective Analyses | Considers Harm Severity |
|---|---|---|---|---|---|---|---|---|---|
| Safety Benchmarks (No Holdout) | H | M | M | L | L | L | L | L | L |
| Safety Benchmarks (Private Holdout) | H | M | M | L | L | L | H | L | L |
| Evals | H | L | L | L | M | M | M | L | M |
| Responsible Scaling Policies | L | L | M | L | M | L | L | M | M |
| Safety Cases | M | M | M | M | H | L | L | H | H |
| Probabilistic Risk Assessment | M | H | H | H | H | H | L | H | H |
| Typical Narrow AI Safety Audits | L | L | L | L | L | L | M | L | M |
| Deep Bespoke AI Safety Audits | H | L | L | L | H | M | M | L | M |
| Scalable (AGI) Safety Audits | H | M | M | M | H | M | M | L | M |

Figure 1: Comparison of Risk Assessment Methods. Ratings indicate the degree to which each method addresses or fulfills the given criterion.

A comparative analysis of AI risk assessment methods demonstrates that no single approach excels across all criteria, highlighting the need for a multi-faceted approach. In particular, many current methods struggle with threat surface coverage and robustness to mitigation failure. However, probabilistic risk assessment shows the most balanced performance across dimensions. A forthcoming paper defining the societal threat surface will provide a more detailed explanation of risk assessment gaps and how available methods compare.
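The qualitative pattern described above can be checked mechanically against Figure 1. The snippet below transcribes the figure's ratings (in column order) and aggregates them under an assumed equal weighting of criteria with H=2, M=1, L=0; the scoring scheme is an illustrative assumption, only the ratings themselves come from the figure.

```python
# Aggregate the Figure 1 ratings under a simple equal-weight scoring
# (H=2, M=1, L=0). The weighting is an illustrative assumption; the
# rating strings are transcribed from Figure 1, column by column.

SCORE = {"H": 2, "M": 1, "L": 0}

ratings = {
    "Safety Benchmarks (No Holdout)":      "HMMLLLLLL",
    "Safety Benchmarks (Private Holdout)": "HMMLLLHLL",
    "Evals":                               "HLLLMMMLM",
    "Responsible Scaling Policies":        "LLMLMLLMM",
    "Safety Cases":                        "MMMMHLLHH",
    "Probabilistic Risk Assessment":       "MHHHHHLHH",
    "Typical Narrow AI Safety Audits":     "LLLLLLMLM",
    "Deep Bespoke AI Safety Audits":       "HLLLHMMLM",
    "Scalable (AGI) Safety Audits":        "HMMMHMMLM",
}

totals = {method: sum(SCORE[c] for c in row) for method, row in ratings.items()}
best = max(totals, key=totals.get)
print(f"highest equal-weight total: {best} ({totals[best]}/18)")
```

Under this equal weighting, probabilistic risk assessment comes out on top, consistent with the "most balanced performance" reading, though any real prioritization would weight the criteria by context rather than uniformly.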

Benefits of the PRA for AI Framework

The Probabilistic Risk Assessment (PRA) for AI framework offers several key benefits that address the unique challenges of AI risk assessment.

Framework Limitations

The PRA for AI framework also has some weaknesses.

For a detailed methodology of the PRA framework, please refer to our forthcoming paper.