Characteristics of Increasingly General-Purpose AI Risk
Advanced AI systems possess unprecedented leverage and unique characteristics that set them apart from traditional technologies:
- Global impact: Given their direct adoption at scale, broad proliferation, and unprecedented leverage, AI systems have the potential for widespread, even global, effects.
- Emergent behaviors: AI systems can develop behaviors and pursue objectives that emerge from, but were not explicitly encoded in, their training. These behaviors often lack historical precedent, making risk assessment through traditional pattern-matching difficult.
- Dual-use threats: Malicious actors may attempt to exploit or manipulate AI models, or simply give them deliberately harmful goals.
- Vulnerability to systemic failures: Many AI safety measures rely on centralized controls or specific constraints. If these are compromised—for instance, through unauthorized access to model weights or unrestricted internet access—the consequences could be severe and far-reaching.
- Rapid evolution: The field of AI is advancing at an unprecedented pace, requiring risk assessment methods that can keep up with and anticipate new developments. AI agents can show unexpected capability jumps and may gain the ability to replicate and improve themselves rapidly.
- Loss of control: The speed at which unintended side effects develop, combined with the complexity of AI systems, often surpasses our capability to intervene effectively once a harmful chain of events is set in motion.
- Complex interactions: AI systems can interact with their environment, other AI systems, and humans in adaptive, self-reinforcing ways, often producing emergent behaviors and cascading risks that are difficult to foresee or contain.
As AI models advance in size and capability, they demonstrate an increasingly general breadth of competences, within many of which they operate at superhuman speed and skill. Empirical evidence sets a lower bound on the capabilities and dangers that AI models can pose: deepfake creation, specific vectors of cyber-offense uplift (i.e., enhancement of cyber-attack strategies and effectiveness), and hallucination are examples of readily reproducible risks. Benchmarks and traditional evals are imperfect tools for measuring these, but they get much of the way there.
But the risks posed by today's and tomorrow's models enter territory for which no precedent yet exists. Large problems lurk beyond the horizon of what can be tested in a few weeks. AI risk assessment must therefore draw more heavily on analytical models, theory, and expert testimony than most popular methods do.
Need for AI Probabilistic Risk Assessment
The unique characteristics of AI systems demand specialized risk assessment methodologies. Key factors driving this need include:
- Rapid risk development: The rapid pace of development toward general-purpose AI systems, which may introduce significant hazards, necessitates specialized assessment procedures to ensure these systems' safety.
- Limitations of current methods: Traditional risk assessment approaches do not adequately address the complexity and scale of AI risks. Far more parties need to approach AI risk management as a differential threat assessment for the world as a whole.
- Prospective assessment: Given the scale of the risks involved, methods must identify and model latent risks before they manifest, including risks beyond those that can be easily elicited.
- Regulatory demand: Regulators require a broader and deeper mix of methods to evaluate AI systems and establish meaningful safety thresholds.
The application of well-established risk management, system safety engineering, and probabilistic risk assessment methodologies to general-purpose AI systems has been conspicuously absent, despite their proven effectiveness in other fields. Developing a structured risk assessment method for AI is therefore both urgent and crucial.
Comparison of Risk Assessment Methods
Current AI risk assessment methods vary in their approach and effectiveness. In particular, they vary across the following criteria for comprehensive threat surface analysis:
- A method can operate at different levels of detail. Fine-grained methods enable detailed analysis of specific risk components, while coarse-grained methods provide higher-level system assessment.
- A method can cover more or less of the societal threat surface, e.g., by number of threat models.
- A method can provide systematic guidance over the threat surface of AI risk — or, it can lack such guidance.
- A method can be more or less informed by a specific system's characteristics — for example, specific capabilities or features.
- A method can be a better or worse proxy for safety.
- A method can be more or less robust to mitigation failure. A robust method includes the possibility of mitigation failure, estimates its likelihood, and prices it in.
- A method can be more or less "objective." An objective method requires little or no subjective input — more example, a benchmark is an objective method.
- A method can support prospective analysis of a system's risk — that is, it can identify leading indicators of risk, rather than retrospective evaluation once a system is already developed or deployed.
- A method can consider harm severity in addition to likelihood, or only consider likelihood; the sketch below illustrates the difference.
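To make that last criterion concrete, here is a minimal Python sketch contrasting a likelihood-only ranking with a severity-weighted one. The scenario names, probabilities, and severity scale are hypothetical, chosen purely for illustration.

```python
# Hypothetical scenarios: likelihood is a probability, severity an order-of-
# magnitude harm index (so realized harm scales roughly like 10**severity).
scenarios = {
    "model generates spam at scale": {"likelihood": 0.30,  "severity": 2},
    "model gives bioweapon uplift":  {"likelihood": 0.001, "severity": 8},
}

def rank_by_likelihood(items):
    # A likelihood-only method puts the frequent, low-harm scenario first.
    return sorted(items, key=lambda name: items[name]["likelihood"], reverse=True)

def rank_by_risk(items):
    # Weighting by severity flips the ordering toward the rare,
    # high-consequence scenario: risk ~ likelihood * 10**severity.
    return sorted(items,
                  key=lambda name: items[name]["likelihood"] * 10 ** items[name]["severity"],
                  reverse=True)

print(rank_by_likelihood(scenarios))  # spam scenario ranked first
print(rank_by_risk(scenarios))        # bioweapon uplift ranked first
```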
Figure 1 compares current evaluation methods across these criteria. Ratings were assigned based on expert analysis of each method's capabilities and limitations. High (H) indicates strong performance or comprehensive coverage of the criterion, Medium (M) indicates partial fulfillment, and Low (L) indicates minimal or no attention to the criterion.
[Figure 1: H/M/L ratings of current methods (including benchmarks with no holdout, benchmarks with a private holdout, policies, risk assessment, and safety audits) against the criteria above: coverage, guidance, system property, proxy to safety, mitigation failure, objectivity, prospective analyses, and harm severity.]
A comparative analysis of AI risk assessment methods demonstrates that no single method excels across all criteria, highlighting the need for a multi-faceted approach. In particular, many current methods struggle with threat surface coverage and robustness to mitigation failure. However, probabilistic risk assessment shows the most balanced performance across these dimensions. A forthcoming paper, defining the societal threat surface, will provide a more detailed explanation of risk assessment gaps and how available methods compare.
Benefits of the PRA for AI Framework
The Probabilistic Risk Assessment (PRA) AI framework offers several key benefits that address the unique challenges of AI risk assessment:
- Variable-resolution analysis: The framework supports assessments scaling from quick high-level risk scans to comprehensive technical analysis of specific risk pathways, with clear documentation of scenarios and reasoning at each level.
- Comprehensive system aspect coverage: The framework uses a first-principles, top-down Taxonomy of Aspect-Oriented AI Hazards that covers AI Capabilities, Knowledge Domains, operational Affordances, and sociotechnical Impact Domains. It captures first- and second-order risks, considering effects on individuals, society, and the biosphere.
- Risk coverage: The framework offers a top-down perspective on the hazards from general-purpose AI systems. This includes a structured approach to examining both competence-based and incompetence-based hazards across the AI system's capabilities, high-risk knowledge domains, and operational affordances.
- Consideration of direct and systemic risks: The framework allows for the assessment of both immediate, direct harms and more subtle, long-term systemic risks.
- Explicit assumptions and likelihood documentation: The framework prompts assessors to make their detailed assumptions, reasoning, and estimates explicit (see the sketch after this list). This transparency enhances the reliability and reproducibility of the assessment process.
- Integration with assessment methods: The framework unifies and incorporates diverse assessment approaches within a structured evaluation framework. This allows assessment teams to incorporate existing safety cases, benchmarking results, test data and other risk assessment methods alongside or within the PRA methodology.
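As one illustration of how such explicit documentation might be recorded, consider the Python sketch below. The record fields, the per-year probability framing, and the example numbers are assumptions of this sketch, not part of the framework's specification.

```python
from dataclasses import dataclass

@dataclass
class RiskScenario:
    """One assessed risk pathway, with assumptions and estimates made explicit."""
    name: str
    assumptions: list[str]  # stated preconditions behind the estimate
    reasoning: str          # why the assessor believes the numbers below
    p_per_year: float       # assessor's probability the scenario occurs in a year
    severity: float         # harm on the team's chosen severity scale

def expected_harm(scenarios: list[RiskScenario]) -> float:
    # Coarse aggregate across documented scenarios: probability-weighted severity.
    return sum(s.p_per_year * s.severity for s in scenarios)

example = RiskScenario(
    name="unauthorized exfiltration of model weights",
    assumptions=["weights stored on internet-connected infrastructure",
                 "current access controls remain unchanged"],
    reasoning="analogy to incidents in adjacent industries; expert elicitation",
    p_per_year=0.02,
    severity=7.0,
)
print(expected_harm([example]))  # 0.14
```

Keeping the assumptions and reasoning alongside the numbers lets a second assessor audit exactly which preconditions an estimate depends on.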
Framework Limitations
There are some weaknesses in the PRA for AI framework:
- Subjectivity in probability estimates: The framework requires careful calibration of probability estimates across assessors, particularly for novel scenarios without historical precedent. This could lead to inconsistent assessments (one standard pooling rule that can reduce this variance is sketched after this list).
- Rapid AI advancement challenges: The fast-paced evolution of AI capabilities may outpace the assessment process, making some evaluations quickly outdated.
- Unknown unknowns: PRA may not adequately capture unforeseen risks or emergent behaviors in advanced AI systems.
- Potential for anchoring bias: Predefined categories and examples might inadvertently anchor assessors' thinking, limiting consideration of unconventional scenarios.
- Resource intensity: Thorough PRAs require significant time and expertise, which may be challenging for smaller organizations or rapid development cycles.
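One common way to reduce, though not eliminate, the assessor-to-assessor variance noted above is to pool independent probability estimates. The Python sketch below uses the geometric mean of odds, a standard pooling rule from the forecasting literature; it is an illustration, not something the framework prescribes.

```python
import math

def pool_geometric_odds(probabilities: list[float]) -> float:
    """Combine assessors' probability estimates for the same scenario
    by taking the geometric mean of their odds."""
    odds = [p / (1.0 - p) for p in probabilities]
    pooled_odds = math.prod(odds) ** (1.0 / len(odds))
    return pooled_odds / (1.0 + pooled_odds)

# Three hypothetical assessors disagree about the same novel scenario.
print(pool_geometric_odds([0.01, 0.05, 0.20]))  # ~0.049
```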
For a detailed methodology of the PRA framework, please refer to our forthcoming paper.