Psychological Foundations

Status: Public specification – describes the psychological principles implemented in the proprietary ARF Core Engine.
The engine is access‑controlled and available under outcome‑based pricing.

Technical correctness alone does not produce trust.
ARF incorporates cognitive science principles to ensure that users understand, trust, and effectively oversee autonomous AI agents.

Canonical reference: See core_concepts.md for definitions of HealingIntent (explanations) and escalation gates (human‑in‑the‑loop).


1. Explainability

Every decision must answer three questions:

  1. What happened? – The action taken (approve, deny, escalate)
  2. Why did it happen? – The risk score, policy violations, and expected loss components
  3. What would change the decision? – Counterfactual insights (e.g., “if latency were lower, risk would be 0.3”)

Example explanation generated by the core engine:

“Risk increased because latency exceeded normal thresholds (350ms vs baseline 120ms) and historical incidents show 35% failure probability for this database category. Policy violations: none. Expected loss: approve=18.2, deny=12.7, escalate=11.5. Selected: ESCALATE due to epistemic uncertainty (0.45 > threshold 0.3).”

All explanations are returned in the justification field of HealingIntent.
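For illustration, a decision payload carrying such an explanation might look like the sketch below. Only the action and justification fields are named in this specification; the other field names and the example values are hypothetical.

```python
from dataclasses import dataclass, field

# Minimal sketch of an explanation payload. Only `action` and `justification`
# are named in this spec; the remaining fields and all values are illustrative.
@dataclass
class HealingIntentSketch:
    action: str                        # "APPROVE", "DENY", or "ESCALATE"
    justification: str                 # answers "what happened?" and "why?"
    risk_score: float                  # point estimate of risk (hypothetical field)
    expected_loss: dict[str, float]    # per-action expected loss (hypothetical field)
    counterfactuals: list[str] = field(default_factory=list)  # "what would change the decision?"

intent = HealingIntentSketch(
    action="ESCALATE",
    justification="Risk increased because latency exceeded normal thresholds "
                  "(350ms vs baseline 120ms); epistemic uncertainty 0.45 > threshold 0.3.",
    risk_score=0.62,
    expected_loss={"approve": 18.2, "deny": 12.7, "escalate": 11.5},
    counterfactuals=["If latency were within baseline, risk would be ~0.3."],
)
```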


2. Trust Calibration

Humans calibrate their trust more appropriately when a system makes its uncertainty visible.
ARF displays:

  • Credible intervals – 90% HDI around risk scores (see mathematics.md)
  • Probability ranges – verbal labels aligned with numerical thresholds (e.g., “low risk”, “medium risk”, “high risk”)
  • Confidence levels – derived from epistemic uncertainty (confidence = 1 - ψ)

Visualisation in frontend:
- Risk score as a point estimate with error bars (HDI)
- Confidence as a progress bar or percentage
- Colour coding: green (low risk), yellow (medium), red (high), grey (uncertain)
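A minimal sketch of how a frontend might derive these display fields from the engine's outputs; the label and colour thresholds below are illustrative assumptions, not part of the specification.

```python
# Sketch of presentation-layer trust-calibration helpers, assuming the core
# engine supplies a risk point estimate, a 90% HDI, and epistemic uncertainty ψ.
# Label, colour, and "uncertain" thresholds here are illustrative only.

def verbal_label(risk: float) -> str:
    if risk < 0.33:
        return "low risk"
    if risk < 0.66:
        return "medium risk"
    return "high risk"

def display_fields(risk: float, hdi: tuple[float, float], psi: float) -> dict:
    confidence = 1.0 - psi                        # confidence = 1 - ψ (section 2)
    hdi_low, hdi_high = hdi
    uncertain = (hdi_high - hdi_low) > 0.3        # wide interval -> grey / uncertain
    colour = "grey" if uncertain else {"low risk": "green",
                                       "medium risk": "yellow",
                                       "high risk": "red"}[verbal_label(risk)]
    return {
        "point_estimate": risk,
        "error_bars": hdi,                         # rendered as error bars in the UI
        "confidence_pct": round(confidence * 100), # rendered as a progress bar / percentage
        "label": verbal_label(risk),
        "colour": colour,
    }

print(display_fields(0.42, (0.30, 0.55), psi=0.2))
```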


3. Cognitive Load Reduction

Dashboards present information in three priority tiers to reduce cognitive load:

  1. Critical alerts – active policy violations, forced escalations, high uncertainty
  2. System status – current risk score, confidence, recommendation
  3. Historical trends – recent decisions, outcome success rates (optional, if temporal layer enabled)

Interactive drill‑down allows users to inspect detailed risk factors, epistemic breakdown, and decision traces without overwhelming the main view.
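The sketch below groups a decision record into the three tiers listed above; the record field names are illustrative assumptions, not part of the specification.

```python
from typing import Optional

# Sketch of the three-tier information hierarchy. Field names such as
# policy_violations or epistemic_uncertainty are illustrative assumptions.

def dashboard_tiers(decision: dict, history: Optional[list[dict]] = None) -> dict:
    critical = {
        "policy_violations": decision.get("policy_violations", []),
        "forced_escalation": decision.get("action") == "ESCALATE",
        "high_uncertainty": decision.get("epistemic_uncertainty", 0.0) > 0.3,
    }
    status = {
        "risk_score": decision.get("risk_score"),
        "confidence": decision.get("confidence"),
        "recommendation": decision.get("action"),
    }
    trends = None
    if history is not None:  # only shown if the temporal layer is enabled
        successes = sum(1 for d in history if d.get("outcome") == "success")
        trends = {"recent_decisions": len(history),
                  "success_rate": successes / max(len(history), 1)}
    return {"critical": critical, "status": status, "trends": trends}
```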


4. Human‑in‑the‑Loop

Escalations occur automatically when:

  • Uncertainty interval width > threshold (e.g., HDI width > 0.3)
  • Risk score falls into an escalation band (configurable via expected loss constants)
  • Model confidence is low (confidence < 0.5)
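A minimal sketch of this gate; the thresholds mirror the examples above, and the escalation band shown is an assumed placeholder for the configurable expected‑loss constants.

```python
# Sketch of the automatic escalation gate. Thresholds follow the examples in
# the list above; the default escalation_band is an illustrative assumption.

def should_escalate(hdi_width: float, risk: float, confidence: float,
                    escalation_band: tuple[float, float] = (0.4, 0.7)) -> bool:
    if hdi_width > 0.3:                 # uncertainty interval too wide
        return True
    low, high = escalation_band
    if low <= risk <= high:             # risk falls in the configurable escalation band
        return True
    if confidence < 0.5:                # low model confidence
        return True
    return False
```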

Escalation workflow:

  1. Core engine returns ESCALATE in HealingIntent.action
  2. Enterprise enforcement layer routes the request to a human reviewer (via Slack, Teams, or web UI)
  3. Reviewer examines risk factors, epistemic breakdown, and similar past incidents (if memory enabled)
  4. Reviewer approves, denies, or requests modifications
  5. The decision and reviewer justification are stored in the audit trail

Human overrides are fed back into the risk engine as outcomes (update_outcome) to improve future predictions.
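A hypothetical sketch of that feedback step: update_outcome is named in this specification, but its signature is not, so the call shown below is an assumption.

```python
# Hypothetical glue code for step 5 and the feedback loop above. The
# update_outcome call is named in this spec; its signature is an assumption.

audit_trail: list[dict] = []   # stand-in for the enterprise audit store

def record_override(engine, intent_id: str, reviewer: str,
                    decision: str, justification: str, succeeded: bool) -> None:
    # Step 5: persist the reviewer's decision and justification
    audit_trail.append({
        "intent_id": intent_id,
        "reviewer": reviewer,
        "decision": decision,
        "justification": justification,
    })
    # Feed the observed outcome back into the risk engine (assumed signature)
    engine.update_outcome(intent_id, success=succeeded)
```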


5. Bias Mitigation

ARF incorporates bias mitigation strategies:

  • Pessimistic priors – high‑impact categories (database, security) have priors biased toward higher risk (see mathematics.md)
  • Periodic bias audits – decisions are logged with metadata (requester role, environment, resource type) so the enterprise layer can run disparity analyses
  • Fairness constraints – configurable policies can enforce equal treatment across user groups (e.g., the same risk threshold for all roles)
  • Override tracking – all human overrides are logged, enabling detection of systematic bias in human decisions

Bias audit reports can be generated from audit logs (enterprise feature).
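For illustration only, a disparity analysis over such logs might compute per‑role approval rates as sketched below; the log record fields are assumptions.

```python
from collections import defaultdict

# Sketch of a disparity analysis over logged decisions: approval rate per
# requester role. The audit-log record fields shown are illustrative.

def approval_rates_by_role(audit_log: list[dict]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    approvals: dict[str, int] = defaultdict(int)
    for record in audit_log:
        role = record["requester_role"]
        totals[role] += 1
        if record["action"] == "APPROVE":
            approvals[role] += 1
    return {role: approvals[role] / totals[role] for role in totals}

rates = approval_rates_by_role([
    {"requester_role": "sre", "action": "APPROVE"},
    {"requester_role": "intern", "action": "DENY"},
    {"requester_role": "sre", "action": "APPROVE"},
])
# A large gap between roles is a signal to review priors or policies.
```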


6. Prospect Theory & Anchoring

To align with human decision‑making psychology, the frontend applies:

  • Prospect theory – failures are weighted more heavily than successes in risk displays (e.g., “85% success rate” vs “15% failure rate” – failure rate is highlighted)
  • Anchoring – risk scores are presented relative to a baseline (e.g., “30% higher than average for this category”)
  • Cognitive dissonance – users who override a recommendation are asked to provide a justification, reinforcing trust and accountability

These are presentation‑layer choices, not core engine logic.
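A minimal sketch of such presentation‑layer framing, with illustrative wording and numbers:

```python
# Sketch of anchoring and failure-rate framing in the frontend. The wording,
# baseline, and numbers are illustrative; none of this is core engine logic.

def frame_risk(risk: float, category_baseline: float, success_rate: float) -> str:
    delta_pct = (risk - category_baseline) / category_baseline * 100
    anchoring = (f"{abs(delta_pct):.0f}% "
                 f"{'higher' if delta_pct >= 0 else 'lower'} than average for this category")
    failure_framing = f"{(1 - success_rate) * 100:.0f}% failure rate"  # highlighted over success rate
    return f"Risk {risk:.2f} ({anchoring}); historical {failure_framing}"

print(frame_risk(0.39, category_baseline=0.30, success_rate=0.85))
# -> "Risk 0.39 (30% higher than average for this category); historical 15% failure rate"
```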


7. Implementation Notes (Proprietary)

The psychological principles described here are implemented in:

  • Core engine – generates structured explanations (justification), risk scores with uncertainty, and epistemic confidence
  • Enterprise enforcement layer – manages human‑in‑the‑loop escalation workflows and audit trails
  • Frontend (arf-frontend) – visualises risk, uncertainty, and explanations according to cognitive load principles

The core engine does not contain UI logic; all psychological presentation is in the frontend or enterprise integration layers.


8. References

  • Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
  • Lee, J. D., & See, K. A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80.
  • Miller, T. (2019). Explanation in Artificial Intelligence: Insights from the Social Sciences. Artificial Intelligence, 267, 1–38.
  • Norman, D. A. (1983). Design Rules Based on Cognitive Psychology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.

9. See Also

  • core_concepts.md – HealingIntent.justification and escalation gates
  • governance.md – governance loop flow and epistemic uncertainty
  • mathematics.md – risk score and HDI calculations
  • design.md – architectural decisions for observability and auditability