Psychological Foundations¶
Status: Public specification – describes the psychological principles implemented in the proprietary ARF Core Engine.
The engine is access‑controlled and available under outcome‑based pricing.
Technical correctness alone does not produce trust.
ARF incorporates cognitive science principles to ensure that users understand, trust, and effectively oversee autonomous AI agents.
Canonical reference: see `core_concepts.md` for definitions of `HealingIntent` (explanations) and escalation gates (human‑in‑the‑loop).
1. Explainability¶
Every decision must answer three questions:
- What happened? – The action taken (approve, deny, escalate)
- Why did it happen? – The risk score, policy violations, and expected loss components
- What would change the decision? – Counterfactual insights (e.g., “if latency were lower, risk would be 0.3”)
Example explanation generated by the core engine:
“Risk increased because latency exceeded normal thresholds (350ms vs baseline 120ms) and historical incidents show 35% failure probability for this database category. Policy violations: none. Expected loss: approve=18.2, deny=12.7, escalate=11.5. Selected: ESCALATE due to epistemic uncertainty (0.45 > threshold 0.3).”
All explanations are returned in the `justification` field of `HealingIntent`.
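The three questions above can be sketched as a small explanation builder. This is an illustrative reconstruction, not the engine's actual code; the `RiskAssessment` field names are assumptions:

```python
from dataclasses import dataclass

@dataclass
class RiskAssessment:
    """Inputs the explanation is built from (hypothetical field names)."""
    latency_ms: float
    baseline_ms: float
    failure_prob: float
    expected_loss: dict            # action -> expected loss
    epistemic_uncertainty: float
    uncertainty_threshold: float

def build_justification(a: RiskAssessment) -> str:
    """Answer the three questions: what happened, why, and what would change it."""
    # What happened: the action with the lowest expected loss is selected.
    action = min(a.expected_loss, key=a.expected_loss.get)
    losses = ", ".join(f"{k}={v:.1f}" for k, v in a.expected_loss.items())
    parts = [
        # Why: the concrete risk drivers.
        f"Risk increased because latency exceeded normal thresholds "
        f"({a.latency_ms:.0f}ms vs baseline {a.baseline_ms:.0f}ms).",
        f"Historical failure probability: {a.failure_prob:.0%}.",
        f"Expected loss: {losses}. Selected: {action.upper()}.",
    ]
    # What would change it: name the uncertainty gate if it fired.
    if a.epistemic_uncertainty > a.uncertainty_threshold:
        parts.append(
            f"Escalated due to epistemic uncertainty "
            f"({a.epistemic_uncertainty} > threshold {a.uncertainty_threshold})."
        )
    return " ".join(parts)
```

Feeding in the figures from the example above produces a string of the same shape as the quoted justification.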
2. Trust Calibration¶
Humans trust systems more when uncertainty is visible.
ARF displays:
- Credible intervals – 90% HDI around risk scores (see `mathematics.md`)
- Probability ranges – verbal labels aligned with numerical thresholds (e.g., “low risk”, “medium risk”, “high risk”)
- Confidence levels – derived from epistemic uncertainty (`confidence = 1 - ψ`)
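The confidence formula and the verbal labels can be sketched as two small helpers. The label cut‑offs (0.3 and 0.6) are illustrative assumptions, not engine constants:

```python
def confidence(psi: float) -> float:
    """Confidence derived from epistemic uncertainty psi: confidence = 1 - psi."""
    return max(0.0, 1.0 - psi)

def verbal_label(risk: float) -> str:
    """Align a numerical risk score with the verbal labels shown to users.
    The 0.3 / 0.6 thresholds are illustrative, not normative."""
    if risk < 0.3:
        return "low risk"
    if risk < 0.6:
        return "medium risk"
    return "high risk"
```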
Visualisation in frontend:
- Risk score as a point estimate with error bars (HDI)
- Confidence as a progress bar or percentage
- Colour coding: green (low risk), yellow (medium), red (high), grey (uncertain)
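As a sketch of the colour‑coding rule, uncertainty takes precedence over the risk band (grey before green/yellow/red); the numeric thresholds here are assumed for illustration:

```python
def risk_colour(risk: float, conf: float) -> str:
    """Map a (risk, confidence) pair to the dashboard colour scheme above.
    Thresholds are illustrative defaults, not engine constants."""
    if conf < 0.5:       # uncertain: grey overrides any risk band
        return "grey"
    if risk < 0.3:
        return "green"
    if risk < 0.6:
        return "yellow"
    return "red"
```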
3. Cognitive Load Reduction¶
Dashboards prioritise information to reduce cognitive load:
- Critical alerts – active policy violations, forced escalations, high uncertainty
- System status – current risk score, confidence, recommendation
- Historical trends – recent decisions, outcome success rates (optional, if temporal layer enabled)
Interactive drill‑down allows users to inspect detailed risk factors, epistemic breakdown, and decision traces without overwhelming the main view.
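One way to realise this tiering is a payload that keeps detail behind a separate drill‑down key so the main view stays compact. The schema below is hypothetical, assembled only to illustrate the three tiers:

```python
def dashboard_view(state: dict) -> dict:
    """Group engine state into the three dashboard tiers (hypothetical schema)."""
    return {
        "critical_alerts": {
            "policy_violations": state.get("policy_violations", []),
            "forced_escalation": state.get("forced_escalation", False),
            "high_uncertainty": state.get("psi", 0.0) > 0.3,
        },
        "system_status": {
            "risk_score": state["risk_score"],
            "confidence": 1 - state.get("psi", 0.0),
            "recommendation": state["recommendation"],
        },
        # Rendered only on drill-down, never in the main view.
        "drill_down": {
            "risk_factors": state.get("risk_factors", []),
            "decision_trace": state.get("trace", []),
        },
    }
```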
4. Human‑in‑the‑Loop¶
Escalations occur automatically when:
- Uncertainty interval width > threshold (e.g., HDI width > 0.3)
- Risk score falls into an escalation band (configurable via expected loss constants)
- Model confidence is low (`confidence < 0.5`)
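The three triggers can be sketched as a single predicate. The escalation band `(0.4, 0.7)` below is an assumed placeholder for the configurable expected‑loss constants:

```python
def should_escalate(hdi_width: float, risk: float, psi: float,
                    hdi_threshold: float = 0.3,
                    band: tuple = (0.4, 0.7),
                    min_confidence: float = 0.5) -> bool:
    """Escalate when any of the three conditions above holds.
    The band and thresholds are illustrative defaults, not engine constants."""
    wide_interval = hdi_width > hdi_threshold      # uncertainty interval too wide
    in_band = band[0] <= risk <= band[1]           # risk in the escalation band
    low_confidence = (1 - psi) < min_confidence    # confidence = 1 - psi
    return wide_interval or in_band or low_confidence
```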
Escalation workflow:
- Core engine returns `ESCALATE` in `HealingIntent.action`
- Enterprise enforcement layer routes the request to a human reviewer (via Slack, Teams, or web UI)
- Reviewer examines risk factors, epistemic breakdown, and similar past incidents (if memory enabled)
- Reviewer approves, denies, or requests modifications
- The decision and reviewer justification are stored in the audit trail
Human overrides are fed back into the risk engine as outcomes (update_outcome) to improve future predictions.
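A minimal sketch of that feedback step, assuming `update_outcome` takes an intent identifier and a success flag (the exact signature is not specified here) and that the audit trail is an append‑only store:

```python
def record_override(engine, intent_id: str, reviewer_decision: str,
                    succeeded: bool, justification: str, audit_log: list) -> None:
    """Store a human override in the audit trail and feed it back to the engine."""
    # Persist the reviewer's decision and rationale for later bias audits.
    audit_log.append({
        "intent_id": intent_id,
        "decision": reviewer_decision,
        "justification": justification,
        "outcome_success": succeeded,
    })
    # Feed the observed outcome back so future predictions improve.
    engine.update_outcome(intent_id, success=succeeded)
```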
5. Bias Mitigation¶
ARF incorporates bias mitigation strategies:
| Strategy | Implementation |
|---|---|
| Pessimistic priors | High‑impact categories (database, security) have priors biased toward higher risk (see mathematics.md) |
| Periodic bias audits | Log decisions with metadata (requester role, environment, resource type) – enterprise layer can run disparity analyses |
| Fairness constraints | Configurable policies can enforce equal treatment across user groups (e.g., same risk threshold for all roles) |
| Override tracking | All human overrides are logged, enabling detection of systematic bias in human decisions |
Bias audit reports can be generated from audit logs (enterprise feature).
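As a sketch of the disparity analysis the enterprise layer might run, the helper below computes per‑group approval rates from audit‑log rows; the row schema (`requester_role`, `action`) is assumed for illustration:

```python
from collections import defaultdict

def approval_rates_by_group(audit_rows: list, group_key: str = "requester_role") -> dict:
    """Per-group approval rates from audit-log rows (hypothetical schema:
    each row carries the grouping field plus an 'action' field)."""
    counts = defaultdict(lambda: [0, 0])   # group -> [approved, total]
    for row in audit_rows:
        group = row[group_key]
        counts[group][1] += 1
        if row["action"] == "approve":
            counts[group][0] += 1
    # Large gaps between groups are candidates for a bias investigation.
    return {g: approved / total for g, (approved, total) in counts.items()}
```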
6. Prospect Theory & Anchoring¶
To align with human decision‑making psychology, the frontend applies:
- Prospect theory – failures are weighted more heavily than successes in risk displays (e.g., “85% success rate” vs “15% failure rate” – failure rate is highlighted)
- Anchoring – risk scores are presented relative to a baseline (e.g., “30% higher than average for this category”)
- Cognitive dissonance – users who override a recommendation are asked to provide a justification, reinforcing trust and accountability
These are presentation‑layer choices, not core engine logic.
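A presentation‑layer sketch of the first two devices, loss framing and anchoring. The output keys and baseline source are assumptions for illustration:

```python
def framed_rates(success_rate: float, baseline_risk: float, risk: float) -> dict:
    """Frame a success rate as its failure rate (prospect theory) and
    anchor the risk score to a category baseline. Presentation only."""
    failure_rate = 1 - success_rate
    relative = (risk - baseline_risk) / baseline_risk
    return {
        "headline": f"{failure_rate:.0%} failure rate",    # loss-framed, highlighted
        "secondary": f"{success_rate:.0%} success rate",
        "anchor": f"{relative:+.0%} vs category average",  # anchoring to baseline
    }
```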
7. Implementation Notes (Proprietary)¶
The psychological principles described here are implemented in:
- Core engine – generates structured explanations (`justification`), risk scores with uncertainty, and epistemic confidence
- Enterprise enforcement layer – manages human‑in‑the‑loop escalation workflows and audit trails
- Frontend (`arf-frontend`) – visualises risk, uncertainty, and explanations according to cognitive load principles
The core engine does not contain UI logic; all psychological presentation is in the frontend or enterprise integration layers.
8. References¶
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Lee, J. D., & See, K. A. (2004). Trust in Automation: Designing for Appropriate Reliance. Human Factors, 46(1), 50–80.
- Miller, T. (2019). Explanation in Artificial Intelligence: Insights from the Social Sciences. Artificial Intelligence, 267, 1–38.
- Norman, D. A. (1983). Design Rules Based on Cognitive Psychology. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems.
9. See Also¶
- `core_concepts.md` – `HealingIntent.justification` and escalation gates
- `governance.md` – governance loop flow and epistemic uncertainty
- `mathematics.md` – risk score and HDI calculations
- `design.md` – architectural decisions for observability and auditability