Experts Urge Measurement of AI's Human Impact, Beyond Technical Metrics
As artificial intelligence systems rapidly advance, evaluation remains overwhelmingly focused on technical capabilities, measured through reasoning tests and performance benchmarks. Imran Khan, who leads psychosocial AI evaluation at the Center for Humane Technology, argues that this narrow approach overlooks a critical dimension: the impact of AI on human cognition, relationships, and behavior. Khan's recent essay highlights a paradox: immense effort goes into measuring AI's prowess on abstract tasks, while its profound downstream effects on human well-being go largely unquantified.
Khan draws parallels to the debates over social media, suggesting AI's influence could be even more pervasive and intimate. He points to early warning signs such as teen suicides and AI psychosis as evidence that harms are already manifesting. While public scrutiny can prompt AI companies to make adjustments, Khan emphasizes that systematic measurement is needed to inform such interventions. Without proactive study, the long-term societal impacts on relationships, identity, and human connection may become irreversible.
Addressing the argument that users prioritize convenience, Khan explains that human desires are complex and often contradictory: people seek both immediate ease and long-term fulfillment. He stresses the importance of understanding what constitutes a healthy, long-term relationship with technology, rather than optimizing solely for momentary choices. He identifies companionship, child development, education, and crisis response as particularly crucial domains for psychosocial measurement, because users in these settings may be especially vulnerable and the experiences are formative. Khan advocates long-horizon studies, akin to pharmaceutical post-market surveillance, to track emergent impacts over years. Such studies would require greater data access for external researchers while preserving user privacy.
The current AI evaluation paradigm prioritizes technical proficiency over human well-being, creating a potential disconnect between system capabilities and societal outcomes. This focus on internal metrics drives rapid development, but it may miss systemic harms that emerge over extended periods and across complex human interactions. The incentive structure within the AI industry appears geared towards competitive performance gains, which could put any single company that invests in costly, long-term human impact studies at a disadvantage. Future governance frameworks may need to consider mandating independent, longitudinal studies of AI's psychosocial effects, similar to post-market surveillance in pharmaceuticals, so that technology development aligns with human flourishing rather than solely with computational efficiency or immediate user engagement.
AI-generated to prompt reflection: not editorial opinion, not advice, not a statement of fact.