
Background

This tutorial is based on the insights and metrics presented in

AI Shall Have No Dominion: on How to Measure Technology Dominance in AI-supported Human Decision-making

by F. Cabitza, A. Campagner, R. Angius, C. Natali and C. Reverberi, part of the CHI'23 proceedings.

The use of artificial intelligence (AI) systems for decision support and task automation has recently gained great popularity, both in scientific and institutional contexts and in public opinion, to the point where it is considered natural, and almost necessary, to adopt these systems even in areas where decisions have a legally relevant impact on those involved and where the error rate or arbitrariness of decision-makers is considered unacceptable [1] (such as medicine, court decisions, public security and safety, and credit-worthiness assessment). This interest and normalization rest largely on the frequently unstated presumption that the fewer mistakes an AI support makes, the better it is [2].

The appeal of this assumption is largely connected to its simplifying consequences: to decide whether an AI system is good enough for deployment in practical contexts, it suffices to evaluate its performance in isolation, or perhaps to compare it with the average performance of human decision-makers on the same task [3,4], without necessarily taking into account the complex socio-technical context [5] in which the system will be embedded after deployment, or the emergent phenomena arising from the continuous adjustment and fit between humans, machines and tasks [6,7].


This dogma is appealing in theory, but it has little applicability “in the wild” [8], as it is reasonable only for the (still) small number of cases in which humans willingly adopt a fully automated decision-making setting and completely delegate decision-making to machines [9,1].


However, in the overwhelming majority of cases, the automation of classification tasks is partial [10] and is intended as a support to the human decision, i.e. an act that is factually performed by a human being who is solely responsible for it. In these contexts, which can be framed under the expression “hybrid decision-making”, the above-mentioned assumption is not only inapplicable but also harmful [11]: in all such cases, the evaluation of an AI system should aim at understanding its role in letting people either avoid or commit incorrect decisions, by factoring in both cognitive and socio-psychological determinants and effects [12,13].

Bibliography

[1] Kahneman D, Sibony O, Sunstein C. Noise. Glasgow (Scotland): HarperCollins UK; 2022.
[2] Cabitza F, Campagner A, Datteri E. To err is (only) human. Reflections on how to move from accuracy to trust for medical AI. In: Exploring Innovation in a Digital World. Cham (Switzerland): Springer; 2021. p. 36-49.
[3] Ardila D, Kiraly AP, Bharadwaj S, Choi B, Reicher JJ, Peng L, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nature Medicine. 2019;25(6):954-61.
[4] McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, et al. International evaluation of an AI system for breast cancer screening. Nature. 2020;577(7788):89-94.
[5] Elmore JG, Lee CI. Artificial Intelligence in Medical Imaging—Learning From Past Mistakes in Mammography. JAMA Health Forum. 2022;3(2):e215207.
[6] Carroll JM, Rosson MB. Getting around the task-artifact cycle: How to make claims and design by scenario. ACM Transactions on Information Systems (TOIS). 1992;10(2):181-212.
[7] Cabitza F, Fogli D, Piccinno A. Fostering participation and co-evolution in sentient multimedia systems. Journal of Visual Languages & Computing. 2014;25(6):684-94.
[8] Katsikopoulos K, Simsek O, Buckmann M, Gigerenzer G. Classification in the Wild. Cambridge, MA (USA): MIT Press; 2020.
[9] Araujo T, Helberger N, Kruikemeier S, De Vreese CH. In AI we trust? Perceptions about automated decision-making by artificial intelligence. AI & SOCIETY. 2020;35(3):611-23.
[10] Parasuraman R, Sheridan TB, Wickens CD. A model for types and levels of human interaction with automation. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans. 2000;30(3):286-97.
[11] Cabitza F, Campagner A, Simone C. The need to move away from agential-AI: Empirical investigations, useful concepts and open issues. International Journal of Human-Computer Studies. 2021;155:102696.
[12] Huo W, Zheng G, Yan J, Sun L, Han L. Interacting with medical artificial intelligence: Integrating self-responsibility attribution, human–computer trust, and personality. Computers in Human Behavior. 2022;132:107253.
[13] Ma S, Lei Y, Wang X, Zheng C, Shi C, Yin M, et al. Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making; 2023.
