An exploration into the experimental nature of artificial intelligence must begin with a firm grasp of its foundational principles. Contemporary AI systems are distinguished from traditional software by their advanced learning capabilities. Rather than operating on a fixed set of pre-programmed instructions, these models are designed to adapt, infer patterns, and evolve through exposure to data. A predominant application of this functionality is in the domain of machine decision-making, which, at a technical level, can frequently be conceptualized as a complex classification problem.
In this paradigm, the AI system is tasked with assigning an input to a predefined category; for instance, classifying a financial transaction as "legitimate" or "fraudulent," a loan application as "approve" or "deny," or a candidate's resume as a "good fit" or "poor fit" for a particular role.
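To make the classification framing concrete, the sketch below trains a toy fraud classifier. It is purely illustrative: the features, data, and model choice are invented for this example and bear no relation to any production system.

```python
# Illustrative sketch only: a toy binary classifier for the kind of
# "legitimate vs. fraudulent" decision described above. Feature names and
# data are invented for demonstration purposes.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features: [amount, hour of day, transactions in last 24h]
X = np.array([
    [12.50,  14,  2],
    [980.00,  3,  9],
    [45.00,  10,  1],
    [1500.0,  2, 12],
    [23.75,  18,  3],
    [760.00,  4,  8],
])
y = np.array([0, 1, 0, 1, 0, 1])  # 0 = legitimate, 1 = fraudulent

model = LogisticRegression()
model.fit(X, y)  # the decision boundary is inferred from labeled examples

new_transaction = np.array([[890.00, 3, 10]])
print("Predicted class:", model.predict(new_transaction))
print("Class probabilities:", model.predict_proba(new_transaction))
```

Even in this toy form, the essential structure is visible: the system does not follow hand-written rules for what counts as fraud but infers a decision boundary from labeled examples, which is exactly the shift from fixed instructions to learned behavior described above.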
The use of these technologies is no longer theoretical; it is widespread and deeply integrated into critical societal functions. Key examples include:
- Teacher Evaluation: Systems that analyze student performance data to assess teacher effectiveness.
- Recruitment: AI tools that screen resumes and even conduct preliminary interviews to filter candidates.
- Predictive Justice: Algorithms used to predict an individual’s likelihood of re-offending, influencing bail and sentencing decisions.
This rapid and deep integration has precipitated a host of significant ethical challenges. The most prominent of these concerns revolve around the domains of individual privacy, the potential for systemic bias and discrimination, and the critical need for explainability, which is the ability to understand and interpret the reasoning behind an AI’s decision.
Defining and Managing AI-Related Risk
To effectively govern AI, one must first establish a clear understanding of the associated risks. The Society for Risk Analysis offers a concise yet comprehensive definition, characterizing AI-related risk as
"the possibility of an unfortunate occurrence associated with the development or deployment of artificial intelligence."
The deliberate breadth of this definition reflects the reality that AI-related risks are profoundly heterogeneous in both their origins and their potential impacts. As systematically cataloged by academic and research initiatives like the MIT AI Risk Repository, these risks emanate from a variety of sources. These include overt technical malfunctions or software bugs, inherent inaccuracies stemming from the probabilistic nature of machine learning models, and the deliberate misuse of the technology for malicious purposes such as generating disinformation. Furthermore, risks can also arise from unforeseen and emergent negative consequences, such as the potential for widespread cognitive deskilling due to over-reliance on AI assistants.
The impact domains of these risks are equally diverse, affecting nearly every facet of modern society. They range from infringements on the fundamental rights of individuals, such as the right to due process or equal treatment, to systemic threats against the stability of democracies through algorithmic manipulation. The consequences can also extend to the physical world, with potential negative effects on the environment and global security, creating a complex and interconnected web of potential harms.
The EU AI Act
In response to these complex challenges, regulatory bodies are actively working to establish frameworks for the responsible development and deployment of AI.
A landmark example of this effort is the European Union’s AI Act, which places a strong emphasis on risk management. Specifically, Article 9 of the Act mandates the establishment and maintenance of a robust risk management system for any system classified as “high-risk.”
In the words of the Act:
- A risk management system shall be established, implemented, documented and maintained in relation to high-risk AI systems.
- The risk management system shall be understood as a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic review and updating. It shall comprise the following steps:
  - the identification and analysis of the known and the reasonably foreseeable risks that the high-risk AI system can pose to health, safety or fundamental rights when the high-risk AI system is used in accordance with its intended purpose;
  - the estimation and evaluation of the risks that may emerge when the high-risk AI system is used in accordance with its intended purpose, and under conditions of reasonably foreseeable misuse;
This framework is not envisioned as a static, one-time compliance check but rather as a continuous, iterative process that must be planned, implemented, and updated throughout the entire lifecycle of the AI system, from initial design to post-market surveillance.
This mandated process involves a dual-pronged approach to the identification and evaluation of potential harms.
| Step | Description | Scope of Analysis |
|---|---|---|
| Identification | The identification and analysis of known and reasonably foreseeable risks that the AI system could pose to health, safety, or fundamental rights. | Assumes the AI is being used in accordance with its intended purpose. |
| Estimation | The estimation and evaluation of risks that may emerge not only from intended use but also under conditions of reasonably foreseeable misuse. | Extends the analysis to include ways the system might be used improperly, but in a way that developers could plausibly predict. |
However, the practical and conceptual difficulty of defining the precise boundaries of what constitutes “reasonably foreseeable” remains a central challenge in the implementation of this regulatory framework, particularly for a technology as novel and dynamic as artificial intelligence.
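One way to picture the two steps in the table is as a living risk register that is revisited throughout the system’s lifecycle. The sketch below is a minimal illustration of that idea; the field names, scoring rule, and example entries are hypothetical and are not an interpretation of the Act’s requirements.

```python
# A minimal sketch of a "living" risk register reflecting the two steps in the
# table above. All field names, values, and the scoring rule are hypothetical,
# chosen only for illustration.
from dataclasses import dataclass
from enum import Enum

class UseCondition(Enum):
    INTENDED_USE = "intended use"
    FORESEEABLE_MISUSE = "reasonably foreseeable misuse"

@dataclass
class RiskEntry:
    description: str             # step 1: identification and analysis
    affected_domain: str         # health, safety, or fundamental rights
    use_condition: UseCondition  # step 2 extends scope to foreseeable misuse
    likelihood: float            # 0.0-1.0, necessarily a rough estimate
    severity: int                # 1 (minor) to 5 (severe)

def needs_mitigation(entry: RiskEntry, threshold: float = 2.0) -> bool:
    """Flag entries whose estimated likelihood x severity exceeds a threshold."""
    return entry.likelihood * entry.severity >= threshold

register = [
    RiskEntry("Biased denial of loans to a protected group",
              "fundamental rights", UseCondition.INTENDED_USE, 0.3, 5),
    RiskEntry("Operator treats the score as the sole basis for a decision",
              "fundamental rights", UseCondition.FORESEEABLE_MISUSE, 0.5, 4),
]

for entry in register:
    action = "mitigate" if needs_mitigation(entry) else "monitor"
    print(f"{entry.description} -> {action}")
```

Note that the second example entry records a condition of foreseeable misuse (over-reliance on the score), which is precisely the category whose boundaries the “reasonably foreseeable” standard leaves open.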
The Flaw in Our Thinking: The Tuxedo Fallacy
The efficacy of traditional risk assessment methodologies is often compromised when applied to novel technologies, primarily because they tend to mischaracterize the fundamental nature of the uncertainty involved. This conceptual error is incisively captured by the “Tuxedo Fallacy,” a term introduced by Sven Ove Hansson in the 2009 paper “From the casino to the jungle.” The fallacy is elucidated through a powerful metaphor that contrasts two distinct epistemic worlds: the predictable environment of a casino and the profound uncertainty of an unexplored jungle.
- The casino represents a realm of manageable risk, where potential outcomes, while uncertain, are governed by known and well-defined probabilities. In this closed system, the rules are fully understood, allowing for the precise calculation of odds, as exemplified by “idealized textbook cases” such as games of roulette or dice.
- In stark contrast, the jungle serves as an analogue for the deep uncertainty that characterizes real-world technological deployment. Navigating this environment involves confronting dangers that cannot be neatly quantified.
These dangers manifest in two distinct forms.
- There are known dangers with unknown probabilities, akin to an explorer who is aware of the existence of tigers and poisonous snakes but can only offer speculative guesses regarding the likelihood of an attack.
- The jungle contains unknown dangers—the possibility of encountering previously undiscovered species of insects or microorganisms whose very existence, let alone their capacity for harm, is entirely unforeseen. This latter category represents the “unknown unknowns” that defy probabilistic modeling.
The Tuxedo Fallacy, therefore, is the critical error of “treating all decisions as if they took place under epistemic conditions analogous to gambling at the roulette table”—that is, treating a complex “jungle” problem as a simple “casino” problem. The peril of this approach lies in its tendency to foster a dangerous illusion of control. By presuming that all potential harms can be calculated and managed through statistical models, we systematically neglect the unquantifiable, deep uncertainties that often pose the most significant and potentially catastrophic threats. Consequently, deploying a powerful new AI system into society is not an act comparable to placing a calculated bet in a casino; it is more accurately analogized to taking the first tentative step into an uncharted jungle.
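The contrast between the two epistemic worlds can be put in simple computational terms: in the casino, a known probability distribution lets us compute expected loss exactly, while in the jungle the computation cannot even be set up. The sketch below is purely illustrative; the roulette figures are the standard ones for a European wheel, and the “jungle” entries are deliberately unquantifiable placeholders rather than real risk estimates.

```python
# Illustrative sketch of the casino/jungle contrast.

# Casino: every outcome and its probability is known, so expected loss is exact.
p_win = 1 / 37                  # single-number bet on a European roulette wheel
expected_value = p_win * 35 + (1 - p_win) * (-1)
print(f"Expected value per unit bet: {expected_value:.4f}")  # about -0.027

# Jungle: some dangers have no defensible probability, and some never make the
# list at all. Any "expected loss" here is undefined, not merely imprecise.
jungle_dangers = {
    "tiger attack": None,  # known danger, unknown probability
    # unknown unknowns cannot appear in this dictionary at all
}
for danger, probability in jungle_dangers.items():
    if probability is None:
        print(f"Cannot compute expected loss for '{danger}': no known probability.")
```

The point is not that the jungle’s expected loss is hard to estimate, but that it is undefined: the distribution over outcomes, and even the outcome space itself, is unknown.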
The Control Dilemma
Even if we could successfully navigate the Tuxedo Fallacy and develop more robust methods for anticipating risks, we would still confront a fundamental paradox in the governance of technology. This challenge was famously articulated by David Collingridge in his 1980 work, The Social Control of Technology. Collingridge’s central thesis posits that the effective social control of technology hinges upon the simultaneous fulfillment of two essential conditions:
- first, possessing knowledge that the technology might produce certain unwanted effects, and
- second, retaining the control or power to intervene and prevent those effects.
The dilemma of control, as Collingridge termed it, arises from the observation that these two conditions are rarely, if ever, met at the same time, existing instead in an inverse relationship throughout a technology’s lifecycle.
During the early phase, or infancy, of a technology’s development—such as the advent of the first automobiles—the capacity for control is high. The technology is not yet widespread, its infrastructure is minimal, and its design is malleable, making it relatively easy to modify, regulate, or even abandon. However, in this nascent stage, knowledge of its long-term societal and environmental consequences is extremely low. The profound impacts of a car-centric society, such as urban sprawl, climate change, and geopolitical conflicts over oil, could not have been reasonably foreseen. Thus, a temporal mismatch occurs: we possess the power to act, but we lack the foresight to know which actions are necessary.
Conversely, in the late phase, when a technology has reached maturity and become deeply integrated into the socio-economic fabric, the situation is reversed. The negative consequences of a mature automotive society are now well-understood and extensively documented; knowledge is high. Yet, this very entrenchment means that the technology and its supporting systems are incredibly difficult to change. Effecting a meaningful shift away from this established paradigm has become prohibitively expensive, logistically complex, and politically challenging, meaning that control is low. Here, the mismatch is inverted: we possess the necessary knowledge of the problems, but we have largely lost the practical ability to implement effective solutions.
This dilemma is acutely relevant to artificial intelligence. For many AI applications we are still in the infancy phase, where risks go largely unanticipated. The risk of cognitive deskilling from over-reliance on generative AI is a prime example: the initial discourse was dominated by excitement over productivity gains, with little anticipation of potential cognitive harms. Emerging research, such as the “Your Brain on ChatGPT” study, now suggests that prolonged use may weaken the neural pathways associated with creative and critical thought, a consequence that was largely unforeseen. At the same time, other AI technologies already exhibit the problems of the maturity phase: they have become entrenched systems whose risks are known but which are difficult to control. Social media recommendation algorithms, for instance, are now widely understood to contribute to social polarization and to adolescent mental health crises. Knowledge here is high, but because these systems are core components of the business models of some of the world’s most powerful corporations, implementing meaningful regulation or fundamental design changes remains a monumental challenge.
A New Framework: Viewing AI as a Social Experiment
In light of the profound limitations inherent in traditional risk assessment, as exemplified by the Tuxedo Fallacy and the Control Dilemma, a new conceptual framework is required to responsibly manage the introduction of powerful AI systems. The philosopher of technology Ibo van de Poel offers a compelling alternative: viewing the deployment of novel technologies as an inherently experimental process. Van de Poel formally defines a technology as “experimental” when
"...there is only limited operational experience with them, so that social benefits and risks cannot, or at least not straightforwardly, be assessed on basis of experience."
Consequently, because their introduction into the public sphere is characterized by “large uncertainties, unknowns and indeterminacies,” he argues that their deployment must be conceived as a large-scale social experiment.
The distinction between an experimental and a non-experimental technology is not absolute but is instead a blurred, context-dependent line. For instance, building a new bridge using a well-understood and thoroughly tested design does not constitute an experiment. Conversely, introducing a fundamentally new human-computer interface like Google Glass, whose long-term effects on social norms, privacy, and human interaction were largely unknown at the time of its release, unequivocally qualifies as an experimental act.
This experimental framework stands in contrast to methodologies like Value Sensitive Design (VSD), which attempt to proactively mitigate negative consequences by anticipating potential societal impacts during the design phase. While valuable, such anticipatory approaches have inherent limits, as it is impossible to foresee all potential outcomes of a complex technology’s interaction with society. Critics have also noted that this forward-looking approach can inadvertently lead to a focus on “morally thrilling but very unlikely” scenarios, such as speculative risks of existential catastrophe from a future superintelligence. This focus on distant and dramatic threats can distract from the crucial work of addressing the more subtle, immediate, and tangible harms that AI technologies are causing in the present.
The proposed alternative, therefore, shifts the focus from perfect foresight to adaptive learning. It embraces the experimental nature of technology by advocating for its
"...gradual and experimental introduction into society, in such a way that emerging social effects are monitored and are used to improve the technology and its introduction into society."
This paradigm champions an iterative process that values trial and error, incremental decision-making, and flexibility, allowing for real-world feedback to guide a technology’s evolution and integration.
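In engineering terms, the iterative process van de Poel describes resembles a staged rollout in which exposure expands only while monitored effects remain within agreed bounds. The loop below is one possible shape for such a process; the stage sizes, metric, and threshold are invented for illustration, and measuring real social effects is of course far harder than returning a number.

```python
# A minimal sketch of "gradual and experimental introduction" as a staged
# rollout loop. Stage sizes, the harm metric, and the threshold are
# hypothetical placeholders.
ROLLOUT_STAGES = [0.01, 0.05, 0.20, 1.00]  # fraction of population exposed
HARM_THRESHOLD = 0.01                       # agreed maximum acceptable harm rate

def observed_harm_rate(exposure_fraction: float) -> float:
    """Stand-in for real monitoring; would be replaced by empirical measurement."""
    return 0.002  # placeholder value

for stage in ROLLOUT_STAGES:
    harm = observed_harm_rate(stage)
    print(f"Exposure {stage:.0%}: observed harm rate {harm:.3f}")
    if harm > HARM_THRESHOLD:
        print("Halting rollout: emerging effects exceed the agreed threshold.")
        break
    # Otherwise, use what was learned at this stage to improve the system
    # before expanding exposure further.
else:
    print("Full deployment reached, with monitoring still in place.")
```

The design choice the loop encodes is the one the quotation emphasizes: deployment decisions are conditioned on monitored effects rather than made once, up front, on the basis of predicted effects.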
However, this intellectually compelling alternative is not without its own significant challenges.
- The first is a practical problem stemming from its incompatibility with the established “rules of the market.” A gradual, cautious, and closely monitored rollout is fundamentally at odds with the fast-paced, growth-oriented logic of a venture-capital-funded ecosystem that rewards speed and rapid market capture above all else.
- The second challenge is a deep ethical-epistemological tension between the need to learn about a technology’s impacts and the duty to protect the public from harm. Testing in small-scale, controlled environments is ethically preferable as it minimizes potential negative outcomes and allows for easy intervention. Yet, this approach is epistemologically weak, as it is often incapable of revealing the systemic, large-scale risks that only emerge when a technology is widely adopted. Conversely, a large-scale deployment is the only method to gain the epistemological insight needed to understand these emergent properties, but it is ethically fraught, as it exposes entire populations to unknown and potentially significant risks, often without their explicit and fully informed consent.
Applying the Ethics of Experimentation to AI
The central thesis of this analysis culminates in a critical observation: the introduction of many new and powerful AI technologies into society de facto amounts to a real-world social experiment. If this premise is accepted, then it follows that we are ethically obligated to apply the rigorous standards and principles that have been carefully developed over decades to govern the conduct of experiments involving human subjects. The established ethics of human experimentation, born from the tragic lessons of the 20th century, are not an optional set of guidelines but a moral imperative. They provide a robust and essential framework for navigating the profound uncertainties that accompany the deployment of transformative AI.
This established framework is built upon several key pillars:
- Informed Consent: Are users truly aware of the risks they are undertaking when using a new AI? Can they provide meaningful consent to participate in this “experiment”?
- Beneficence and Non-maleficence: The duty to maximize potential benefits while actively minimizing potential harm. This requires careful, ongoing monitoring for negative effects.
- Justice: Ensuring that the risks of the experiment do not fall disproportionately on vulnerable groups while the benefits flow to the privileged, a common failure mode in technological deployment.
- Independent Oversight: In medicine and research, Institutional Review Boards (IRBs) or Ethics Committees provide independent review. A similar oversight mechanism is desperately needed for the deployment of high-impact AI systems.
- The Right to Withdraw: Participants in an experiment must be able to opt out at any time. How this principle applies to societal-scale technologies is a complex but vital question.
By reframing the deployment of AI through the lens of a social experiment, the discourse shifts from a potentially misleading illusion of control toward a more honest, humble, and ethically responsible paradigm for technological innovation.