The study of Artificial Intelligence (AI) ethics requires a conceptual shift away from viewing AI as a discrete, self-contained technology. Instead, it is more accurately understood as a core component within complex and dynamic socio-technical systems. This perspective posits that the significant ethical challenges emerging from AI—including algorithmic bias, fairness, accountability, and unforeseen societal impacts—cannot be adequately addressed by examining algorithms or code in isolation.
A comprehensive ethical analysis must therefore consider the intricate and reciprocal relationships between the technology itself, the human actors who design, deploy, and interact with it, the institutional frameworks that regulate it, and the broader societal values it both reflects and reshapes.
The Collingridge Dilemma
A persistent obstacle in the governance of any emerging technology is captured by the Collingridge Dilemma, articulated by David Collingridge in 1980. This dilemma exposes a fundamental paradox that complicates attempts to proactively manage technological development for the social good. It consists of two interconnected problems: an information problem and a power problem.
In the nascent stages of a technology’s life cycle, when its design is still fluid and its trajectory can be easily altered, there is a profound lack of information regarding its potential long-term social consequences. It is at this early stage that embedding ethical values or regulatory guardrails would be most effective, yet we lack the foresight to know precisely which interventions are necessary. Conversely, once the technology reaches maturity and is deeply integrated into economic and social structures, its negative consequences become empirically evident. However, at this point, the technology has become so entrenched that effecting meaningful change is often prohibitively expensive, disruptive, and politically difficult.
This dilemma highlights that technological development is an inherently unpredictable process, whose ultimate impact is not pre-determined but emerges from its sustained interaction with society. Consequently, navigating between an overly restrictive precautionary principle and a permissive, uncritical technological enthusiasm remains a central challenge for policymakers and society at large.
A Systems-Based Approach to Technology and Society
To transcend the limitations imposed by the Collingridge Dilemma, it is essential to adopt a socio-technical systems perspective. This framework challenges the deterministic view of technology as an external force acting upon society. Instead, it asserts that technology and society are mutually constitutive and co-evolve. As the philosopher of technology Deborah Johnson has argued,
“Computer experts aren’t just building and manipulating hardware, software, and code, they are building systems that help to achieve important social functions, systems that constitute social arrangements, relationships, institutions, and values.”
Example
The global civil aviation system serves as a canonical example of a socio-technical system. Its functionality and remarkable safety record are not attributable solely to the sophistication of its aircraft. Rather, they are an emergent property of a complex assemblage that includes physical objects (aircraft, airports, radar), human actors (pilots, engineers, air traffic controllers), and a dense web of organizations, institutions, and rules (airlines, regulatory bodies such as the FAA, international treaties).
The system’s integrity depends on the seamless interaction and interdependence of all these elements. This perspective reveals that failures can arise not just from technical malfunction but also from human error, organizational dysfunction, or regulatory gaps.
The Unique Characteristics of AI Socio-Technical Systems
When this analytical lens is applied to AI, it becomes clear that AI systems are a special class of socio-technical system, distinguished by novel components that introduce unique ethical complexities. As outlined by van de Poel (2020), these systems incorporate artificial agents and technical norms. Unlike traditional tools, artificial agents possess degrees of:
- autonomy (the capacity to operate without direct human control),
- interactivity (the ability to engage with their environment and other agents), and
- adaptability (the ability to learn and alter their behavior over time).
 
Furthermore, these systems are governed by technical norms: the operational rules and optimization functions embedded within their code, such as an objective that maximizes user engagement or minimizes prediction error. These norms operate on a causal-physical basis and are thus fundamentally different from social norms or legal statutes, which are grounded in human intentionality, reason, and collective agreement.
The potential for conflict between an AI’s embedded technical norms and overarching social values is a primary source of ethical tension. This socio-technical framework is invaluable because it not only clarifies how AI systems function but also illuminates the mechanisms through which they produce social effects and ethical dilemmas, thereby pointing toward more holistic and effective governance strategies.
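To make the notion of a technical norm concrete, the following minimal Python sketch shows an engagement-maximizing ranking rule of the kind alluded to above. The function name, scores, and items are invented for illustration and do not represent any real platform's code.

```python
# A minimal sketch of a "technical norm": the operational rule a system enforces.
# The names and scores below are hypothetical illustrations, not real platform code.
from typing import Callable

def rank_feed(items: list[str],
              predicted_engagement: Callable[[str], float]) -> list[str]:
    """Order items purely by predicted engagement, highest first.

    The only norm enforced here is causal-physical: maximize expected
    engagement. Nothing in the rule encodes social values such as accuracy,
    viewpoint diversity, or user well-being.
    """
    return sorted(items, key=predicted_engagement, reverse=True)

if __name__ == "__main__":
    scores = {"measured news summary": 0.12, "outrage headline": 0.47, "cat video": 0.31}
    print(rank_feed(list(scores), scores.get))
    # -> ['outrage headline', 'cat video', 'measured news summary']
```

Any conflict between such an embedded objective and broader social values (accuracy, diversity, well-being) is precisely the tension described above.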
Inherent Challenges in Aligning AI with Human Values
The unique properties of AI systems give rise to several profound challenges for value-aligned design.
- Incompleteness: AI systems often exhibit incomplete design, meaning their ultimate behavior is not fully specified or predictable at the outset. Their capabilities can emerge dynamically through learning processes, or components may be repurposed from other systems, making a comprehensive ex-ante ethical review difficult. This leads directly to the problem of endemic unintended consequences, which are far more prevalent and unpredictable in adaptive AI systems than in static, conventional technologies.
- General-Purpose Systems: Moreover, the trend toward creating general-purpose systems, such as large language models, introduces another layer of risk. These models are designed for broad applicability and can be deployed in countless contexts, many of which were not envisioned by their original developers. This versatility makes it impossible to anticipate and mitigate all potential misuses or negative societal impacts.
- Obligation to Redesign: AI systems also carry a unique obligation to redesign when they produce harmful outcomes. This obligation is particularly pressing when the systems generate disvalues—for instance, when a recommender system’s technical norm of maximizing engagement leads to ideological polarization through filter bubbles. The autonomy and adaptability of these systems complicate the redesign process, as altering their core logic is a complex task.
 
The Problem of Fairness in Machine Learning (ML)
An examination of fairness within the domain of machine learning (ML) offers a compelling case study that illuminates the shortcomings of a purely technical methodology and underscores the critical need for a socio-technical framework. The conventional approach within the ML community has been predominantly focused on creating technical “solutions” to mitigate discriminatory outcomes. This effort is largely channeled into the development of fairness-aware learning algorithms, which intervene at various stages of the ML pipeline. These interventions can be categorized into three main phases:
| Stage | Action | Description | 
|---|---|---|
| Pre-processing | Cleaning the Data | Biased data is a primary source of unfair outcomes. This stage involves modifying the training data to remove or mitigate existing biases before the model is trained. | 
| In-processing | Constraining the Algorithm | This involves modifying the learning algorithm itself, adding constraints to ensure that it balances accuracy with fairness metrics during the training process. | 
| Post-processing | Adjusting Predictions | After the model has made its predictions, this stage involves tuning the outputs to achieve a more equitable result across different demographic groups. | 
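To make the table less abstract, here is a minimal, runnable sketch of the post-processing row only: per-group decision thresholds chosen so that positive-prediction rates are roughly equal across two groups (a demographic-parity criterion). The scores and group labels are synthetic, and this simple thresholding stands in for whatever concrete method a project would actually adopt; pre- and in-processing interventions would instead modify the training data or the learning objective further upstream.

```python
# A runnable sketch of the post-processing stage: per-group score thresholds
# tuned so each group's positive-prediction rate matches a target.
# The scores and groups are synthetic and purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy scores from some already-trained model, plus a binary group attribute.
scores = rng.uniform(size=1000)
group = rng.integers(0, 2, size=1000)
scores[group == 1] *= 0.8          # simulate a model that scores group 1 lower

def parity_thresholds(scores, group, target_rate=0.3):
    """Choose a threshold per group so ~target_rate of that group scores above it."""
    return {g: np.quantile(scores[group == g], 1 - target_rate)
            for g in np.unique(group)}

thresholds = parity_thresholds(scores, group)
decisions = np.array([s >= thresholds[g] for s, g in zip(scores, group)])

for g in (0, 1):
    print(f"group {g}: positive rate = {decisions[group == g].mean():.2f}")  # both ~0.30
```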
A Socio-Technical Critique: The Five Abstraction Traps
A more critical perspective, articulated by Selbst et al. in their work “Fairness and Abstraction in Sociotechnical Systems,” posits that treating fairness as a purely technical problem constitutes a fundamental abstraction error. This error leads to several conceptual “traps” that emerge when designers and developers fail to account for the broader socio-technical system in which their algorithms are deployed. These traps highlight the inherent limitations of viewing fairness through a narrow, technical lens.
The Framing Trap
“Failure to model the entire system over which a social criterion, such as fairness, will be enforced”
This trap is the failure to model the entire system over which a social criterion like fairness will be enforced. Technical work often exists within an “algorithmic frame,” where the goal is simply to optimize the relationship between given inputs (data representations) and outputs (labels). Fair-ML research expands this to a “data frame,” which interrogates the inputs and outputs themselves for bias. However, this is still insufficient. A true “sociotechnical frame” recognizes that the model is just one part of a larger system that includes human decision-makers and institutions.
Example
For example, a risk assessment tool in criminal justice might have a fairness guarantee based on its output scores. But if it fails to model how judges actually use those scores—sometimes following the recommendation, sometimes ignoring it, or deviating in biased ways—the system’s real-world fairness guarantee becomes invalid. The human component must be included within the system’s boundary.
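A toy simulation can make this concrete. In the sketch below, the tool's recommendations satisfy demographic parity on their own, but once a hypothetical pattern of judicial overrides is included inside the system boundary, the end-to-end detention rates no longer do. All rates and behaviors are invented for illustration, not drawn from any real study.

```python
# Toy simulation of the framing trap: a fairness property that holds for the
# tool's recommendations in isolation fails to hold for the final decisions
# once (hypothetical) human overrides are modelled. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, size=n)

# Tool level: detention recommended for ~30% of defendants in *both* groups.
tool_recommends = rng.uniform(size=n) < 0.30

# Hypothetical downstream behaviour: group 0 recommendations are followed,
# but 15% of group 1 "release" recommendations are overridden to detention.
override = (group == 1) & ~tool_recommends & (rng.uniform(size=n) < 0.15)
decision = tool_recommends | override

for g in (0, 1):
    print(f"group {g}: tool rate {tool_recommends[group == g].mean():.2f}, "
          f"final rate {decision[group == g].mean():.2f}")
# Tool rates match (~0.30); final detention rates do not (~0.30 vs ~0.40).
```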
The Portability Trap
“Failure to understand how repurposing algorithmic solutions designed for one social context may be misleading, inaccurate, or otherwise do harm when applied to a different context”
This is the failure to recognize that an algorithmic solution designed for one social context can be harmful when applied to another. Computer science culture values portable, reusable code. This task-centric abstraction allows the same classification algorithm to be applied to predicting loan default, employee performance, or criminal recidivism, regardless of the vastly different social contexts. However, what constitutes a “fair” outcome is deeply dependent on local values and concerns.
The fairness needs of one court jurisdiction may be completely different from another, let alone from the needs of a corporate hiring department. Designing for fairness requires resisting the programmer’s instinct for universal portability and instead tailoring solutions to specific, non-transferable social situations.
The Formalism Trap
“Failure to account for the full meaning of social concepts such as fairness, which can be procedural, contextual, and contestable, and cannot be resolved through mathematical formalism”
This trap is the belief that complex social concepts like fairness can be fully and adequately captured through mathematical formalisms. Because algorithms require mathematical instructions, the fair-ML community has focused on creating quantitative definitions of fairness. However, this leads to two problems.
- First, many mathematical definitions of fairness are mutually exclusive, and math alone cannot tell us which to choose in a given context (the sketch after this list makes the tension concrete).
- Second, and more fundamentally, social concepts of fairness are not static outcomes but are inherently procedural, contextual, and contestable.
  - Procedurality: In law, fairness is often about process, not just outcome. Firing someone for an illegal reason (e.g., race) is different from firing the same person for a legal one, even if the outcome is identical.
  - Contextuality: What we consider wrongful discrimination depends entirely on cultural context. Legal and social standards for fairness differ depending on the domain (e.g., housing vs. employment) and the attributes in question (e.g., race vs. disability).
  - Contestability: Fairness is a politically contested concept that changes over time through legislation, court cases, and shifting social norms. Fixing one definition into code undermines the democratic process of debating and redefining these norms.
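The incompatibility mentioned in the first point can be shown with a few lines of arithmetic on synthetic data: a classifier with identical error rates for two groups (one reading of equalized odds) cannot also select both groups at the same rate (demographic parity) when the groups' base rates differ. The numbers below are invented solely to exhibit that tension.

```python
# Synthetic demonstration that two common formal fairness definitions conflict
# whenever base rates differ between groups. All data here is simulated.
import numpy as np

rng = np.random.default_rng(2)
n = 5000

y_a = rng.uniform(size=n) < 0.2   # group A: 20% of cases are truly positive
y_b = rng.uniform(size=n) < 0.5   # group B: 50% of cases are truly positive

def predict(y_true, tpr=0.8, fpr=0.1):
    """A classifier with the same true/false positive rates for any group."""
    u = rng.uniform(size=y_true.size)
    return np.where(y_true, u < tpr, u < fpr)

pred_a, pred_b = predict(y_a), predict(y_b)

# Equalized odds holds by construction: TPR and FPR match across groups.
print("TPR  A/B:", pred_a[y_a].mean().round(2), pred_b[y_b].mean().round(2))
print("FPR  A/B:", pred_a[~y_a].mean().round(2), pred_b[~y_b].mean().round(2))

# Demographic parity therefore cannot hold: selection rates are
# 0.2*0.8 + 0.8*0.1 = 0.24 for A versus 0.5*0.8 + 0.5*0.1 = 0.45 for B.
print("Selection rate A/B:", pred_a.mean().round(2), pred_b.mean().round(2))
```

Which of the two definitions should yield is not a mathematical question; it depends on the context and values at stake, which is exactly the formalism trap's point.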
 
 
The Ripple Effect Trap
“Failure to understand how the insertion of technology into an existing social system changes the behaviors and embedded values of the pre-existing system”
This trap is the failure to account for how introducing a technology changes the social system itself. A technical system is not inserted into a static environment; people and institutions react and adapt to it, leading to unintended consequences.
For instance, introducing a risk assessment tool changes a judge's role and the power dynamics in the courtroom. Judges may defer to the tool due to "automation bias," or they may resist its influence.
Furthermore, technology can alter the embedded values of a system. A risk assessment tool designed to predict “dangerousness” may unconsciously privilege incapacitation over other goals of the justice system, such as rehabilitation, deterrence, or restoration.
The Solutionism Trap
“Failure to recognize the possibility that the best solution to a problem may not involve technology”
This trap, also known as “technological solutionism,” is the failure to recognize that the best solution to a social problem may not involve technology at all. Because the field is rooted in computer science, it starts with the assumption that a technical intervention is needed. This mindset prevents a crucial first step: evaluating whether technology should be built in the first place.
In situations where fairness concepts are politically contested or the social system is too complex to model accurately, building an algorithm is as likely to make things worse as it is to make them better. The most prudent action might be to study the existing human system and consider non-technical policy changes, like presumptively releasing defendants charged with non-violent crimes instead of trying to perfect a risk assessment tool.
Digital Solutionism
The Solutionism Trap is a manifestation of a broader cultural trend identified by Evgeny Morozov as “Digital Solutionism.” This ideology is characterized by the belief that technology, given the right code and algorithms, can solve all of humanity’s complex problems, including those that are political and social in nature. It prioritizes the immediate application of a technical answer before the underlying problem has been fully understood and the right questions have been asked. This approach is often driven by a desire to eradicate imperfection and make every aspect of human life more “efficient,” frequently without regard for the values that may be lost in the process.
The socio-technical approach to AI ethics necessitates a fundamental shift in perspective. It compels us to move away from the pursuit of simple, technical “fixes” for complex social problems like unfairness. Instead, it provides a more robust and comprehensive framework for analysis and governance. This framework does not promise definitive solutions but offers guidance for navigating the fundamental tensions, uncertainties, and conflicts that are inherent in the development and deployment of AI. Acknowledging that AI systems are deeply embedded in our social world is the first and most critical step toward designing and regulating them in a more responsible, ethical, and equitable manner.