ABSTRACT. Value alignment is a property of an intelligent agent indicating that it can only pursue goals and activities that are beneficial to humans. Traditional approaches to value alignment use imitation learning or preference learning to infer the values of humans by observing their behavior. We introduce a complementary technique in which a value-aligned prior is learned from naturally occurring stories which encode societal norms. Training data is sourced from the children’s educational comic strip, Goofus & Gallant. In this work, we train multiple machine learning models to classify natural language descriptions of situations found in the comic strip as normative or non-normative by determining whether they align with the behaviors of the main characters. We also report the models’ performance when transferring to two unrelated tasks with little to no additional training on the new task.
ABSTRACT. In future agent societies, we might see AI systems engaging in selfish, calculated behavior, furthering their owners’ interests instead of socially desirable outcomes. How can we promote morally sound behaviour in such settings, in order to obtain more desirable outcomes? A solution from moral philosophy is the concept of a \emph{social contract}, a set of rules that people would voluntarily commit to in order to obtain better outcomes than those brought by anarchy. We adapt this concept to a game-theoretic setting, to systematically modify the payoffs of a non-cooperative game, so that agents will rationally pursue socially desirable outcomes.
We show that for any game, a suitable social contract can be designed to produce an optimal outcome in terms of social welfare. We then investigate the limitations of applying this approach to alternative moral objectives, and establish that, for any alternative moral objective that is significantly different from social welfare, there are games for which no feasible social contract produces a non-negligible social benefit compared to collective selfish behaviour.
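The idea of modifying payoffs so that rational agents choose a socially desirable outcome can be illustrated on a toy example. The sketch below is not the construction from the paper: the Prisoner's Dilemma payoffs, the flat fine on defection, and the helper names `pure_nash_equilibria`, `apply_contract`, and `social_welfare` are all assumptions chosen for illustration.

```python
from itertools import product

ACTIONS = ("C", "D")  # cooperate, defect

# Hypothetical Prisoner's Dilemma payoffs: (row player, column player).
PRISONERS_DILEMMA = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def pure_nash_equilibria(game):
    """Return all action profiles from which no player gains by deviating alone."""
    equilibria = []
    for a_row, a_col in product(ACTIONS, repeat=2):
        u_row, u_col = game[(a_row, a_col)]
        row_ok = all(game[(alt, a_col)][0] <= u_row for alt in ACTIONS)
        col_ok = all(game[(a_row, alt)][1] <= u_col for alt in ACTIONS)
        if row_ok and col_ok:
            equilibria.append((a_row, a_col))
    return equilibria


def apply_contract(game, fine):
    """Assumed contract: every player who defects pays a flat fine."""
    return {
        (a_row, a_col): (
            u_row - (fine if a_row == "D" else 0),
            u_col - (fine if a_col == "D" else 0),
        )
        for (a_row, a_col), (u_row, u_col) in game.items()
    }


def social_welfare(game, profile):
    """Sum of both players' payoffs at the given action profile."""
    return sum(game[profile])


before = pure_nash_equilibria(PRISONERS_DILEMMA)
after = pure_nash_equilibria(apply_contract(PRISONERS_DILEMMA, fine=3))
print("equilibria without contract:", before)
print("equilibria with contract:   ", after)
```

Under these assumed payoffs, mutual defection is the only pure equilibrium of the original game, while a fine of 3 makes mutual cooperation the only pure equilibrium. Mutual cooperation is also the welfare-maximising profile of the original game.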
ABSTRACT. On a variety of complex decision-making tasks, from doctors prescribing treatment to judges setting bail, machine learning algorithms have been shown to outperform expert human judgments. One complication, however, is that it is often difficult to anticipate the effects of algorithmic policies prior to deployment, as one generally cannot use historical data to directly observe what would have happened had the actions recommended by the algorithm been taken. A common strategy is to model potential outcomes for alternative decisions assuming that there are no unmeasured confounders (i.e., to assume ignorability). But if this ignorability assumption is violated, the predicted and actual effects of an algorithmic policy can diverge sharply. In this paper we present a flexible Bayesian approach to gauge the sensitivity of predicted policy outcomes to unmeasured confounders. In particular, and in contrast to past work, our modeling framework easily enables confounders to vary with the observed covariates. We demonstrate the efficacy of our method on a large dataset of judicial actions, in which one must decide whether defendants awaiting trial should be required to pay bail or can be released without payment.
ABSTRACT. Although essential to revealing biased performance, well-intentioned attempts at algorithmic auditing can have effects that may harm the very populations these measures are meant to protect. This concern is even more salient when auditing biometric systems such as facial recognition, where the data is sensitive and the technology is often used in ethically questionable ways. We demonstrate a set of five ethical concerns in the particular case of auditing commercial facial processing technology, highlighting additional design considerations and ethical tensions the auditor needs to be aware of so as not to exacerbate or complement the harms propagated by the audited system. We go further to provide tangible illustrations of these concerns, and conclude by reflecting on what these concerns mean for the role of the algorithmic audit and the fundamental product limitations they reveal.
ABSTRACT. Inspired by recent breakthroughs in predictive modeling, practitioners in both industry and government have turned to machine learning with hopes of operationalizing these predictions to drive decisions. Unfortunately, many social desiderata concerning consequential decisions, such as justice or fairness, have no natural formulation within a purely predictive framework. In the hopes of mitigating these problems, researchers have proposed a variety of metrics for quantifying deviations from the parities we might observe in a perfect world, and have offered a variety of algorithms that attempt to satisfy subsets of these parities or to trade off the degree to which they are satisfied against utility. In this paper, we connect this approach to the literature on ideal and non-ideal methodological approaches in political philosophy. The ideal approach consists of positing a perfect world, assessing deviations between our world and the perfect world, and taking actions to minimize these discrepancies wherever we observe them. However, by failing to account for the mechanisms by which our non-ideal world arose, the responsibilities of various decision-makers, and the impacts of their actions, ideal thinking can often lead to misguided policies. We demonstrate this connection between the recent literature on fair machine learning and the ideal approach in political philosophy, and show that some recently uncovered shortcomings in proposed algorithms reflect broader troubles faced by the ideal approach. We work this analysis through for both statistical and causal formulations of fairness and suggest several directions for new research.
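As a concrete instance of the kind of parity metric the abstract refers to, the sketch below computes a demographic (statistical) parity gap: the difference in positive-decision rates between groups. This is one common statistical formulation, chosen here as an assumption; it is not taken from the paper, and the data and function names are hypothetical.

```python
def positive_rate(decisions):
    """Fraction of decisions that are positive (1)."""
    return sum(decisions) / len(decisions)


def demographic_parity_gap(decisions, groups):
    """Return (gap, per-group rates), where gap = max rate - min rate."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {g: positive_rate(ds) for g, ds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical binary decisions for two groups of four individuals each.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_gap(decisions, groups)
print(rates, gap)
```

A gap of zero would correspond to the parity holding exactly. Driving such a gap toward zero wherever it is observed, without regard to how the disparity arose, is the pattern the abstract associates with the ideal approach.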