From measurement to degeneration

There is no characteristic that is common to everything that we call games… It is a family-likeness term. Think of ball-games alone: some, like tennis, have a complicated system of rules; but there is a game which consists just in throwing the ball as high as one can, or the game which children play of throwing a ball and running after it. Some games are competitive, others not. (Wittgenstein, “Philosophical Investigations”)

This post begins a series and introduces its central concept, surrogation. The concept emerged after repeated encounters with ideas from various fields, all of which, I felt, were connected by some broader pattern and related, perhaps, by family-likeness. That is, I had the concepts of baseball, arcades, Scrabble, and competitive eating, but lacked the concept of “game.” The related concepts I stumbled upon in the course of other research included: from artificial intelligence, wireheading, underspecification, and nearest unblocked strategy; from philosophy, C. Thi Nguyen’s ideas of gamification and value capture; from statistics, overfitting, latent vs. manifest variables, proxy measures, operationalization, and heuristics; from sociology, legibility and Campbell’s law; from economics, Goodhart’s law, the Lucas critique, perverse incentives, and the distinction between private and public information; from information theory, joint entropy and mutual information; from metascience, Tom Griffiths’ idolatry and Feynman’s cargo cult; from game studies, degenerate play; and finally, from Theory and everyday parlance, the concepts of fetish, masturbation, cobra effects, cheap play, surface compliance, spirit vs. letter, and winning by technicality.

What connects these ideas? The answer is necessarily long and digressive, and will take a thorough walkthrough. After all, inherent in the structure of family resemblance is the lack of any genetic “essence”: each member of the family is related to others, but they do not all share the same green eyes and red hair. There may not be a single family trait which they all share, and even if there were, it would not define them; many non-relatives, after all, would share that trait as well. Still, we can gesture toward the rough strokes of similarity before elaborating details and complications, examining case studies, and comparing circumstances. Many of these concepts capture a quality of once-removal: they are premised on a tacit distinction between some “thing itself,” which is under study or being optimized, and some surrogate which, because the thing itself is inaccessible, must stand in its place. Many concepts nod toward the tendency of agents to strategically appear cooperative while in reality contributing little toward (or even undermining) the efforts of the larger project they are embedded in. Appropriately, many of the situations which these concepts describe are themselves game-like, with agents competing for limited resources or preferential treatment.

As one reads, one may notice that to institute a surrogate measure is, in many cases, merely to operationalize, or that some observed consequence of surrogation is analogous to Goodhart’s Law. The purpose of this series is not novelty but synthesis and connection. Novelty, in an era of informational glut, is highly overrated; it is informational management and logistics, most of all, that are needed. It is my belief, and that of many of my friends working in the inexact sciences, that merely providing a roadmap to what is already known, disparately and obscurely, in various private dialectics, can itself be a productive avenue for future advance. In that vein, surrogation is chosen as a generic, non-technical term for the broader pattern. At its most pragmatic level, it provides both a noun and a verb (to surrogate) that let us talk better about what is, I believe, one of the most quietly influential and inescapable forces of human life. (And perhaps of life, period.) While subcategories, or kinds of surrogation, will be introduced throughout the series, individual distinctions are less a hard and fast taxonomy than a scaffold for exploring the various dynamics that emerge in surrogation scenarios.

Surrogate measurements

Recall that in statistics, latent variables—variables of research interest which are hidden, nebulous, underspecified, or inaccessible to direct study—are instead measured indirectly, through a manifest or proxy variable, in a process known as operationalization. We begin by understanding one bare-bones form of surrogation that is most analogous to a proxy variable. This “mere” surrogate measure stands in contrast to a surrogate metric—a distinction which will be clarified near the section’s end.

Australian counterinsurgency expert David Kilcullen writes, in “Measuring Progress in Afghanistan,” of American military efforts to provide a surrogate measure for progress—as well as the ways such efforts, having chosen over-simplified or crude surrogates, result in a poor understanding of the situation on the ground. SIGACTs—military jargon for “significant activities” such as suicide bombings or insurgent attacks—have long been used, Kilcullen tells us, as a surrogate measure for American military progress, with the “assumption that more SIGACTs are bad and fewer SIGACTs are better.” This assumption, on scrutiny, quickly breaks down:

Violence tends to be high in contested areas and low in government-controlled areas. But it is also low in enemy-controlled areas, so that a low level of violence indicates that someone is fully in control of a district but does not tell us who.

Thus, the surrogate measure produces a picture that dramatically misunderstands dynamics on the ground by collapsing important distinctions. The correlation between “American military progress” and on-the-ground violence is all over the place; in some regions and conflicts, it may be a reasonably accurate heuristic; in others it gives exactly the wrong impression.
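The collapse Kilcullen describes can be made concrete with a small sketch. Everything specific in it is invented for illustration: the district records, the threshold of 5, and the function names are assumptions, not figures or methods from Kilcullen. The only thing taken from the text is the pattern itself, namely that violence is high in contested areas and low in both government-controlled and enemy-controlled ones.

```python
# Hypothetical district records: (controller, sigacts_per_month).
# All numbers are invented for illustration; none come from Kilcullen.
DISTRICTS = [
    ("government", 2),
    ("government", 1),
    ("contested", 14),
    ("contested", 11),
    ("enemy", 1),
    ("enemy", 2),
]


def looks_like_progress(sigacts, threshold=5):
    """The crude surrogate: fewer SIGACTs is read as progress."""
    return sigacts < threshold


def surrogate_report(districts, threshold=5):
    """Count, per controller, the districts the surrogate scores as 'progress'."""
    report = {}
    for controller, sigacts in districts:
        if looks_like_progress(sigacts, threshold):
            report[controller] = report.get(controller, 0) + 1
    return report


report = surrogate_report(DISTRICTS)
print(report)
```

In this toy data the surrogate scores government-held and enemy-held districts identically, which is the sense in which a low level of violence says that someone is in control without saying who.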

But we do not yet have the ingredients in place for a surrogate metric, and with it, the emergence of degenerate play. For that, we will need to introduce competing agents who are preferentially treated according to their evaluation by the surrogate measure. If these agents are further able to discern, at least in broad strokes, the basis for their evaluations (and by extension, their preferential treatment), degenerate play will surface sooner rather than later, as agents scrutinize and exploit weaknesses in the surrogate measure. In Goodhart’s words adapted: When a measure becomes a metric—is used as a basis of selection—it becomes a target for strategic play. And when a metric becomes a strategic target, it ceases to be a good measure. This breakdown is caused by the adversarial relationship between metrics and the agents they evaluate.

Surrogate metrics

By “metrics,” to be clear, I do not mean specifically quantitative yardsticks, merely yardsticks, or standards of comparison, in general. That is, a surrogate metric is a surrogate measure which is used to preferentially reward measured agents. But to fully understand surrogate metrics, and the degenerate play they give rise to, we must establish the adversarial nature of gameplay.

Consider the “spread”—a dominant strategy in competitive scholastic debate. Debate’s rules penalize unaddressed arguments as “dropped” or conceded; as a result, there has been an arms race toward greater and greater verbal speed. Competitors attempt to bring up as many arguments as possible in the limited minutes they are allowed each round; this forces opponents to, with equal speed, address all raised points within their own limited allotment of time (or else effectively cede the round).

What is on display here is the adversarial relationship not just between players of a game, but between a game and its players. That is, judges or game designers typically wish to encourage certain styles or strategies of play; this underspecified intent we will call the game’s spirit. (Such a spirit can be argued to inhabit even evolved or decentrally designed games; more on this later.) While such a spirit is nebulous and difficult to pin down, its existence is testified to by a shared felt sense, among players and observers alike, of cheap play and winning by technicality. Indeed, these felt judgments show high degrees of overlap, controlling for the loyalties and interests of observers. And it is demonstrated in the continual readjustment, by judges and systems administrators, of the literal letter of rules, such that they better reflect spirit and ward off cheap play. These basic dynamics are present in games from Constitutional law to professional sports.

A game’s spirit, in some meaningful way, can be connected to the “point” of play in the first place—the larger, pragmatic purpose that play accomplishes, which can be lofty—simulative education—or base—as in entertainment. These pragmatic functions provide a justification by which judges and administrators alter rules and either prohibit or penalize certain types of play.

We now have the tools to revisit competitive debate. Scholastic policy debate was established and fostered, throughout the 20th century, in a spirit of civics education—training toward some ideal of public and political discourse. Today, due to “spreading”, it is largely unintelligible to uninitiated audiences, who cannot parse debaters’ rapidfire speech, let alone the arc of their arguments (which prioritize quantity over quality, a values hierarchy that inverts our usual standards of persuasion[1]). Somewhere, the spirit of debate—and with it, its founding function—has been lost to degenerate play.

Players in the debate game, first and foremost, were not just measured through surrogates—they were then subjected to the outcome of that measurement; they were evaluated, and then preferentially treated (through wins, losses, and titles) according to the evaluation results.

Next, they were able to gain an awareness of what basis they were being evaluated on; that is, of the surrogates put in place to objectify the evaluation of “quality.” In contemporary society, the rulebook of many games is made public out of a desire for transparency and fairness—the Stele of Hammurabi’s historical import lies, in large part, in its establishment of such norms in the human social game that is “law.” But these benefits come with a trade-off: players engaged in an adversarial relationship against the game itself are given an advantage in degenerating the efficacy of—by optimizing toward—the in-place surrogates. When the surrogates used in evaluation are unclear, one cannot very well optimize toward them—the best available strategy is slow adaptation or evolution toward success, preserving tactics which pan out in wins and abandoning those which do not. But evolution is painstaking where the application of abstract intelligence is rapid: when players can study surrogates, they can deductively reason their way to winning strategies, and optimize for those specific traits which will best please the censors (or “gatekeepers”). Drug smugglers, to give a ready example, are closely acquainted with the technical details and functioning of the systems and tools that screen international shipments at customs. Their packages can then be carefully designed and disguised in order to thwart customs’ detection heuristics—for instance, placing contents in packaging that deflects x-rays. But if a new surrogate were put in place—for instance, specifically searching only those packages that contain x-ray-deflecting material, or using dogs’ sense of smell—then a player strategy previously optimized would become radically unfit, evolutionarily, in the new system—would become a losing strategy.

With an understanding of the surrogate rules (the letter of the system), debaters were able to identify degenerate tactics such as the spread. Crucially, there is nothing especially reprehensible about degenerate strategies; they are the ordinary condition of a self-interested agent within a competitive incentive system, and need not involve such drastic moral tradeoffs.[2] (Our society, having limited resources, status hierarchies, and exclusive mating, is inevitably competitive in such a way.) It is not so much that players are “degenerate,” but that their play itself tends to degenerate and undermine the original (or, at some level, desired) spirit and function of the game. Indicators that play may be degenerate are found in complaints, by both observers and other game players, that a certain strategy is “cheap.” Objections frequently include some acknowledgment that the play is “technically” legal (that it aligns with the game’s letter) but is, nonetheless, a kind or cousin of cheating. (Cheating we might define as behavior that violates not just the spirit of a game but its letter; where a letter-abiding judiciary system cannot prosecute spirit violations, it can and does prosecute letter violations.) In this case, the spread is “degenerate” insofar as it goes against the founding civics-oriented spirit of scholastic debate.

We can also revisit Kilcullen’s example, in which the military, attempting to measure some abstract and underspecified “progress,” instituted as its surrogate the rate of violence, or SIGACTs, across regions. Recall that this measure obscured, by over-compressing, a situation on the ground in which low-violence areas could as easily be enemy-controlled as American-controlled. Now we can introduce a variant of the situation (necessarily simplified, but still illustrating real dynamics) in which, first, the military gives greater attention to high-violence areas (writing off low-violence areas as completed goals), and second, the Afghan resistance, by infiltrating American military intelligence briefings, is aware of the military’s system for evaluating and attending to different regions. Here we have a picture where all the criteria of a surrogate metric, and not just a measure, are in place. There is an evaluating body whose behavior has consequences for evaluated players; that is, the evaluative system results in preferential treatment or asymmetrical outcomes for players on the basis of the evaluation. And there is knowledge, by evaluated players, of the basis for this evaluation and, by extension, of their preferential outcomes. At this point, a fairly predictable set of strategic behaviors emerges, with the Afghan fighters attempting to redirect American attention and efforts away from regions the fighters find strategically valuable and toward regions the fighters find strategically irrelevant. And indeed, as Ben Connable shows in Embracing the Fog of War: Assessment and Metrics in Counterinsurgency, Vietnamese insurgents did often refrain from violence in order to avoid US military detection and maintain “freedom of movement.” In such a situation, the surrogate measure is far worse than random noise: it becomes negatively correlated with the real target it hopes to stand in for, having been forcibly uncoupled by strategic agents manipulating the dataset.
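The claim about negative correlation can be illustrated with a rough numerical sketch. The control scores and violence counts below are made up, and the names (`GOV_CONTROL`, `quietness`, `pearson`) are hypothetical; the sketch only assumes, as in the simplified variant above, that informed fighters stay quiet where they are strong and stage violence where they have little at stake.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented data: the real target (degree of government control per district).
GOV_CONTROL = [0.9, 0.8, 0.5, 0.4, 0.1, 0.2]

# Violence when nobody games the measure: high only in contested districts.
VIOLENCE_NAIVE = [2, 3, 14, 12, 1, 2]

# Violence when fighters know the measure: quiet where they are strong,
# noisy where they have little at stake.
VIOLENCE_STRATEGIC = [10, 9, 6, 5, 1, 1]


def quietness(violence):
    """The surrogate score: less violence is read as more progress."""
    return [-v for v in violence]


r_naive = pearson(quietness(VIOLENCE_NAIVE), GOV_CONTROL)
r_strategic = pearson(quietness(VIOLENCE_STRATEGIC), GOV_CONTROL)
print(round(r_naive, 2), round(r_strategic, 2))
```

With these invented numbers, the first correlation comes out close to zero (the surrogate is merely uninformative), while the second is strongly negative (the surrogate now points the wrong way).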

Examples of degenerate play in response to surrogate metrics

Flopping in athletics

In modern limited-contact sports, most prominently professional soccer and basketball, a system of officiating is in place with the goal of reducing dangerous contact, and allowing enough physical space between players that the game does not devolve into a tackle sport. (Were tackling not specifically prohibited, we could imagine basketball quickly devolving into an arguably unwatchable, and inarguably dangerous, sport, where any player in possession of the ball was immediately tackled, the ball removed from their hands forcibly.) That is, player safety and audience entertainment are some of the terminal values that inform these games’ spirits of fair play.

Much like the law, athletic officiating is performed by human evaluators, referees, who reconcile their interpretations of game events against their interpretations of a game’s rules. In both domains, player intent and causal precedence underlie decision-making. As an example of this interest in causal precedence and intentionality, note that when contact between two players occurs, it is the player who initiates contact who is penalized (roughly speaking; there are exceptions depending on sport and circumstance).

And although there is significantly less delay, and significantly more transparency, between the event and ruling in athletic officiating than there is in our legal system, there is, nonetheless, a similar high degree of unknowability with regard to the issue at hand, be it a homicide or officiated contact between players. Adjudicating officials must inferentially recreate a historical event based on minimal clues. In professional sports, play is rapid, and there are typically just a few referees on the field of play who have been tasked with monitoring the physical movements (and inferring from them the psychological intents) of players.

As a result, referees rely on surrogate metrics. Most crudely, and founded on the Newtonian principle that every action has a reaction, we see the heuristic that the player who is most physically impacted or displaced in the fallout of contact is the “victim” or recipient of that contact, rather than its initiator. In turn, a phenomenon known as flopping has emerged in these sports, with players “acting out” dramatic falls, head snaps, and injured reactions in order to alter the interpretations of referees. Because this behavior is widely understood, by players and audiences alike, to violate the game’s spirit of fairness, the NBA and many international soccer organizations have made efforts to combat flopping by penalizing it. But the difficulty of interpreting player intent, or of discerning the “truth” behind appearances, has undermined these efforts, and the practice remains ubiquitous in many limited-contact sports.

Minority status

Be it in awarding contracts or doling out business licenses, federal and state governments in the United States have prominently advertised preferential treatment for organizations owned by women or minorities. The legal fact of ownership is surrogated for the meaningful sense of ownership, with predictable results: many male-run government contractors will legally put their businesses in their wives’ names in order to reduce the disadvantage they face. Similarly, reparative justice efforts to encourage black entrepreneurship in the cannabis industry, by preferentially awarding licenses, have resulted in many “honorary”[3] black owners or co-owners, who are paid some small percentage of profits in order to act as a front for white-owned dispensaries. Whatever initial goal the government may have had for such programs has been thwarted by the adversarial, degenerate play of the evaluated agents (business owners). That this has happened so quickly after the announcement of such programs is in large part a result of public knowledge of the surrogate metrics and, by extension, of the basis for preferential treatment. We can imagine a situation in which a more “black box” evaluative process would stay robust to degenerate play much longer.

Similar situations exist in affirmative action programs in universities. One common problem is that, in establishing quotas strictly on racial grounds, universities which may have wished to admit disprivileged black American youth have instead admitted vast numbers of highly privileged and wealthy foreigners.

This alone is not an example of surrogate metrics or degenerate play; it is merely a poorly chosen surrogate measure, resulting, no doubt, from the designers of the evaluative system not having fully thought through the actual intervention they wished to enact. As with Midas wishing that all he touches turn to gold, the spirit of the request is inadequately translated into letter. (Underspecification as a contributor to surrogation problems will be explored in following sections.)

However, there is the closely linked situation in which white students have claimed minority status through some obscure ancestral line: a great-grandfather who was Navajo, as the public stereotype goes. Were preferential treatment of minority status unknown among student applicants, we can imagine that such a disclosure would be unlikely (the applicant might even be unaware of their heritage), but since it is common knowledge that minority status gives a sizeable advantage in college applications, such disclosures are not unlikely but regular. Again, this adversarial relation between players and game designers, well familiar to Dungeon Masters, lawyers, and parents of young children, is on full display.

Pretty Woman

The most famous scene from the 1990 film Pretty Woman takes place at a high-end clothing store on Rodeo Drive in Beverly Hills. The shop’s sales clerk has a system of evaluation which helps her identify clients by their financial assets and spending potential, and then selectively cater to them on the basis of this assessment, which functions to maximize her own commission. (This commission is the larger goal which the evaluation is instrumental toward accomplishing.) The clerk cannot possibly know the real spending potential or desire of any customer who enters her shop, but she has limited attention and time, and so she uses surrogates such as dress and mannerisms in order to make educated guesses and allocate that time accordingly. Inevitably, where these surrogates diverge from reality, she faces (like any other evaluating entity) the possibility of false positives and false negatives: someone who lacks the capital to spend but appears to have it, or someone who, despite possessing the capital to spend (and the desire to do so), is not positively identified as such. In the film, Julia Roberts’s character registers as a false negative, and she is turned away from the store on the basis of her attire; she is, accordingly, not allocated any of the clerk’s time or attention, nor the resources of the shop, such as the ability to try on garments.
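The clerk’s situation has the shape of a simple classifier, and a small sketch may help fix the vocabulary of false positives and false negatives. The shoppers, their labels, and the rule “serve whoever looks wealthy” are all hypothetical simplifications introduced here; the film itself supplies only the one false negative.

```python
# Hypothetical shoppers: (description, looks_wealthy, can_and_will_spend).
# Invented for illustration; only the false-negative case echoes the film.
SHOPPERS = [
    ("well-dressed regular", True, True),
    ("well-dressed tourist", True, True),
    ("browser in a rented suit", True, False),
    ("casually dressed big spender", False, True),
    ("casual browser", False, False),
    ("another casual browser", False, False),
]


def clerk_serves(looks_wealthy):
    """The surrogate rule: attention goes to whoever looks wealthy."""
    return looks_wealthy


def confusion_counts(shoppers):
    """Tally how the surrogate rule fares against actual spending potential."""
    counts = {
        "true_positive": 0,
        "false_positive": 0,
        "false_negative": 0,
        "true_negative": 0,
    }
    for _description, looks_wealthy, will_spend in shoppers:
        served = clerk_serves(looks_wealthy)
        if served and will_spend:
            counts["true_positive"] += 1
        elif served and not will_spend:
            counts["false_positive"] += 1
        elif not served and will_spend:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts


counts = confusion_counts(SHOPPERS)
print(counts)
```

The “casually dressed big spender” row plays the part of Roberts’s character: real spending potential, but turned away because the surrogate reads her attire rather than her means.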

This example helps highlight a dynamic present in many, if not all, surrogative behaviors: the evaluating entity has limited resources (at the very least, the resource of time), and this, combined with other barriers to knowing the “true” nature of things (full knowledge is always physically impossible), leads the entity to an economical surrogate. Again, this situation on its own is simply a surrogate measure. But given that there are agents who desire the clerk’s attention, or wish to try on the shop’s clothing, despite lacking the financial resources to “properly” earn it, and given that class markers are common surrogates in high-end establishments, we might imagine players who knowingly rent or steal an outfit’s worth of high-end clothing specifically to fool such a shopkeeper.

Many of these examples, perhaps most explicitly those of flopping and Pretty Woman, land us with a considerable problem in carrying out this conceptual project. As soon as we move beyond institutional surrogates and quantified metrics, the behavior displayed is ubiquitous in human life: we are constantly acting as if, or bluffing our way, or dressing up to impress, and, on the flip side, judging by proxies and inferring wholes from metonymical parts. This gets us into nebulous, murky conceptual waters, where surrogation and degenerate play seem to underlie all human social existence: a world where we live exclusively among surrogates, through surrogates, and for surrogates. The next post in this series, “Surrogates everywhere,” will explore these issues, and attempt to bound and distinguish surrogation from adjacent concepts.


[1] As an illustration of the idea that it is “surrogates all the way down,” consider that persuasiveness is, itself, a surrogate quality standing in for that harder-to-discern quality, “correctness.” Much has been made, dating back to Greek Sophism and Roman oratory, of this surrogate, and of the flawed pedagogy that results from “teaching to the test”: that is, winning over an audience through rhetoric rather than by being in possession of a superior stance.

[2] “Don’t hate the player, hate the game,” in folk parlance.

[3] Honorary is another interesting non-technical term, in that it distinguishes one type of member (or title, or role) from another type which is seen as more “substantial” or “real.”
