The “thing in itself” (which is precisely what the pure truth, apart from any of its consequences, would be) is likewise something quite incomprehensible to the creator of language and something not in the least worth striving for. This creator only designates the relations of things to men, and for expressing these relations he lays hold of the boldest metaphors… It is this way with all of us concerning language; we believe that we know something about the things themselves when we speak of trees, colors, snow, and flowers; and yet we possess nothing but metaphors for things — metaphors which correspond in no way to the original entities.
—Nietzsche, “On Truth and Lies in a Nonmoral Sense”
My initial plan, for this post, was to discuss the “far outfields” of this concept, or family of concepts, surrogation. I was going to identify what was domestic and what was foreign territory, and to draw a very fuzzy, gradient boundary between surrogation and these foreign lands. But this analogy would mischaracterize the situation—the issue, really, is that one patch of the family is nested inside one larger conceptual set, and the next patch a subset of a different wrapping concept, and so on. The concept of diaspora is more illustrative—not so much in its implication of shared genealogy and dispersion, but of related subpopulations or subcultures, scattered and embedded.
The first super-set of note—a super-set insofar as it furnishes necessary but insufficient qualities of its surrogation subset—has been an immortal subject of philosophy. It is our once-removal from reality, gestured at in the quoted Nietzsche but stretching back to Plato’s cave. Our signs are stand-ins for the aspects they pick out; we treat them, in cognitive shorthand, as if they were reality—reify our concepts as objects, are surprised when words break down on us. Our words are referents not just once- but twice-removed from the world, a surrogate for our organized perceptions, themselves representational of the origins of senses.
The second boundary was pointed out to me by my friend and colleague Crispy Chicken, in his piece “Wireheading as Teleological Misnomer.” In a similar vein of cognitive shorthand—or proxying and inference—we can fall into error by uncritically assuming that a system’s name, or intention, or public description—its origins and representations—are interchangeable with the system’s function. (Opticratics.) “Names trick you into bottoming out your level of inquiry.” Just because I program an algorithm named “doubleInput” does not mean that my algorithm will double the input. No, the system’s functionality is neither what it describes itself as, nor the intention of its designer (though that intent is its genesis), nor how it is socially perceived (though its functionality is the genesis for perception). “The problem is that names are generally teleological: a can opener is meant to open cans.” Individuals who view the world with an “object ontology” lens, more than a “functional” lens, often struggle to find functional substitutes for a missing ingredient, material, or tool. Duct tape is duct tape; butter is butter: the pragmatic properties, having been erased by their nominal representative, cannot be found in their functional equivalents; the only operation left to the reifier is an identity check. There are, of course, specific cybernetic ways that perceptions, intents, and names act directly on a system to align its real functionality to their image. When a system is intelligent enough to pick up on name, intent, designer, and adapt itself to them, there may be a gravitational pull. (The American Indians no doubt believed in, and saw lived out, a certain degree of nominative determinism.) The concept of hyperstition maps this space, as does the saying “fake it until you make it”: in a world of social proxies and deferrals of judgment, appearances make themselves felt.
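The “doubleInput” point can be made concrete with a small sketch. Everything here is hypothetical: the function body, the helper behaves_like_doubling, and the sample inputs are my own illustration, not anything from Crispy Chicken’s piece. The idea is only that a name is a claim about function, and the claim has to be checked against behavior.

```python
def double_input(x):
    # Hypothetical function: the name promises doubling, but the body squares.
    return x * x


def behaves_like_doubling(fn, samples):
    """Judge the function by what it does on sample inputs, not by its name."""
    return all(fn(x) == 2 * x for x in samples)


samples = [0, 1, 2, 3, 10]

# The "identity check" a reifier would stop at: does the name say "double"?
name_suggests_doubling = "double" in double_input.__name__

# The functional check: does the behavior match the name's promise?
actually_doubles = behaves_like_doubling(double_input, samples)
```

Note that the impostor agrees with a true doubling function on the inputs 0 and 2, so a single lucky spot check would have confirmed the name. It takes several samples to tell the two apart.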
But it remains the dominant frame—that is, the majority thesis to which hyperstition, or nominative determinism, stand as notable contradictions—that these genealogical and representational cousins of the thing itself (intent, description, name, perception) are not, finally, equivalent or interchangeable with the thing itself, with the objective functioning of the system—and ought not be reified as if they were.
(And yet we see these mistakes constantly. We confuse good intent with good outcome—or use intent as proxy-surrogate for outcome. We confuse the self-representation of an organization with a neutral description of its real operation.)
But, while difference and negation may in some cases beat positive description, I believe describing not-surrogation is less productive than describing the crude matrix that actually makes up the surrogation family concept.
A theory of game and spirit
Let’s consider a classic example of perverse incentives: a city pays a bounty on dead rats (incentive structure) to improve sanitation and health outcomes (purpose). At first, the rat population drops, as citizens are encouraged to clean up the streets. Then, a few enterprising individuals begin breeding rats in their basements, costing the city enormous amounts of money in bounties while having a negligible or even negative impact on sanitation and health.
We will call this scenario a “game.” We can note a few things about this game by fleshing out its incentive structure (aka reward function). The bounty is the game’s reward. The rules by which the reward is dispensed we’ll consider its letter. This letter is an attempt by the designer to implement their intended spirit: the holistic, often vague goal the designer wishes to accomplish, and the holistic, often vague style of play the designer anticipates accomplishing this goal. Finally, there is the game’s metric: the method for monitoring a player’s behaviors, and determining an interpretation of reality which can be measured against the letter (that is, measured by the letter) to selectively dispense the reward. Such a system requires an evaluating agent or evaluating mechanism; in reality, its mechanism or agent will vary widely in its purview (programmed, legal, bureaucratic, or otherwise) to interpret the player’s accomplishments against the game’s spirit-conveyed-in-letter.
In the abstract, such an incentive structure can fail, or “come apart,” in a number of ways. First, the style of play the designer wishes to incentivize may not, in fact, accomplish his desired end-goal. Second, the formalization of the designer’s spirit into letter may fail to adequately represent his spirit in all its holistic underspecification; styles of play which he wished not to encourage and reward may dominate the game as it objectively exists in its actual rules, as opposed to as it subjectively exists in the intended spirit of its designer.
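The second failure mode can be sketched as a toy model of the rat bounty. This is a minimal illustration under stated assumptions: the bounty size, the two strategies, and the five-to-one yield of basement breeding are numbers I have made up. The reward is paid by the letter on what the metric can observe (dead rats presented), while the spirit cares about something the metric cannot see (wild rats removed).

```python
from dataclasses import dataclass

BOUNTY = 1.0  # hypothetical payout per dead rat presented


@dataclass
class Outcome:
    dead_rats_presented: int  # manifest: what the metric can observe
    wild_rats_removed: int    # latent: what the spirit cares about


def catch_wild(effort: int) -> Outcome:
    # Intended style of play: each unit of effort removes one street rat.
    return Outcome(dead_rats_presented=effort, wild_rats_removed=effort)


def breed_in_basement(effort: int) -> Outcome:
    # Degenerate play (assumed 5x yield): many carcasses, no wild rats removed.
    return Outcome(dead_rats_presented=5 * effort, wild_rats_removed=0)


def reward_by_letter(outcome: Outcome) -> float:
    # The letter pays per dead rat, whatever its origin.
    return BOUNTY * outcome.dead_rats_presented


def value_by_spirit(outcome: Outcome) -> int:
    # The spirit counts only rats taken off the streets.
    return outcome.wild_rats_removed


strategies = {"catch_wild": catch_wild, "breed_in_basement": breed_in_basement}
effort = 10

# The player optimizes the reward; the designer wanted the spirit served.
best_for_player = max(
    strategies, key=lambda name: reward_by_letter(strategies[name](effort))
)
best_for_designer = max(
    strategies, key=lambda name: value_by_spirit(strategies[name](effort))
)
```

Under these made-up numbers the player’s best strategy and the designer’s best strategy come apart: the letter pays most for exactly the play the spirit values least.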
We might imagine referring to the enterprising breeders as “defectors” or “cheaters,” even if, technically speaking, they are abiding by the rules. Degenerate play is the term typically used in gaming studies—degenerate because, as a kind of “hack” standing in violation of the game’s spirit (even while being technically in compliance), it undermines the larger purpose and function of the game itself. It is play that degenerates a game’s telos. This dynamic illustrates the adversarial relationship between a wrapping “game,” itself designed by self-optimizing agents, and the players who are wrapped inside this game, themselves self-optimizing within the letter of rules outlined by the game designer. (Social, subjective judgment has the advantage of being able to detect a game’s spirit: human beings are very able to identify, and broadly agree on, many cases of behavior that is against a game’s spirit even as it complies with the game’s literal rules. Formalized and objectivized decision-making modes are not as context- and intent-sensitive, because they are not psychological—but they have the advantage of minimizing individual bias.)
A structure of surrogates
We can also note that the incentive structure of our game is a structure of surrogates. First, the letter stands surrogate for spirit. We have learned from Midas to specify edge-cases when asking prankster gods to make our dreams come true. Even a young child, in proposing a game of “three wishes from the genii” to a friend, will proactively specify “no wishing for more wishes, no wishing for infinite powers,” etc. Second, the measurement or metric used to dole out rewards—upon comparison with the letter of dispensation or punishment, via the reactive ritual—is a surrogate. Even if the fully specified letter of the government policy, in hoping to control rat populations, successfully specified “rats caught while running loose, which were not raised by oneself or one’s accomplices…,” etc etc, there is still the problem of monitoring and observation. It would not be possible, in most cases, to sufficiently surveil a population in order to ensure that citizens were, in fact, playing by the fully specified game-spirit. So some easily observable surrogate, which somehow correlates or corresponds (logically, statistically, metonymically, etc) has been erected as the real (as opposed to idealistic) basis for doling out rewards. Here, that surrogate is the possession of a dead rat. The failure of the surrogate to stand robust to degenerative play, that is, to resist being “gamed” by players, is both a failure of surrogate specification (letter standing place for spirit) and surrogate metrics (observable or “manifest” variables standing place for hidden or “latent” variables). If you are a critical reader, you will have noted that some degree of surrogate metric—observables standing in for non-observables, and being extrapolated in an attempt to create a full portrait of the entity in its non-observable entirety—is present in all human interaction, which gives it its “opticratic” character; appearing in many cases becomes as good as (functionally equivalent to) “actually” being.
Opticratics, a cousin of our banal sensory and linguistic once-removals, is perhaps another super-set.
As we will see, the other crucial part of our picture is the presence of “levels” and nesting (or “embedding”). The government is in some way “above” and “wrapping” individual citizens by creating a game of which those citizens are players. The government evaluates and preferentially treats or selects its citizens on the basis of its evaluations; this structure inevitably warps citizen behavior toward competing for a positive eval. At the same time, the organized, enveloped individual citizens are themselves evaluating possible futures, possible actions and their (forecast) consequences. (We are picky in the garden of forking paths.) We’ll use the concepts of Markov blankets and mesa-optimizers in order to better understand this dynamic.
In their 2019 paper, Hubinger et al. introduce the concept of mesa-optimization: a “framework that distinguishes what a system is optimized to do (its ‘purpose’), from what it optimizes for (its ‘goal’), if it optimizes for anything at all.” Mesa-optimizers are selected for by “base optimizers,” and “inner alignment” refers to an alignment between the base and mesa optimizer—for instance, natural selection is a base optimizer selecting for reproduction; organisms are subject to the base optimizations of natural selection even as they themselves may have goals which only partially align with the base optimizer’s goals. Modern non-reproductive sex is an example of a technologically-enabled uncoupling between reward from the perspective of the base system—natural selection—and the perspective of the mesa-optimizer—a human being.
The authors are careful to stress that not all optimized systems optimize (i.e., are “mesa”). A bottlecap is optimized to selectively contain and release liquids from a bottle, but it is not an optimizer. It has been optimized by human beings (much like, say, our food has, be it through recipe improvements or plant and animal breeding). A system is an optimizer only “if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.” This is a formalized version of our player within a game, and we will focus on optimizeds-that-are-also-optimizers—on “mesa-optimizers”—because we are interested first and foremost in human organization, and the category “human being” describes one level of biotic organization that mesa-optimizes.
We can also now introduce the authors’ concepts of a base objective and mesa objective. The base objective is the “criterion the base optimizer was using to select between different possible systems”; the mesa-objective is “whatever criterion the mesa-optimizer is using to select between different possible outputs.”
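A toy sketch may help keep the two objectives apart. It is my own illustration, not code from the paper, and every number in it is an assumption: the base objective prefers outputs near 7, the selected system internally prefers larger outputs, and the two happen to agree as long as nothing above 7 is on offer. Once more options become available (compare the technologically-enabled uncoupling mentioned above), the same internal search lands somewhere the base objective scores worse.

```python
def base_objective(output: int) -> int:
    # Hypothetical criterion the base optimizer selects on: outputs near 7 are best.
    return -(output - 7) ** 2


def mesa_objective(output: int) -> int:
    # Hypothetical criterion the selected system uses internally: bigger is better.
    return output


def mesa_optimizer(search_space):
    # Internal search: return the option scoring highest on the mesa objective.
    return max(search_space, key=mesa_objective)


selection_env = range(0, 8)  # options on offer while the system is being selected
later_env = range(0, 11)     # options on offer after circumstances change

chosen_during_selection = mesa_optimizer(selection_env)
chosen_later = mesa_optimizer(later_env)

base_score_during_selection = base_objective(chosen_during_selection)
base_score_later = base_objective(chosen_later)
```

During selection the mesa-optimizer picks 7, which is also the base objective’s favorite, so it looks perfectly aligned. Later it picks 10, and its score on the base objective falls from 0 to -9, although nothing about the system itself has changed.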
Crucially, while the authors discuss systems which are two- or three-level, to be a mesa-optimizer is to stand in relation to another level. It is not an objective and inherent property of an optimizing system, but the situation of “being embedded” within another optimizer. This brings us to nesting and hierarchy.
Let us take seriously some form, conforming however closely as is necessary for the case at hand, of Karl Friston’s theory of Markov blankets. This theory holds, among other things, that boundaries are a precondition of life itself (and of complexity more generally). They are a prerequisite for maintaining homeostasis, that is, for controlling and regulating the internal conditions which are, again, necessary to an entity’s fulfilling its goals. In other words, boundaries are, first and foremost, a selection mechanism, with both a schema for admission and physical capacities for enforcing this preferential schema. They allow valuable resources—that is, goal-furthering materials, such as water, in body cells, or food supplies in a castle—to stay inside the boundaries, and assist the bounded entity in its goals. They keep undesired or harmful materials outside, either by preventing entry or expelling them. This includes other agents or sub-agents, who will attempt to improve their own lot by gaining access to the internally-hoarded resources of another bounded agent—either antagonistically, through theft or violence or deception, or cooperatively, in symbiosis. Alignment is the central principle which separates “good” from “bad,” to a mesa-optimizing Markov blanket: the property of furthering or thwarting the blanket’s goals. Our target domain, in understanding surrogation, is the alignment of mesa-optimizers to their base optimizers, from both the perspective of the mesa-optimizer and the perspective of the selecting base.
Life shows a “propensity,” Friston et al write, “to form multi-level and multi-scale structures of structures”: hierarchy is nesting or “wrapping” layers; each layer, I argue, is coordinated by one or more “games” of preferential treatment, complete with an incentive structure such as was described earlier in this post. Each layer stands as a base optimizer to the level below it, which is a mesa-optimizer from the perspective of the layer above it. Each wrapping layer attempts to align the goals of the blanket below it, setting the game rules of participation and preferential treatment by which complex coordination is achieved.
For instance—and this will ignore worthwhile nuances for the purpose of concision and pattern-emphasis—a company selects employees it believes to be aligned with its goals. Those employees have their own priorities, values, and goals (the mesa-layer to the company’s base). But each employee is a Markov blanket in his own right, composed of many cells; these too are selectively killed, expelled, or directed to the bloodstream depending on a similar appraisal of goal-alignment. (Perceived-as-symbiotic bacteria remain; perceived-as-adversarial bacteria are hunted by the immune system.) Above the company is a government, whose aims stand surrogate for the good of the nation; this government writes policy which selects for and encourages business practices that are aligned with national interests, while penalizing practices that are in misalignment. These governments are competing in a natural selection-style base layer that is geopolitics. In all cases, we can readily furnish many examples by which the selected mesa-optimizers’ interests actively diverge from the base-optimizing layer’s interests, despite appearing, at first or externally, to align. We can call this deceptive alignment, and note that—just as individuals are incentivized to feign cooperation while free-riding—mesa-optimizers are incentivized to feign alignment if it is in their interests, that is, if they can gain resources, or prevent persecution, from the wrapping base. Preferential treatment for alignment will, unless alignment can be directly and personally tested, always be surrogate for the appearance of alignment, which is the real basis of reward (and thus, the actually incentivized behavior).
Of course, employees also select companies just as companies select employees. The same holds for romantic alignment (dating as an extended period of gathering evidence about a prospective partner’s goals and their synergy with one’s own), though it varies by cultural and historical contingency in the extent to which males and females act as “gatekeepers” versus “applicants.” This alignment is necessary if one will let another entity past or into one’s own boundary, one’s own blanket—not just exposing one’s underbelly but one’s tender interior. So we keep door policies, a drawbridge and portcullis.
 And if we are playing the genii game ourselves, and decide to wish for immortality, we may be careful to specify a conditional immortality—to avoid eternities of suffering, of being trapped for long periods in an iron casket in the sea. This is akin to Midas requesting voluntary transformation.
 Recall the difference, in our closed system of reference: metrics are measurements that are used as the basis for the selection of mesa-optimizing agents; that is, players of the game who have agendas and goals of their own, separate from, but partially and genetically aligned with, the goals of the system which implements the metrics.
 We also can note that the system’s surrogate incentives give way to “surrogate” gameplay, better referred to as degenerate. A subset of degenerate gameplay is wireheading.
 For instance, the company does not choose hires, or selectively promote—it is other mesa-optimizing employees of the company who do so.