Predictive Processing & Art as Cognitive Remodeling

Visual art, in the sense of representational imagery, begins somewhere between fifty and one hundred thousand years ago, overlapping with the Upper Paleolithic Transition. The period is marked by rapid gains in tool technologies alongside the beginnings of modern symbolic thought, with human societies developing currency systems, dispersed social organizations, and increasingly sophisticated religious belief.

To Alva Noë, writing in 2015’s Strange Tools: Art and Human Nature, the religious and social practices which began in this era, such as funerary rites, rites of passage, religious ceremony, symbolic adornment, and complex linguistic interactions, are examples of organized activities, “evolving patterns of organization” within which humans are embedded. Modern examples include driving a car in a highway system or navigating the complexities of workplace protocol; in both situations we improvise within a set of constraints, using established scripts predetermined by the situation’s structural context. These structures are primarily inherited, demanding a reckoning: “we are organized,” Noë writes, “but we are not the authors of our organization.” The arts, then, serve as a second-level, reflexive practice which allows us to model and refine the structures of our organization, helping us get our bearings and thus lending us agency in the complex, self-normalizing systems which “enfold and threaten to dissolve” us.[1]

Noë pegs perception itself as an organized activity, structured by inherited models. Taking cues from John Dewey’s Art as Experience, he describes it as a “dynamic exchange with the world,” not just an eye-brain system but an “eye-brain-head-body-ground-environment” system. The way we see, what we see, and how we see it are (famously) altered by cultural norms. Seeing and judging, which are helplessly intertwined in the continuous process of making sense of the world, are here first-level practices which art interrogates. Noë goes so far as to argue that the act of picture-making brought about modern human consciousness by interrogating and reorganizing the previously unconscious, background process of seeing. This, in part, is Noë’s explanation for the simultaneous set of developmental milestones that emerge in the Creative Explosion twenty-five to forty thousand years ago, bringing with it the first recognizably human societies.[2]

Noë is a romantic when it comes to art, and tends to be dismissive of neuroscientific (or “neuroaesthetic”) explanations for human artistic behaviors, arguing that attempts to explain art through neuroscience are as ludicrous as attempts to explain the philosophical project through neuroscience.[3] It’s odd, then, that Noë’s primary thesis, that perception is an organized activity which art deliberately remodels, is supported by the neuroscience.

II.

“Predictive processing” is a theory of cognition sometimes referred to as the discipline’s premier unified theory; in the past two decades, it’s received equal parts cautious optimism and outright skepticism from researchers in the field. Intriguingly, it’s a model which also closely describes our artistic vocabulary and values: why we care about authenticity, suspense, ambiguity, patterns, compression, foreground-background relationships, resonance, maps/chords, novelty, the logical consistency of fantasy worlds, and more. It’s not just the closest we’ve come to a unified theory of cognition — it’s also the closest we’ve come to a unified theory of art.

Predictive processing makes the claim that our brains are evolutionarily designed, not unlike a neural network, to constantly make predictions about reality which are then “checked” or verified. The accuracy of each prediction is incorporated into future predictive work, so that over many iterations, something like our identification of a class of object (berry) is perfected and can be contextualized by other attributes (brightly colored, inedible) in order to predict risk. Predictive processing allows us to make highly sophisticated probability analyses about what is possible or likely to be the case, and therefore what is possible or likely to happen next in a sequence, so that the subject can take appropriate measures and interact with their environment in a sophisticated, self-preserving way. Visual (and more generally, sensory) perception is a cornerstone for how predictive processing works, though the concept is frequently extended to conceptual, higher-level learning.[4]
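The predict, check, update cycle described here can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the single-number risk estimate, the `update` and `learn` functions, the learning rate, and the list of berry encounters are all hypothetical, and a simple error-driven rule stands in for whatever the brain actually does.

```python
def update(estimate: float, observed: float, learning_rate: float = 0.1) -> float:
    """Nudge a prediction toward what was observed, in proportion to the error."""
    prediction_error = observed - estimate  # the "check" step
    return estimate + learning_rate * prediction_error


def learn(observations: list[float], prior: float = 0.5,
          learning_rate: float = 0.1) -> list[float]:
    """Run one predict-check-update cycle per observation; return each estimate."""
    estimates = []
    estimate = prior
    for observed in observations:
        estimate = update(estimate, observed, learning_rate)
        estimates.append(estimate)
    return estimates


# Hypothetical encounters with brightly colored berries:
# 1.0 means the berry turned out inedible, 0.0 means it was fine.
encounters = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
history = learn(encounters)
print(f"estimated risk after {len(encounters)} encounters: {history[-1]:.2f}")
```

Each surprise moves the estimate a little, so the learner ends up expecting brightly colored berries to be risky without ever being told a rule. A larger learning rate would make it react faster but also overreact to the occasional harmless berry.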

Van de Cruys and Wagemans, in “Putting reward in art: A tentative prediction error account of visual art,” explain predictive processing’s theorized role in perception:

The predictive coding approach of perception holds that the brain actively anticipates upcoming sensory input rather than passively registers it. On the basis of prior experience, the brain actively makes predictions about what visual input to expect in the current context of stimulation. At every level of the visual hierarchy predictions are generated and propagated (top-down) to lower levels, where they are checked against incoming (bottom-up) evidence.

A significant part of predictive processing involves attentional focus, or the context-appropriate allocation of limited cognitive resources. Because the brain cannot actively monitor everything in a complex scene (e.g. the complex and only loosely connected movements of a crowd), it allocates attention through predictive processing. This involves not just making predictions but predicting which parts of a scene are worth making predictions about. If one is living in a dangerous environment (walking through a savanna in Paleolithic times), these predictions can be life-or-death, which suggests a gradualist evolutionary account of how such a prediction system would emerge and develop. But even outside such situations, attentional focus is helpful for finding a friend in the street, or picking out a book at a bookstore. It is an essential part of all reality navigation, and prevents attentional paralysis.

Noë’s extensive arguments about background, foreground, and defamiliarization are supported by inferential theories of cognition in which the brain builds models from bottom-up data, then applies these models (or patterns) top-down to new perceptual events (e.g. category-based recognition of familiar objects), shaping our reality. Our perception is altered at a significant cognitive level by our expectations and paradigms for reality, which art can update and subvert.

The concept of a ‘model’ does a lot of explanatory work in predictive coding. By ‘models’, cognitive scientists mean mental representations that organise information and allow the brain to extract signals from noise. A classic example is the way in which we hear speech or music. The signal that reaches the ears is usually fuzzy and incomplete; a sound engineer looking at a computer display of the auditory data hitting our eardrums would see a mess that could take months of signal processing to decode. However, our brain can use its prior knowledge to produce coherent representations of words, sentences and tunes. We can hear our friends across a crowded room because we’re capable of filtering and cleaning up the signal – because we have a lexicon of explanations ready to anticipate the streams of data with which we are confronted. What we ultimately experience, then, is the model that we’ve learned is the best fit for the information to hand, that best predicts and accounts for our perceptions before they happen.

And, especially radically, predictive processing carries the implication that our 

perception [is] little more than a kind of controlled hallucination. We do not experience the external world directly, but via our mind’s best guess as to what is going on out there.

III.

Events which contradict our predictions help update our models towards accuracy. Because predictive miscalibration and failure in the real world have negative ramifications, any mismatch between a high-confidence prediction and reality (e.g. expecting a table to hold steady and instead watching it collapse in on itself) can produce high levels of stress and anxiety, an unpleasant rather than pleasant sensation. But as Van de Cruys and Wagemans note of George Mandler (Emotion, 2003), any “conflicts between expectations and actual circumstances create arousal because they signal important changes in the environment that must be acted upon. Depending on the cognitive context and the situation, the arousal is subsequently evaluated in positive or negative.” Here, the “context” or “situation” includes both the general set of expectations set by interacting with a work and, possibly, an understanding that no action need be taken in response to “important changes in the environment” of the work. It seems that on some level, even insofar as one “mistakenly” views an artwork as a valuable map of reality to “train on,” the artwork is still cordoned off as a minimally consequential training stimulus, a zone of low-risk and therefore low-anxiety/high-reward learning. In an artwork, the viewer is immersed in events and predictive training within a controlled, low-risk “sandbox,” causing predictive updating to be pleasurable rather than stressful.

We appear to treat artistic works simultaneously as communications, where accuracy in judging interlocutor intent is valuable, and as maps or models of reality, which is perhaps why we also use words like “authenticity” and “truthiness” in talking about art, or why a work which doesn’t “ring true” is written off. Predictive processing is a way of explaining our attraction to both types of artistic ambiguity, that is, those which play off our understanding of artworks as communications and those which play off our understanding of artworks as reality-training inputs. Insofar as we (often unconsciously) see works as reality models, or eligible inputs in the brain’s predictive training set for reality testing, we look for models which upend, qualify, or otherwise add nuance to any of the system’s current top-down predictions. If a work or a part of a work does not “ring true” or “resonate,” it is dismissed by the predictive processing system so as not to contaminate learning. To use neural network terms: upon recognizing some kind of fundamental difference between the underlying logic or grammar of the work and the underlying logic or grammar of previously interpreted reality, the brain excludes the stimulus from its training set; the input space is filtered and constrained by the network’s own assessments. While resonance is a necessary heuristic, it is a poor one. Things are resonant not in that they seem true based on the sum of one’s real previous experiences (such a thing does not exist in an objective form), but in relation to our interpretations of triaged previous experiences. Because resonance involves calculations based on interpretations rather than some inaccessible “reality,” bias is frequently reaffirmed, and valuable radical updates are dismissed as implausible while minor updates in the wrong direction can “ring true.”
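The filtering described in neural network terms above can be made concrete with a toy sketch in Python. Everything in it is an assumption made for illustration: belief is reduced to a single number, “resonance” to a distance threshold, and the names `resonates`, `train_with_filter`, `tolerance`, and the stimulus values are hypothetical rather than drawn from any cited model.

```python
def resonates(belief: float, stimulus: float, tolerance: float) -> bool:
    """A stimulus 'rings true' only if it sits close to the current belief."""
    return abs(stimulus - belief) <= tolerance


def train_with_filter(belief: float, stimuli: list[float],
                      tolerance: float = 1.0,
                      learning_rate: float = 0.5) -> tuple[float, list[float]]:
    """Update the belief only on stimuli that pass the resonance filter."""
    accepted = []
    for stimulus in stimuli:
        if resonates(belief, stimulus, tolerance):
            belief += learning_rate * (stimulus - belief)
            accepted.append(stimulus)
    return belief, accepted


truth = 10.0  # hypothetical "reality", never directly visible to the learner
start = 2.0   # a badly miscalibrated starting belief
# Two accurate but radical stimuli (10.0, 9.0) and two small wrong-way nudges.
stimuli = [10.0, 1.5, 9.0, 1.0]

final_belief, accepted = train_with_filter(start, stimuli)
print("accepted stimuli:", accepted)
print("belief moved away from the truth:",
      abs(final_belief - truth) > abs(start - truth))
```

Because the filter measures distance from the current belief rather than from the truth, the two accurate stimuli are thrown out as implausible while the two misleading ones are absorbed, which is the bias-reaffirming failure the paragraph describes.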

What this amounts to is that interpretation, sparing cheap jabs at Sontag,[5] is an essential part of all artistic experiences. The process of interpretation is literally unavoidable, a prerequisite of both any legible visual structure (and therefore of perception) and any linguistic grammar. Our more conscious and deliberative interpretive work (what is typically referred to as “interpretation” in literary theory) is merely a way of dealing with less automatic thematic, ethical, or narrative grammars and structures. Ambiguity, whether it complicates conscious or unconscious interpretive processes, is one of the most rewarding forms of informational stimulus, exploiting our predictive processing system by slipping under the cover of resonance and posing as valuable newness.

A quick sketch of how a few concepts from aesthetics align with the predictive-processing model:

  • Authenticity: a heuristic for speaker credibility, a way of ensuring that the information we’re given about the world by an artwork is accurate.
  • Novelty: the introduction of new types of maps in order to minimize redundancy and maximize predictive updating.
  • Logical consistency of fantasy worlds: a consistent grammatical structure between ifs (priors or premises) and thens (consequences). An underlying logic is a prerequisite of being able to predict consequences given premises (a more penetrating form of pattern-recognition). One type of logic is “ordinary reality” or “physics,” but fantasy worlds have organizing logics as well, and fans have famously been infuriated when these are violated.
  • Patterns: organized sequences which contain a central logic of repetitions which can be “broken” by variations. The pattern teaches what will come next given a prior condition; a variation shows an alternate pattern or logical “if-then” relationship that could exist given the prior conditions.
  • Surprise: essentially the “variation” in the pattern, where an artwork demonstrates how a set of conditions can “add up” differently than our existing models led us to predict.

[1] Noë makes this argument through exclusion: art practices which are not interrogative are not, according to Noë, art. A pop song isn’t musical art; it’s actually a first-level human practice (or “organized activity”) called “song-making.” Choreography may be art, but “dancing” is an organized activity.

There are, of course, feedback loops between first and second-level human practices. Choreography responds to dance culture, which responds to choreography. The emergence of written texts influenced speech practices, which in turn influenced written texts, the increasing availability of an “image of language” resulting in a “dense, historical, many-layered scriptural-linguistic structure.”

[2] The idea that art is engaged largely or primarily in the interrogation and reorganization of the visual field (or that art in general interrogates and reorganizes the senses) certainly has an abundance of precedents. The Russian Formalist idea of defamiliarization, pioneered by thinkers like Viktor Shklovsky and describing the creative program of making the familiar strange, anticipates it. Similar observations on perception can be found in writings on the art-historic shift from investigating the seen (Premodern) to seer and self (Renaissance Humanism) to the very action of seeing (Impressionism and Modernism). It is the Western shift from interrogating man and world to interrogating visual mediation and representation, “I” to “eye.”

[3] While science may not solve issues like meaning or morality, it’s unclear why a biological explanation of why humans are interested in philosophy to begin with, or of why humans prefer certain types of proofs or explanations (e.g. elegant ones), wouldn’t constitute a valuable contribution to the philosophic project.

[4] These perceptual tendencies were well-documented before predictive processing models came into vogue. At least as far back as the mid-20th century, psychologists have been using the term “perceptual set” to describe a top-down interpretive bias which affects how bottom-up data is processed and understood. See Allport (1955) especially, but also Bruner & Minturn (1955), Sanford (1936), Gilchrist & Nesberg (1952), and Kunst-Wilson & Zajonc (1980).

[5] By “interpretation,” Sontag means something more specific and conscious than I use it here. Sontag’s essay might, in fact, be more accurately titled “Against Allegorization,” where objects and phenomena in an artwork are assumed to “stand for” or symbolize something greater. If this essay rebuts any approach to art, it looks more like this one (Christopher Higgs).