From computation to intelligence: On Agency
Agency: An illusion?
One of the conclusions that people often draw from modern physics is that any sense of agency we attribute to ourselves is ultimately an illusion. After all, we’re made of the same “stuff” that underlies all matter, living or nonliving. Just as people can hold beliefs about gods which do not correspond to anything in the real world, our beliefs about our own agency don’t have to point to any real features of the territory; they can just be a subjective perception of ourselves.
An argument for this position is that given unlimited computation, the only information needed to predict anything about the universe is its initial conditions and the fundamental laws of physics. We don’t need to posit distinct causal entities, because they don’t add any predictive power: chemistry, biology, and fluid dynamics are just coarse-grained descriptions of the world; they’re what fundamental physics looks like at a particular scale, in a particular regime. In particular, our descriptions of ourselves add no predictive power from the perspective of Laplace's demon, standing outside of the universe with access to unbounded computation.
The flaw in this line of reasoning is that, by definition, nothing inside the universe can have access to unbounded computation or stand outside of the universe. The fact that our minds cannot contain a faithful model of the world which contains us is not just a feature of us; it is a feature of every observer embedded within our world. A view from nowhere is a view that is inaccessible to anything that exists somewhere, and there’s no reason why we should treat that view as the only arbiter of what information is objectively useful.
The second law
Consider the second law of thermodynamics: roughly speaking, it asserts that our uncertainty about a closed system tends to increase over time. It seems difficult to reconcile this claim with the fact that the laws of physics conserve information: if every possible state deterministically and injectively transitions into another possible state, any probability mass we assign to each possible initial state will deterministically transfer to a distinct subsequent state, which means that our uncertainty about a system should remain constant over time.
The essential remedy to this paradox is that the second law is fundamentally a law about spatially and computationally bounded observers like us. To observe a subsystem is to come into causal contact with it, so that your beliefs about that subsystem become entangled with its actual state. In principle, for any observation, we can derive a probability distribution over the actual states of the system by computing the likelihood of each state generating our observation, and evolve that distribution forward in time. However, any observer must run that computation within the universe, and the computation cannot outrun the actual time evolution of the universe, because it is part of it. The result is that since it is infeasible for us to “jump ahead” and look at what the universe will generate, the behavior of the systems we observe will look more and more random “to us”.
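The conservation half of this paradox can be made concrete in a toy model. Below is a minimal sketch (the 16-state "universe" and its dynamics are invented for illustration, and coarse-graining is used here as one standard, simplified way to model an observer's limited resolution): under deterministic, injective dynamics, the entropy of the full distribution is conserved exactly, while the entropy of a coarse-grained description of it need not be.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (in bits) of a distribution {state: probability}."""
    return max(0.0, -sum(p * math.log2(p) for p in dist.values() if p > 0))

# Toy "universe": 16 microstates evolving by a fixed bijection
# (deterministic, injective, information-conserving dynamics).
N = 16
perm = [(3 * s + 5) % N for s in range(N)]  # bijective since gcd(3, 16) == 1

def step(dist):
    """Push probability mass forward through the deterministic dynamics."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[perm[s]] += p
    return dict(out)

def coarse(dist, block=4):
    """A bounded observer only resolves which block of 4 microstates holds."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[s // block] += p
    return dict(out)

# Start uncertain between 2 microstates: exactly 1 bit of uncertainty.
dist = {0: 0.5, 1: 0.5}
for t in range(5):
    print(t, entropy(dist), entropy(coarse(dist)))
    dist = step(dist)
# The fine-grained entropy stays at exactly 1 bit on every step,
# while the coarse-grained entropy fluctuates.
```

The full distribution never gains or loses a bit, exactly as the injectivity argument says; only the bounded description changes.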
The second law highlights an important interplay between objectivity and subjectivity: it is “subjective” in the sense that it governs the beliefs an observer can have about another system, but it is universal in that it applies to all observers embedded within this world, with very loose constraints on how we choose to define observers. I argue that the concept of agency should be viewed through the same lens: although it may be useless to an omniscient, computationally unbounded predictor standing outside of the universe, there can still be a universal tendency for objects inside our universe to represent the concept of agency. We can observe the detailed evolution of a cellular automaton by simulating it on a computer, but what does the cellular automaton look like from the inside?
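As a concrete reference point for that "view from outside", here is a minimal elementary cellular automaton simulator (Rule 30 and the grid size are arbitrary choices for illustration). From out here we can compute every cell of its evolution; nothing embedded inside the grid has that vantage point.

```python
def step(cells, rule=30):
    """One step of an elementary cellular automaton with periodic boundaries.
    Each cell's next value is the rule's bit indexed by its 3-cell neighborhood."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Watch the evolution from the outside: a single live cell in the middle.
cells = [0] * 31
cells[15] = 1
for _ in range(15):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```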
Interlude: Local information
Information is fundamentally about resolving uncertainty. Consider the set of all possible worlds, with our world contained within it. Each bit of information we gain about our world narrows our search space over the set of possible worlds by half. The more bits of information we gain, the narrower the set of remaining possibilities.
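As a worked example (the numbers are made up): if 1024 possible worlds are consistent with what we know, pinpointing ours takes log2(1024) = 10 bits, and each bit gained halves the remaining set.

```python
import math

worlds = 1024                # hypothetical size of our initial search space
print(math.log2(worlds))     # bits needed to pinpoint one world among 1024

# Each bit of information halves the set of remaining possibilities.
remaining = worlds
for bit in range(1, 4):
    remaining //= 2
    print(bit, remaining)    # 1 bit leaves 512 worlds, 2 leave 256, 3 leave 128
```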
A complication is that there is a free parameter: how we should cut down our search space with each bit, or alternatively, how we should carve reality at its joints. Although different observers will make different choices for this free parameter, a common constraint they share is that the information they receive must be “local”: they need to be able to derive each bit just by looking at a bounded region of space. This is because the laws of physics are local: any region of space is only directly affected by, and only directly affects, its immediate neighborhood, and the same must be true of any spatially bounded observer. If we divide the set of possible worlds into two sets, but we can’t look at any bounded spatial region of a world and tell which set it belongs to, then neither can any observer embedded within that world.
The constraint of locality suggests another convergent property of how efficient observers gain information about the world that contains them: they should prioritize information that is redundantly represented across space and time. In other words, if many observers can derive the same information by observing from different regions of spacetime, then that information tells you something about the state of the world far away from you; it is information that is relevant at a distance, and for the future.
Towards a measure of agency
An agent, if anything, is a thing that exists somewhere inside a universe. To say an agent exists at a particular time is equivalent to narrowing our search space down to the set of worlds which contain that agent at that time. The description complexity of an agent is the number of bits required to pinpoint that search space. This description need not specify the agent to atomic precision: it can equivocate across details which don’t define the agent itself, similar to how the same algorithm can be implemented on different hardware.
Intuitively, agency is defined by what it does: when we believe that an agent “wants” something, we predict that it will overcome obstacles to achieve it; we have some degree of certainty about the endpoints in the future despite radical uncertainty over which pathway it will take to get there. Knowing of the agent's existence grants us observable information about the future: it allows us to predict what will happen from what it wants. If we subtract the description complexity of an agent from the bits of information we gain about the future by knowing its existence, we can quantify how powerful an agent is relative to how complex it is.
This measure of agency rules out architectures like the giant lookup table, which maps each scenario it might encounter to an action that steers the environment towards a particular outcome. Although its existence does imply observable information about the future, constructing the table requires us to specify more information than is needed to describe its observable effects, and it doesn’t tell us anything about how goal-directed behavior can be realized.
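One way to make this intuition concrete is to use compressed size as a crude stand-in for description complexity (this is only a rough proxy, and the two "policies" below are invented toys): behavior generated by a short rule compresses to a few bytes, while a lookup table of independently chosen entries cannot be described much more briefly than by listing every entry.

```python
import random
import zlib

n_scenarios = 4096

# A policy generated by a compact rule: act exactly in even-numbered scenarios.
rule_table = bytes(s % 2 for s in range(n_scenarios))

# A giant lookup table whose entries are chosen independently at random.
random.seed(0)
random_table = bytes(random.randrange(2) for _ in range(n_scenarios))

# Compressed length as a rough proxy for description complexity.
rule_len = len(zlib.compress(rule_table))
random_len = len(zlib.compress(random_table))
print(rule_len, random_len)  # the rule-generated table compresses far better
```

Both tables describe behavior over the same 4096 scenarios, but the rule-generated one has a short description, which is exactly what the simplicity term in the measure rewards.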
A bias towards simplicity favors agents that don’t already have information about their environment stored within themselves; it favors agents that can learn, adapt to new situations, and make use of new information to generate novel strategies for achieving their goals.
From the perspective of any physical system that aims to make predictions about the future, some bits of information are more leveraged than others: knowing the velocities of a billion gas particles grants you almost no information about what you will observe the next second; on the other hand, our dreams and wishes are encoded with far less complexity, yet they steer our actions and generate cascading effects that will remain observable long after we die. If an alien civilization wants to predict the fate of our planet, it will not focus on our natural habitat; it will focus on us, because the wishes in our minds are the preceding shadows of the future.
Open questions
The dangers of agency
The more powerful an agent is, the more information we can derive about the future from knowing its existence. However, the futures that we want form a very narrow set compared to the space of all possible outcomes, and the powerful AIs we will design might not aim for that narrow set unless we do enough to ensure that they do. How can we chisel information about the optimization targets in our minds into digital intelligences that will eventually become more powerful than us?
The goal slot
Can we design a general-purpose goal-achieving engine? In other words, can we design an agent with a “goal slot”, such that we can control which set of futures the agent will steer towards by controlling the bits of information we put into that goal slot? If this is possible, it means that agency has a simple “core” that is independent of the goals themselves, in the same way that an engine can be disentangled from the steering wheel that controls the direction.
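A toy sketch of what a "goal slot" could look like (everything here is invented for illustration, not a claim about how real agents work): a generic search engine whose machinery is fixed, while the goal is an arbitrary predicate plugged into one argument.

```python
from collections import deque

def plan(start, actions, is_goal, max_depth=10):
    """A goal-agnostic engine: breadth-first search over states.
    The `is_goal` argument is the "goal slot"; swapping it redirects the
    same machinery towards a different set of futures."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        if len(path) >= max_depth:
            continue
        for name, act in actions.items():
            nxt = act(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

# The same engine steered towards different goals by the goal slot alone.
actions = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}
print(plan(1, actions, lambda x: x == 10))
print(plan(1, actions, lambda x: x == 7))
```

In this sketch the engine and the goal really are disentangled; whether that factoring survives at scale is exactly the open question.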
Agentic behavior and agentic architecture
When something displays agentic behavior externally, what does this imply about its internal architecture? Will it have a model of the world that contains beliefs about the environment external to itself? How will this world model be structured?
Conversely, when we look at the internals of an agent, how can we derive what set of futures it will steer towards? What is the type signature of an optimization target? How do they live in our minds?
Appendix: Agency and locality
Our conception of agency depends on the assumption that both the information that specifies the agent and its effect on the future must be local. Since the laws of physics conserve information, for Laplace's demon any information about the present implies the same number of bits of information about each point in the future; this means that, to the demon, no specification of an agent can carry more or less information about the future than any other.
The constraint of locality rules out effects or information that cannot be observed by anyone. Consider the positions and velocities of a billion gas particles: although that information is technically conserved over time, it rapidly disperses into space, such that very quickly no observer will be able to recover it. These kinds of effects won’t matter to us, nor will they matter to anything else in this world.