A few rough, in-progress ideas about action selection…
A Model of Action Selection
For extremely simple organisms, there can be simple stimulus-response mechanisms whereby given situations in the environment are always responded to with a given behavior. Once there are multiple incompatible behaviors, however, there must be a mechanism to adaptively determine which actions to execute in a given situation. This is the problem of action selection, and this page attempts to describe one possible model of the way action selection might work.
One fundamental question when developing a model of action selection is: what gets selected? What are the components of action, and at what level is selection made? In other words, how is action structured, and how could an action selection mechanism specify an action? The next section explores this question. In brief, it argues that fundamental components of action can be combined in various ways to form higher-level units of action. These action units (AU’s) are what the action selection mechanism operates on.
This page then goes on to describe a four-step process of action selection, with a section devoted to each step. The first step is the emergence of potential action units, which are candidates for execution. In most situations there are nearly limitless possible actions, and organisms cannot consider all of them. Instead, there are processes by which a smaller number of alternatives is selected for further scrutiny later in the process.
The second step is a triage step, in which the parameters that will govern later processing are determined. In the third step, each alternative action unit is evaluated by multiple processes, which assign both positive and negative value to it. The interactions among the contributors to this evaluation are likely complex. The evaluation can also take different forms in different contexts: it could be fast but rough, or slower and more fine-grained; it could involve different evaluators, or weight the same evaluators differently. The preceding triage step determines which type of evaluation is appropriate.
The fourth step collapses the positive and negative values for each potential AU into a single value, its activation level. All other things being equal, the competing AU with the highest activation level will be executed, provided it is above some threshold. However, there are likely to be various operating rules that constrain when new AU’s can be executed: to prevent “thrashing” or “dithering” (where two similarly activated AU’s rapidly alternate), to allow smooth transitions between actions, and so forth.
Following the discussion of the steps of action selection, there is a section on how multiple actions can be combined, and how currently executing actions may figure into the selection of new actions. After that, there is a discussion of “mental effort” and how this model might explain how mental effort can change action selection outcomes, without recourse to a homunculus. Finally, there is a discussion of how all of this might relate to empirical questions, and how the model might be tested.
What Gets Selected? The Structure of Action
How is “action” organized? We know of course that action potentials cause ACh to be released at neuromuscular junctions, leading to contraction of a set of muscle fibers. This is one of the fundamental “units” that actions are made of. But many or most actions involve coordinated contraction of many muscle fiber sets together, sequenced over a period of time, and sometimes guided by sensory information. In order to understand how actions are selected, we need to get a better understanding of the way actions themselves are structured, so that we know what is getting selected.
Grouping muscle contractions together and sequencing them in time are likely fundamental to the structuring of action. Such groupings of different muscle fiber sets over time can probably be bundled together and executed, or thought about, as a whole. This allows an organism to perform complex movements without having to specify each component of the movements separately.
Another factor to take into consideration is the role of input (from sensory stimuli or from other areas of the brain) in guiding sequences of actions. Some action sequences are executed without feedback (so-called “open loop” actions), but many action sequences do involve feedback or sensory guidance. For example, picking up an object likely involves more than just a preprogrammed set of muscle contractions: it is guided by visual feedback, as well as by tactile and muscle feedback telling the organism when the object is grasped tightly enough. Just as open loop action sequences can likely be packaged together and executed together, action sequences involving sensory guidance can probably also be packaged together and executed, without having to specify the exact sensory-motor relationship at each step of the way.
The level of sensory guidance can go well beyond simple visual feedback about movements, and complex contingency rules can be built into action units. For example, there could be a rule that says: if you see an arrow pointing left, push the button on the left; if it points right, push the button on the right. Complex pattern matching and categorization can also play a role, as in a soldier standing guard with the directive “if you see an enemy, shoot him.” In the extreme, an action may be specified as “do whatever it takes until condition X is satisfied in our model of the world,” which seems equivalent to pursuing a goal without any clear idea beforehand of how the goal is to be achieved. In this case, any “state of the world” that we can conceptualize becomes a possible component of an action unit. I would also argue that feedback for the action can come not just from external sensory information, but also from internal states of the body and brain, so actions can be guided by internal states as well.
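To make the idea of nested action units more concrete, here is a toy Python sketch. Everything in it is a hypothetical illustration, not a claim about neural implementation: `Contraction` stands for a primitive contraction of a named muscle-fiber set, `OpenLoop` for a bundle executed without feedback, and `UntilCondition` for a closed-loop unit that keeps acting until a condition on a made-up world state holds. The last one is only a simplified stand-in for “do whatever it takes until condition X is satisfied”: it repeats a fixed body rather than searching for a way to reach the goal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

State = Dict[str, int]  # hypothetical world/body state, e.g. {"grip": 2}

@dataclass
class Contraction:
    """Primitive unit: contract one (named) set of muscle fibers."""
    muscle: str

@dataclass
class OpenLoop:
    """Open-loop unit: run child units in order, with no feedback."""
    steps: List["ActionUnit"]

@dataclass
class UntilCondition:
    """Closed-loop unit: repeat `body` until `done(state)` holds (at most max_iters times)."""
    body: "ActionUnit"
    done: Callable[[State], bool]
    max_iters: int = 100

ActionUnit = Union[Contraction, OpenLoop, UntilCondition]

def execute(unit: ActionUnit, state: State, log: List[str]) -> None:
    """Run a unit tree, logging each primitive contraction."""
    if isinstance(unit, Contraction):
        log.append(unit.muscle)
        # Toy world model: each "close_hand" contraction tightens the grip by one step.
        if unit.muscle == "close_hand":
            state["grip"] = state.get("grip", 0) + 1
    elif isinstance(unit, OpenLoop):
        for step in unit.steps:
            execute(step, state, log)
    elif isinstance(unit, UntilCondition):
        for _ in range(unit.max_iters):
            if unit.done(state):  # sensory guidance: check the world before acting again
                return
            execute(unit.body, state, log)
```

A grasp can then be packaged as a single unit, e.g. `OpenLoop([Contraction("extend_arm"), UntilCondition(Contraction("close_hand"), lambda s: s.get("grip", 0) >= 3)])`, and executed as a whole without specifying each feedback step separately.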
We are capable of stringing together sequences of actions, and bundling with those sequences various levels of guidance from internal and external cues – to form larger units of action. In many cases, these units of action are executed in their entirety at one time. But as organisms have gotten more complex, they (we) have been able to conceptualize units of action spanning longer and longer periods of time – from seconds and minutes, up to hours, days, and decades. In such cases, the units of action are not actively executed continuously, but may involve periods of execution, followed by time where nothing is being done and where other actions are taking place, followed by more execution, and so forth.
This requires some mechanism beyond a bundling of actions with sensory guidance that can be immediately executed. I imagine this involves some sort of “intentions” or “plans.” So some action bundles can be executed all at once, whereas others involve some immediate actions plus the setting of an intention or plan, and still others involve only the setting of a plan. Setting an intention or making a plan probably involves modifying brain structures that keep track of these longer-term actions, and this likely includes mechanisms that “remind” us of ongoing plans when it is time to actively execute parts of them again.
This setting of an intention is just as much an outcome of an action selection process as are more obvious muscle contraction behaviors. This is probably part of a larger realm of mental resources that can be selectively directed as a result of action selection processes – so there can be “mental” actions or brain actions as well as muscle contractions, which likely go through similar and coordinated processes of action selection. As a side note, “having a goal” can be thought of as having set an intention for an abstract action with a large amount of contingency from internal and external circumstances. It is also possible to have a goal be “make sure that circumstance X is unlikely to happen.”
The action selection mechanisms only select actions, both muscular and “mental”, which are to be executed immediately. This selection may involve kicking off sequences of actions which may run for some time, and it may also involve executing mental actions which set intentions for future actions. The point is, the action selection mechanisms do not guarantee future actions, they only execute current mental and muscular actions which make it more likely that certain future actions may occur. In many ways this is adaptive, since circumstances change, and thus the overall action plan of an organism may need to change. This fits in with our everyday experience of setting an intention, only to have the intended action not selected when the time came for it.
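A toy sketch of this distinction, using hypothetical names (`Intention`, `IntentionStore`, `select`) that do not come from the model: setting an intention only records a trigger, and when the trigger matches, the stored intention is proposed as a candidate with some initial value. It still has to win the ordinary competition, so a competing action can override it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Intention:
    name: str                         # hypothetical label for a future action unit
    trigger: Callable[[Dict], bool]   # when the store should "remind" us
    boost: float                      # initial value given to the reminded AU

class IntentionStore:
    """Hypothetical record of longer-term intentions. Setting an intention is
    itself an action; later, the store only proposes candidates."""
    def __init__(self) -> None:
        self.intentions: List[Intention] = []

    def set_intention(self, intention: Intention) -> None:
        self.intentions.append(intention)

    def reminders(self, state: Dict) -> Dict[str, float]:
        # Nothing is executed here: matching intentions just become candidates.
        return {i.name: i.boost for i in self.intentions if i.trigger(state)}

def select(candidates: Dict[str, float], threshold: float) -> Optional[str]:
    """Ordinary competition: the highest value wins if it clears the threshold."""
    best = max(candidates, key=candidates.get, default=None)
    if best is None or candidates[best] < threshold:
        return None
    return best
```

For instance, an intention to buy milk that is reminded at the store with value 0.6 still loses to a competing `leave_store` candidate valued at 0.9, which matches the everyday experience of an intention not being acted on.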
In summary, this section described the structure of the different actions and units of action that get selected in action selection. There are many different types of action units: from small groupings of related muscle contractions, to complex sequences of actions that depend on sensory feedback, to sequences involving mental actions, to conceptually long-term actions involving the setting of intentions. Even when someone is pursuing a “goal,” it is not the goal itself that is selected, but an intention to act, guided by input, so as to bring the target “state of affairs” to pass. In other words, action selection doesn’t select a goal per se, but it does select pursuing a goal, among other things.
The Emergence of Potential Action Units
This section describes the process by which potential action units come to be “in the running” to be executed. I use the word “emergence” because it may be that there is no one central place in the brain for generating possible actions, but rather they can “emerge” from many different areas.
The organism has various models or representations of itself and of the world: representations of the body, of the world (including current sensory information), of the “mind”, and so on. Pattern detectors constantly scan these models, looking for “positive” and “negative” patterns in them, as well as for potential positive and negative developments. The terms “positive” and “negative” don’t refer to any special “essence”; they are just categorizations, which then serve certain functional roles.
When either type of value or potential value is detected, this can trigger the emergence of a potential action unit. This potential action unit can be very specific (shift weight to the other foot), or more generic (generally dedicate cognitive resources to figuring out the cause of a problem and fixing it in an appropriate way). Additionally, the potential action unit is given some initial values, often based on the values calculated by the model pattern detectors.
It is also possible for a pattern detector to propose a potential action unit that is not based on model value or potential value, but in such cases an initial value must be assigned to the potential AU before further processing. This whole process is subject to factors such as attention and value intensity, so not every possible AU is brought up for consideration.
It may also be at this stage that there is some basic “possibility” testing, to see if potential actions are even possible, given the current context and current state of the body.
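The emergence step might be sketched as follows. The detectors, feature names, and numbers are invented for illustration. The sketch shows three points from the text: there is no single central generator (any detector can propose an AU with an initial value), impossible proposals are filtered out, and only a limited number of the most intense proposals stays “in the running”, as a crude stand-in for attention.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, float]          # hypothetical: named features read from body/world/mind models
Proposal = Tuple[str, float]      # (action unit name, initial value)
Detector = Callable[[Model], List[Proposal]]

def hunger_detector(m: Model) -> List[Proposal]:
    # Low glucose is a "negative" pattern; it proposes a generic AU whose
    # initial value grows with the size of the deficit.
    return [("seek_food", 1.0 - m["glucose"])] if m["glucose"] < 0.5 else []

def posture_detector(m: Model) -> List[Proposal]:
    # A very specific, low-value AU.
    return [("shift_weight", 0.2)] if m["foot_pressure"] > 0.8 else []

def threat_detector(m: Model) -> List[Proposal]:
    # Proposes an AU the body cannot perform, to exercise the possibility test.
    return [("fly_away", 0.9)] if m["threat"] > 0.5 else []

def emerge(model: Model,
           detectors: List[Detector],
           is_possible: Callable[[str], bool],
           attention_slots: int) -> Dict[str, float]:
    proposals: Dict[str, float] = {}
    for detect in detectors:  # no central generator: any detector may contribute
        for au, value in detect(model):
            proposals[au] = proposals.get(au, 0.0) + value
    # Basic possibility testing against the current context and body state.
    possible = {au: v for au, v in proposals.items() if is_possible(au)}
    # Attention limit: only the most intense proposals stay in the running.
    ranked = sorted(possible.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:attention_slots])
```

With low glucose, high foot pressure, and a threat, and with only one attention slot, `seek_food` survives, `fly_away` is dropped as impossible, and `shift_weight` is crowded out.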
Triage / Determining Level and Type of Evaluation
Not every action selection choice must go through evaluation by the full resources of the brain – this would be slow, inefficient, and distracting. Rather, some decisions require little evaluation and others require extensive evaluation. For example, postural changes might require little evaluation. It is likely that a set of heuristics performs a rough analysis of a proposed action, and determines what degree of evaluation to use.
The evaluation process can be modified by many factors – emotional state, type of the proposed action, rough estimate of risk level, etc. Additionally, an action may be quashed at an early stage (here), depending on the ongoing state of the brain and body. For example, the person might be engaged in a high-priority activity that leaves no resources for sufficient evaluation, and so many potential actions might be inhibited unless they have very high priority.
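A minimal triage sketch, assuming three evaluation depths and a handful of invented heuristics. The specific rules (for example, that high arousal favors fast evaluation) are illustrative assumptions and not part of the model. What the sketch shows is that a cheap function of the proposal and the current context can decide how much evaluation to spend, or quash the proposal outright.

```python
from dataclasses import dataclass
from enum import Enum

class Depth(Enum):
    QUASH = 0   # inhibited before evaluation
    FAST = 1    # quick, rough evaluation
    FULL = 2    # slower, fine-grained evaluation

@dataclass
class Context:
    busy_priority: float  # hypothetical: priority of the ongoing activity (0..1)
    arousal: float        # hypothetical stand-in for emotional state (0..1)

def triage(au_kind: str, risk: float, priority: float, ctx: Context) -> Depth:
    """Toy heuristics choosing how much evaluation a proposed AU receives."""
    # Resources are committed elsewhere: inhibit unless this AU is more urgent.
    if priority < ctx.busy_priority:
        return Depth.QUASH
    # Postural adjustments and other low-risk actions get a cheap pass.
    if au_kind == "postural" or risk < 0.2:
        return Depth.FAST
    # Assumption: high arousal pushes towards fast evaluation even for risky actions.
    if ctx.arousal > 0.8:
        return Depth.FAST
    return Depth.FULL
```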
Evaluation of Proposed Actions
Each proposed action and its alternatives are subject to an evaluation process. This involves evaluating the “state of the world” if the action is attempted (or not attempted). This evaluation has two basic sides: a positive side (do the action) and a negative side (don’t do the action).
Depending on the level and type of evaluation, many different brain processes can weigh in on the decision. Stored evaluations in memory, current state of the body, context, “simulations” of the potential outcomes, short-term outcomes, long-term outcomes, complex cognitive models, etc. can all contribute.
The evaluation process determines how these different sources of value are combined, weighted, etc. Other resources are involved in moderating the different sources of value, for example by estimating the probability that the action will succeed.
How different sources are combined can be modified based on emotional state, context, or many other conditions in the brain – so the evaluation process itself can be varied depending on many factors.
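One simple way to picture context-dependent combination is a weighted sum on each side. In this sketch the evaluator names, the weights, and the choice to discount only the positive side by the estimated probability of success are all assumptions; the model itself leaves the combination rules open.

```python
from typing import Callable, Dict, Tuple

# An evaluator maps an AU to (positive, negative) contributions, both >= 0.
Evaluator = Callable[[str], Tuple[float, float]]

def evaluate(au: str,
             evaluators: Dict[str, Evaluator],
             weights: Dict[str, float],
             p_success: float) -> Tuple[float, float]:
    """Hypothetical combination rule: weighted sum on each side, with the
    positive side discounted by the estimated probability of success."""
    pos = neg = 0.0
    for name, ev in evaluators.items():
        w = weights.get(name, 0.0)  # weight 0 means this source is effectively ignored
        p, n = ev(au)
        pos += w * p
        neg += w * n
    return pos * p_success, neg

# Two invented sources of value with fixed outputs, for illustration only.
EVALUATORS: Dict[str, Evaluator] = {
    "memory": lambda au: (0.5, 0.25),  # stored evaluation
    "body": lambda au: (0.25, 0.5),    # current state of the body
}
```

Changing the `weights` dictionary (for example, according to emotional state or context) changes the evaluation without changing any individual evaluator.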
How is value determined? My guess is that, first, there are basic, inborn “pattern detectors” which associate very basic, high-level neural patterns with value (either positive or negative). These could detect incoming signals from pain lines, look for higher-level patterns in social relationships, etc. In some sense, these value pattern detectors do not depend on experience: they may change with development, and could potentially be modified by experience, but for the most part they are not. However, through learning, more and more complex and specific things get associated with these basic patterns, so these pattern detectors can be invoked by many things in the environment.
When it comes to valuing AU’s, there may be several mechanisms by which these inborn pattern detectors influence value in a given decision context. One way is by imagining the potential consequences of the AU, and then running this simulation through the pattern detectors to see how they would value it. Another way is by storing associations between environmental circumstances (or AU’s themselves) and values.
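The two routes can be sketched as a lookup with a fallback. The detector functions, the outcome features, and the decision to store a newly simulated value as an association are hypothetical. The sketch is only meant to show stored associations and simulation both drawing on the same inborn detectors.

```python
from typing import Callable, Dict, List

Outcome = Dict[str, float]  # hypothetical features of an imagined consequence

# Hypothetical inborn value detectors: map a predicted outcome to a signed value.
def pain_detector(outcome: Outcome) -> float:
    return -outcome.get("pain", 0.0)

def affiliation_detector(outcome: Outcome) -> float:
    return outcome.get("social_approval", 0.0)

INBORN: List[Callable[[Outcome], float]] = [pain_detector, affiliation_detector]

def value_au(au: str,
             simulate: Callable[[str], Outcome],
             associations: Dict[str, float]) -> float:
    """Route 1: use a stored association if one exists.
    Route 2: otherwise imagine the consequences, run them through the inborn
    detectors, and (an assumption) store the result as a new association."""
    if au in associations:
        return associations[au]
    outcome = simulate(au)
    value = sum(detector(outcome) for detector in INBORN)
    associations[au] = value
    return value
```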
It is also likely that there are many other “pattern detectors” constantly detecting other patterns and making changes throughout the body. One example would be homeostatic systems that monitor levels of needed body resources such as blood glucose and sodium concentration. A pattern detector could determine that the body needs water and kick off various changes in the body. Among these changes could be an increase in the value assigned to drinking and to getting water.
What are historically called “emotions” may also work in a similar manner. A pattern is detected, and various changes are kicked off, often including a change in the valuation of various environmental situations and actions. For example, in anger, a certain pattern is detected (such as injury combined with the violation of a standard, or whatever it might turn out to be). Then changes are kicked off, including autonomic changes such as increased heart rate, but also changes in the value of situations and actions. For example, hitting someone suddenly takes on a very positive value.
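A sketch of such modulation, assuming that a detected pattern adds a fixed offset to the value of particular AU’s. The patterns, thresholds, and offsets are invented. Additive offsets are used so that an AU with a negative baseline value (such as hitting someone) can actually become positive, as in the anger example; a multiplicative gain would make a negative value even more negative.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# A modulator: (pattern over internal state, value offsets applied to AU's when it matches).
Modulator = Tuple[Callable[[State], bool], Dict[str, float]]

MODULATORS: List[Modulator] = [
    # Homeostatic pattern (hypothetical threshold): low hydration makes drinking more valuable.
    (lambda s: s["hydration"] < 0.3, {"drink": 1.0}),
    # "Anger" pattern (hypothetical): injury plus a violated standard makes hitting valuable.
    (lambda s: s["injury"] > 0.5 and s["violation"] > 0.5, {"hit": 2.0}),
]

def modulated_values(base: Dict[str, float], state: State) -> Dict[str, float]:
    """Return a copy of `base` with offsets from every matching pattern applied."""
    values = dict(base)
    for matches, offsets in MODULATORS:
        if matches(state):
            for au, offset in offsets.items():
                if au in values:  # only AU's already under consideration are affected
                    values[au] += offset
    return values
```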
There are also multiple sources for evaluating actions. There may be lower-level pattern detectors as discussed, which operate largely outside of consciousness (only their results – changes in valuation – are present in consciousness, but their processes are not). We are also, as humans, constantly building vast and complicated models of the world, which are also used in evaluating things and actions. Our cognitive systems are capable of doing many complicated manipulations, and much complicated reasoning, and can reason about many things which have nothing to do with making action decisions – solving math problems, for example.
However, when our cognitive systems are used in the service of making decisions, they are always used in the service of “value.” In other words, the idea of a “rational decision” independent of value is incoherent. All value is inherently arational. Anyway, our more complex higher-level cognition is capable of making complex representations of the world, including representations of our own personal values. Our cognitive systems can thus represent value, and then look for the actions that seem most likely to bring about what they judge to be the maximal value in a given situation. More broadly, this higher-level evaluation of what is “best” for us may differ from the value suggested by lower-level pattern detectors. And different pattern detectors may disagree with each other: one may value food more, another water, another sex.
There are also likely pattern detectors for risk, probability of success, etc., which also can be computed in a different way by higher-level cognitive systems.
The point is that there are likely many different contributors to determining value. There should be some mechanism for putting all of these different influences together. There are likely complicated rules for weighting different sources of information differently in different circumstances.
Collapse to Activation, and Execution
The positive and negative evaluations are then combined into a single value, activation. There are also likely mechanisms in place to ensure that transitions from one action to another, or the addition of an action, happen smoothly. Subject to these mechanisms, the competing action with the highest activation is then executed, as long as it is above a certain threshold. The threshold may vary depending on many factors.
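A toy version of this step, assuming that activation is simply positive minus negative value and that the anti-dithering rule is a switching margin (hysteresis): a challenger must exceed the current AU’s activation by `margin` before execution switches. Both choices are illustrative assumptions; the model does not specify the operating rules.

```python
from typing import Dict, Optional, Tuple

def activation(pos: float, neg: float) -> float:
    """Collapse the two sides into one number (here, simply their difference)."""
    return pos - neg

class Selector:
    """Winner-take-all with a threshold and a hypothetical switching margin."""
    def __init__(self, threshold: float, margin: float) -> None:
        self.threshold = threshold
        self.margin = margin
        self.current: Optional[str] = None

    def step(self, evaluations: Dict[str, Tuple[float, float]]) -> Optional[str]:
        acts = {au: activation(p, n) for au, (p, n) in evaluations.items()}
        best = max(acts, key=acts.get, default=None)
        if best is None or acts[best] < self.threshold:
            self.current = None  # nothing is active enough to execute
            return None
        cur = self.current
        if cur is not None and cur in acts and acts[cur] >= self.threshold:
            if acts[best] < acts[cur] + self.margin:
                return cur       # keep going: the challenger is not clearly better
        self.current = best
        return best
```

With `margin=0`, a challenger that edges slightly ahead would take over immediately, which is the kind of dithering the margin is meant to damp.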
Multiple Concurrent Actions
The question then comes up – but can’t we do more than one thing at a time? If action selection always chooses the highest activated AU, how is it possible that we can do several things at once?
One possible answer is that AU’s can include multiple actions. This intuitively seems right some of the time; for example, playing football may involve running and throwing the ball at the same time. In many cases, however, it doesn’t seem right. For example, when walking down the street, it may occur to someone that they should make a phone call. It seems inefficient to do all the calculations about whether to “walk AND talk on the phone”, including re-evaluating the benefits and drawbacks of walking. Rather, the walking seems to be a background part of your current state, which must be taken into consideration but isn’t itself significantly evaluated when you’re considering talking on the phone.
This may be related to the idea of automatized or “unconscious” actions, as opposed to actions which require “attention.” The body is in a certain “background” state, which includes things such as body position but also automatized actions, which, once initiated, require few attentional resources. Action selection then takes place against this background.
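One way to picture selection against a background, sketched with invented resource labels: a candidate is valued on its own merits minus a cost for each resource it shares with ongoing automatized actions. The resource sets and the linear conflict cost are assumptions for illustration. The point is that the walking itself is never re-evaluated; it only constrains the candidate.

```python
from typing import Dict, Set

# Hypothetical resources that each action occupies.
DEMANDS: Dict[str, Set[str]] = {
    "walk": {"legs", "vision"},
    "phone_call": {"hands", "speech"},
    "read_book": {"hands", "vision"},
}

def marginal_value(candidate: str, base_value: float,
                   background: Set[str], conflict_cost: float) -> float:
    """Value a candidate against the background state. Background actions are
    not re-evaluated; they only add a cost for each resource the candidate
    would share with them."""
    busy = set().union(*(DEMANDS[b] for b in background))
    overlap = DEMANDS[candidate] & busy
    return base_value - conflict_cost * len(overlap)
```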
What about automatized / unconscious actions themselves: do they go through the same process of action selection? My hunch is that they do. The mechanisms of action selection I imagine are not conscious, and they operate for both automatized and more conscious actions; their results may or may not be presented to consciousness. Thus, a similar process of action selection can go on outside of awareness for automatized actions, although fewer inputs may be available for value calculation, since fewer resources are available.
Mental Effort, and Banishing the Homunculus
All of this has been a mechanistic description of how action selection may work. But what of our phenomenal experience of “agency”, the feeling that there is a “you” who decides and who is capable of exerting effort to influence or control the choices made? How does our model explain this?
According to the model presented here, and to my assumptions, phenomenal experience consists of relationships between parts of the brain. Thus, our feelings are not immaterial essences, but rather physical categorizations made within the brain. A feeling always means that a categorization is being made. The feeling of mental effort may occur when the brain is using cognitive resources that either move it away from its normal state or are expensive; in other words, it signals that limited resources are being used (much as in the case of feelings of “physical” effort).
As discussed above, the direction and employment of many cognitive resources are also under the control of the “voluntary” system, and there are “mental” actions just as there are muscle contraction actions. It seems likely that cognitive resources can be directed towards biasing the action selection mechanism in one direction or another. In such cases, the direction of mental resources is accompanied by a phenomenal feeling of agency, as is common for actions carried out by the voluntary system. Since these resources are limited, their use is also accompanied by a phenomenal feeling of effort. Thus, in some cases limited mental resources are directed towards influencing the outcome of the action selection process, and this is accompanied by feelings of agency and effort.
The trick here, though, is that this direction of mental resources towards biasing the outcome of the action selection mechanism is itself the result of the action selection mechanism. In other words, the action selection mechanism can evaluate and select a potential action unit, and that action unit can be “try to influence a future decision of the action selection mechanism.” Thus, there is no homunculus, but rather an action selection mechanism which determines that the most valuable thing to do in a given moment is to bias its own operation in a future moment. Of course, it does this mechanically, without any special “understanding” that this is what it is doing.
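The self-biasing loop can be sketched with a hypothetical selector in which some candidates are mental AU’s named `bias:<target>`. Selecting one spends part of a limited effort budget and raises the target’s activation on later steps. The names, costs, and gains are invented. Note that there is no separate controller, only the same max-activation choice applied to every candidate.

```python
from typing import Dict

class BiasableSelector:
    """Toy selector whose candidates may include hypothetical mental AU's named
    "bias:<target>". Choosing one spends a limited effort budget and raises the
    target's activation on later steps. There is no separate controller: the
    bias AU wins or loses on activation like any other candidate."""
    def __init__(self, effort_budget: float) -> None:
        self.effort_budget = effort_budget
        self.bias: Dict[str, float] = {}

    def step(self, activations: Dict[str, float],
             bias_cost: float = 1.0, bias_gain: float = 0.5) -> str:
        # Bias AU's are only possible while enough of the limited resource remains.
        acts = {au: a + self.bias.get(au, 0.0) for au, a in activations.items()
                if not (au.startswith("bias:") and self.effort_budget < bias_cost)}
        choice = max(acts, key=acts.get)
        if choice.startswith("bias:"):
            target = choice.split(":", 1)[1]
            self.effort_budget -= bias_cost  # stand-in for the limited resource felt as effort
            self.bias[target] = self.bias.get(target, 0.0) + bias_gain
        return choice
```

For example, with activations `{"watch_tv": 1.0, "study": 0.75, "bias:study": 1.25}` and a budget of 1.0, the first step selects `bias:study`; on the next step the bias option is no longer affordable, and `study` (now at 1.25) beats `watch_tv`.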
Appendix – Pattern Detection / Categorization in the Brain
In a sense, every neuron is a pattern detector, detecting patterns in its input and firing when those conditions are met. I suspect that this concept applies more broadly to the brain – several neurons together can detect more complex patterns, and so forth.
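The textbook abstraction of this idea is a threshold unit that fires when a weighted sum of its inputs reaches a threshold. It is a simplification, not a model of real neurons. The sketch below also illustrates the second claim: no single threshold unit can detect “exactly one of two inputs is on” (XOR is not linearly separable), but two units feeding a third can.

```python
from typing import Callable, Sequence

def threshold_unit(weights: Sequence[float], threshold: float) -> Callable[[Sequence[float]], int]:
    """Return a unit that fires (1) when the weighted input sum reaches `threshold`."""
    def fire(inputs: Sequence[float]) -> int:
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return fire

# First layer: among binary inputs, each unit detects one simple pattern.
a_not_b = threshold_unit([1.0, -1.0], 0.5)   # fires only for (1, 0)
b_not_a = threshold_unit([-1.0, 1.0], 0.5)   # fires only for (0, 1)
# Second layer: fires if either first-layer unit fired.
either = threshold_unit([1.0, 1.0], 1.0)

def exactly_one_on(inputs: Sequence[float]) -> int:
    """XOR detector built from three threshold units."""
    return either([a_not_b(inputs), b_not_a(inputs)])
```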
“Emotions” may involve pattern detection. Pattern detectors detect various situations in the current environment, state of the brain, state of the body, etc. When a given pattern is detected, a range of changes are carried out – such as changes in autonomic functioning, triage processes, evaluation processes, and evaluation weighting. They may also result in phenomenal experience of an “emotion” in consciousness.
We are consciously aware of “emotions” in many cases due to their phenomenal component and through our awareness of heart rate changes, sweating, etc. It is likely that there are also many other pattern detectors that are constantly making changes to the way our brains process things, but without any conscious awareness of these changes.
Appendix – Evolutionary Learning vs. Somatic-Time Learning
Also, as a side note, from an evolutionary perspective there is likely to be a need for both what I’ll call “evolutionary learning” and what I’ll call “somatic-time learning” (after Gerald Edelman’s discussion of somatic-time selection in “Neural Darwinism”). Evolutionary learning refers to behavioral tendencies which have been adaptive in the past, and which do not require any (or much) environmental learning to occur. For example, it might be good to very easily develop a fear of snakes without ever getting bitten by one. This is very useful, because there are some things you don’t have the luxury of learning: they may kill you. Thus, it makes sense for us to have behavioral tendencies that are “built-in” and not the result of learning.
On the other hand, evolution cannot prepare us to take specific advantage of the particular environment we find ourselves in, since every organism’s environment differs in ways that evolution can’t predict. Thus, “somatic-time” learning is extremely important. Over the course of evolution, organisms have developed more and more capacity for somatic-time learning, and thus more flexibility. Human capabilities in this area are tremendous. However, there is still a need for evolutionary learning: beyond the things that can kill us, an organism that is too flexible may deviate too easily from behaviors that truly promote gene propagation. Thus, it is not surprising that there are at times multiple sources of value which can conflict.
Appendix – Voluntary and involuntary systems
There is some level of distinctness between the so-called “voluntary” system, and other systems in the brain and body. The voluntary system is characterized, not by some mysterious “volition” or “agency”, but by its extreme flexibility. The voluntary system controls actions for which it is adaptive to be able to customize to the current environment, complex models of the environment, and predictions about the future.
In contrast to the voluntary system, there are many “involuntary” systems in the brain and body. Besides operating to some degree independently of the voluntary system, these systems are characterized by involving actions for which tremendous flexibility is not especially beneficial. Involuntary systems are by no means static, but the parameters under which they carry out different actions are relatively fixed. For example, the autonomic nervous system is engaged differently depending on activity level, threat level, etc., and is largely fixed in this relationship to its parameters. There has not been an adaptive need to be able to selectively engage the ANS depending on complex models of the environment; this relatively fixed relationship works pretty well. The immune system is a similar case: it involves vast complexity, but most of this complexity depends on situations within the body, with no need to perform differently based on complex environmental models.
There are two main types of resources which it is adaptive to be able to shape significantly by the environment and models of the environment – movements, and the direction and use of cognitive resources. Thus, these may be the two main components of the voluntary system. [Might they be separate but coordinated systems?]
This is not to say that the voluntary and involuntary systems operate independently; almost everything in the body is coordinated to some degree. In many ways they operate largely autonomously, but there are clear relationships between them. For example, the autonomic nervous system can make changes depending on what the voluntary system is about to do. If the voluntary system has decided to begin running, the ANS may make changes before the action even starts. Conversely, the ANS, and its effects on the body, provide input that the voluntary system takes into consideration when making its decisions.
This input presumably reaches the voluntary system as positive and negative value, because that is how the action selection mechanism makes decisions: without positive and negative value, there is no way to interact with the mechanism other than through fixed rules. Other, “involuntary” systems do not operate on “value”, and they interact with the voluntary system in fairly rigid ways.
