Sixteen years ago, Allen Newell argued for a theory of how the individual parts of the mind are combined. He pioneered the SOAR architecture, and is considered one of the fathers of modern cognitive science. Using many of the guidelines that Newell proposed, The Adaptive Control of Thought theory (which has undergone several version changes since the seventies) has become the most sophisticated theory of human cognition. It recently underwent a name change to become The Adaptive Control of Thought — Rational (ACT-R) theory. Developed primarily by John Anderson (in concert with a number of other researchers over the years), ACT was described by Newell as “the first unified theory of cognition.” A recent version of ACT-R was the subject of discussion in “An Integrated Theory of the Mind” (2004), and it is this paper that I will be discussing.
The intent of “An Integrated Theory of the Mind” is to present an integrated model of the mind that can be used for real-life situations and study, as opposed to more isolated studies of language or sensory processing. Primarily, it is a hypothesis about how individual modules are integrated with each other, and how they interact to produce complex behaviour and cognition. The second major aim of the paper is to develop a model with the capability to mine the huge swathes of data emerging from brain imaging studies, especially those that concern themselves with more than one region of the brain. Because of the complexity of higher-level cognition, such analysis would be very difficult without such a model.
In pursuit of these intentions, the authors have developed ACT-R 5.0, a model composed of a number of distinct modules: the perceptual-motor module (decomposed even further into visual and motor modules) for communication with the outside world, the goal/intentional module for keeping track of goals and intentions, and the declarative module for declarative memory. Each of these modules has an associated buffer, which is its only method of communication with the other modules. The perceptual-motor, goal and declarative modules are each connected to a central production system through these buffers, and the production system controls any request or retrieval of information from them. Although there are plans to build in the ability for the modules to communicate with each other directly, for now the only communication in ACT-R is mediated by this production system.
The crux of cognition is the selection and execution of production rules, which are “if…then” rules defining how to operate given the current situation and information at hand. The production system works on information in the buffers, and only this information is what we are aware of at any given moment. We are not aware, for example, of everything we have ever stored in our declarative memory. We are only aware of the chunk that we have currently retrieved and which is located in the declarative buffer.
The first step in developing a model to explain the data from brain imaging studies is to assign locations to each of the modules and buffers involved. Although the location of the intentional module has not yet been identified, the goal buffer is associated with the dorsolateral prefrontal cortex (DLPFC). The temporal lobe and hippocampus are given as the regions possessed by the declarative module, and the retrieval buffer is associated with the ventrolateral prefrontal cortex (VLPFC). The visual module is located in the occipital lobe and elsewhere; the visual buffer in the parietal lobe. Finally, the manual module is located in the motor cortex and the cerebellum. The manual buffer is also located in the motor cortex. The production system is located in the striatum, pallidum and thalamus, all parts of the basal ganglia.
The production system pulls information from the buffers via the striatum, which receives projections from the buffers. The striatum performs a pattern-recognition function, recognizing the contents of the buffers. The pallidum then signals the thalamus as to the best course of action using an inhibitory conflict-resolution mechanism. Projections from the striatum inhibit the pallidum neurons corresponding to the selected action, and because projections from the pallidum to the thalamus are also inhibitory, the thalamus neurons that would perform that chosen action are the only ones that aren’t inhibited by the pallidum. Hence, the chosen production is carried out by the thalamus. In each cycle (estimated to take approximately 50 ms), patterns in the buffers are recognized, a production fires, and the buffers are updated. At any given time or during any given cycle, only one chunk is allowed in each buffer, and only one production can be selected to fire.
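To make the cycle concrete, here is a minimal Python sketch of a recognize-select-fire loop over buffers. It is my own toy illustration, not code from the paper or from the ACT-R software: the buffer names, the two productions and the addition task are hypothetical, each chunk is reduced to a string, and retrieval is treated as instantaneous, which the real architecture does not do.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Each buffer holds at most one chunk; a chunk is simplified to a string here.
Buffers = Dict[str, Optional[str]]

CYCLE_MS = 50  # approximate cycle time quoted in the text


@dataclass
class Production:
    name: str
    condition: Callable[[Buffers], bool]  # the "if" side: a pattern over buffer contents
    action: Callable[[Buffers], None]     # the "then" side: updates the buffers
    utility: float = 0.0


def run_cycle(buffers: Buffers, productions: List[Production]) -> Optional[str]:
    """Run one cycle: match all productions, select one, fire it."""
    matching = [p for p in productions if p.condition(buffers)]
    if not matching:
        return None
    chosen = max(matching, key=lambda p: p.utility)  # only one production fires
    chosen.action(buffers)
    return chosen.name


def run(buffers: Buffers, productions: List[Production],
        max_cycles: int = 10) -> Tuple[List[str], int]:
    """Repeat cycles until nothing matches; return the trace and elapsed ms."""
    trace: List[str] = []
    for _ in range(max_cycles):
        fired = run_cycle(buffers, productions)
        if fired is None:
            break
        trace.append(fired)
    return trace, len(trace) * CYCLE_MS


def demo() -> Tuple[List[str], int, Buffers]:
    """Hypothetical task: retrieve an addition fact, then type the answer."""
    buffers: Buffers = {"goal": "add 3 4", "retrieval": None, "manual": None}
    productions = [
        Production(
            "retrieve-sum",
            lambda b: b["goal"] == "add 3 4" and b["retrieval"] is None,
            lambda b: b.update(retrieval="3+4=7"),
            utility=1.0,
        ),
        Production(
            "type-answer",
            lambda b: b["goal"] == "add 3 4" and b["retrieval"] == "3+4=7",
            lambda b: b.update(manual="press 7", goal="done"),
            utility=1.0,
        ),
    ]
    trace, elapsed = run(buffers, productions)
    return trace, elapsed, buffers
```

Calling demo() fires the two productions in sequence and then stops, because no production matches once the goal buffer has changed. The point of the sketch is only that productions see nothing but buffer contents, and that each cycle commits to a single production.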
The nature of the perceptual-motor module is that it is custom-fitted to the types of testing the model undergoes. In order to account for timings involving visual input and motor output, ACT-R uses a model based on the EPIC system (Meyer & Kieras, 1997, as cited in Anderson et al., 2004). Because the only relevant manual output for testing situations is through use of a keyboard and mouse, the manual system is given a more focused responsibility of controlling only hand movement. Because eye tracking is to be used for comparison, the visual system is composed of both a visual-location module and a visual-object module (corresponding to the dorsal “where” and ventral “what” visual systems, respectively). To reduce complications, the model focuses on choosing what to encode as opposed to how to encode an image; this is the difference between a theory of visual attention and a theory of visual perception. By implementing a theory of visual attention, results from eye-tracking experiments can be compared to locus of attention in the model.
The purpose of the goal/intentional module is to allow us to maintain a consistent train of thought in the absence of external stimuli. The point of the goal buffer is to keep track of our internal state within a given problem. It allows us to respond differently depending on our current goals, and ensures that our behaviour is appropriate to the current situation. People without a functioning goal module (those with prefrontal damage) often have difficulties with context-goal mismatch, and this is why their behaviour is sometimes inappropriate and stimulus-driven.
Whereas the goal module deals with short-term, problem-by-problem activity (“local coherence”), declarative memory deals with the long term (“personal and cultural coherence”). The workings of declarative memory are elaborated in detail in this paper, specifically the mechanics of how chunks (single units of declarative knowledge) are activated within the module. Activation of any given chunk is the sum of base-level activation and associative activation: how useful the chunk was in the past, plus how useful it is within the current context. Base-level activation is dependent on practise (the number of times a chunk has been rehearsed) and delay since the last rehearsal. Associative activation is dependent on the weighting of the elements that are part of the current goal and the strength of association between the elements and the chunk (which in turn is dependent on the number of facts associated with each element). If a chunk’s activation is above a certain threshold, it can be retrieved. There is noise associated with a chunk’s activation, so a probability function is used to determine whether the activation is above the threshold. The time it takes to retrieve a chunk reflects that chunk’s level of activation.
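The activation mechanics described above can be sketched in a few lines of Python. The functional forms follow the ACT-R equations as I understand them (a logarithm of summed, power-law-decayed rehearsals for the base level; association strength that falls with the logarithm of the fan; a logistic retrieval probability; an exponential retrieval latency). The function names and all default parameter values are my own illustrative assumptions, not the fitted values reported in the paper.

```python
import math
from typing import Sequence


def base_level(times_since_use: Sequence[float], decay: float = 0.5) -> float:
    """Base-level activation: ln of the sum of t_j ** -decay over past rehearsals.

    times_since_use holds the delay (in seconds) since each rehearsal, so more
    rehearsals raise the value and longer delays lower it.
    """
    return math.log(sum(t ** -decay for t in times_since_use))


def associative(weights: Sequence[float], fans: Sequence[int],
                max_strength: float = 2.0) -> float:
    """Associative activation: sum of W_j * (S - ln(fan_j)) over goal elements.

    fan_j is the number of facts associated with element j; max_strength (S)
    is an assumed constant.
    """
    return sum(w * (max_strength - math.log(fan))
               for w, fan in zip(weights, fans))


def activation(times_since_use: Sequence[float], weights: Sequence[float],
               fans: Sequence[int]) -> float:
    """Total activation is base-level plus associative activation."""
    return base_level(times_since_use) + associative(weights, fans)


def retrieval_probability(a: float, threshold: float = 0.0,
                          noise_s: float = 0.4) -> float:
    """Probability that noisy activation exceeds the retrieval threshold."""
    return 1.0 / (1.0 + math.exp(-(a - threshold) / noise_s))


def retrieval_time(a: float, latency_factor: float = 1.0) -> float:
    """Retrieval latency shrinks exponentially as activation grows."""
    return latency_factor * math.exp(-a)
```

Under these assumptions, a chunk rehearsed twice is more active than one rehearsed once, a chunk whose goal elements have a high fan receives less associative activation, and a chunk sitting exactly at threshold is retrieved half the time.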
Determining which actions are the most effective in a given situation is the responsibility of procedural memory. Because only one production can fire in each cycle and there are many productions that are applicable at any particular time, the production system has a way to select the production with the highest utility. The utility of a production is given as
$$U_i = P_i G - C_i,$$
where P is the probability of success for production i (determined by a ratio of successes to successes + failures), G is the value of the current goal, and C is the cost to achieve the goal. Cost and probability are continually adjusted with each experience with the production, according to a Bayesian framework. To prevent probability from zooming to 100% any time a first attempt succeeds, P is given an initial value q that converges upon the proper ratio as experience accumulates.
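A short Python sketch may help show how these quantities interact. It assumes the utility takes the form "probability of success times goal value, minus cost", which is consistent with the definitions of P, G and C given here. The prior pseudo-counts are one simple way of keeping a single early success from pushing P to 100%; they are my own stand-in for the initial value described above, and the production names and numbers are invented. Selection here is deterministic, whereas the model adds noise to utilities.

```python
from typing import Dict, Tuple


def success_probability(successes: int, failures: int,
                        prior_successes: float = 1.0,
                        prior_failures: float = 1.0) -> float:
    """Ratio of successes to successes + failures, softened by prior counts.

    With no experience the estimate equals the prior ratio; as experience
    accumulates it converges on the observed ratio. The prior counts are
    illustrative assumptions.
    """
    s = successes + prior_successes
    f = failures + prior_failures
    return s / (s + f)


def utility(p: float, goal_value: float, cost: float) -> float:
    """Assumed utility form: U = P * G - C."""
    return p * goal_value - cost


def select(productions: Dict[str, Tuple[int, int, float]],
           goal_value: float) -> str:
    """Pick the applicable production with the highest utility.

    productions maps a name to (successes, failures, cost).
    """
    def score(name: str) -> float:
        successes, failures, cost = productions[name]
        return utility(success_probability(successes, failures),
                       goal_value, cost)
    return max(productions, key=score)


# Hypothetical example: a reliable but slow rule versus a fast but risky one.
candidates = {
    "careful-method": (18, 2, 4.0),   # usually succeeds, higher cost
    "quick-guess": (5, 15, 1.0),      # rarely succeeds, low cost
}
```

With a goal value of 20, the careful method wins under these made-up numbers; with a goal value of 2, its cost outweighs its reliability and the quick guess is selected instead. This is the trade-off that the utility calculation is meant to capture.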
One part of procedural memory that hasn’t been fully ascertained is the process by which the relevant productions are determined. In some cases the relevant productions are obvious (if it’s a choice between two specific actions that are highly practised and are each represented by a single production), but for other situations there is possible error involved. In an effort to resolve this, the paper describes a mechanism called production compilation, which is a model of how production rules are learned. It essentially compiles two successive productions into a single production with the same outcome. If two productions continually follow one another, they will eventually be collapsed into one production. If the second production is dependent on information provided by the first, the collapsed production is specific to the retrieved information. A collapsed production can be created and rejected several times before it is selected over its parents. Each time it is created, its q value increases until its utility is higher than its parents’ and it is selected for the first time. If the new production is superior, it will be tried more and more often and come to dominate over them.
Most of the equations involved in this model have parameters that must be estimated. Although there are plans to remove the parameters entirely to appease critics, for now a set of standard parameters has been determined that appears to work in most situations. This makes it possible to predict results, as opposed to explaining them after the fact.
Aside from the studies used to verify the various subsections of the model, two studies were conducted to evaluate the model as a whole. The first made use of the Georgia Tech Aegis Simulation Program (GT-ASP), which is a computer game used to train Anti-Air Warfare Coordinators on board US Navy vessels. A program based on ACT-R was designed with a set of production rules to interpret a set of formalized instructions, and from this base, the program was able to convert the instructions into the productions required to perform the task. The goal of the GT-ASP user is to identify “tracks” (aircraft) by clicking on them and using the function keys on a computer keyboard to identify them by speed, elevation, and failing that, a request for the electronic warfare signal of the craft. This was designed to test how the various modules within ACT-R interact in the learning of a complex skill. Both the program and a set of participants were timed in their performance with GT-ASP, and the participants also had their eyes tracked to compare the direction of their gaze to switches of attention in the model.
The speed-up in performance was quite similar for the ACT-R program and the human participants, although the program’s rate of learning appeared to accelerate more than the participants’ did near the end of the testing period. When comparing the eye-tracking results to switches of attention in the model, results were also satisfactory. Aside from the proportion of attention paid to the function keys in the last segment of each trial, the correspondence is very good. The amount of attention paid by participants to the function keys did not start out as high and did not decrease as sharply as predicted by the model. This could have been because of the layout of the function keys. With twelve buttons in a row and nothing to differentiate any of them, locating the keys may have required additional attention (when moving from F1 to F9, for example, which is a significant distance).
The second study was meant to incorporate brain imaging. Learning in this case was of a symbol-manipulation skill, using an artificial algebra task outlined by Qin et al. (2003). The ACT model included an imaginal buffer (found to be located in the parietal lobe) that holds a mental image of the equation as it operates on the string of symbols. Comparison between the model and results from human participants was done by generating a conversion equation that maps the duration and delay of activity in a buffer onto the BOLD function.
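The kind of conversion described here can be sketched as follows. As I understand it, the approach treats a buffer as either engaged or idle over time and convolves that on/off signal with a gamma-shaped hemodynamic response, using a magnitude, a time-scale and an exponent for each region. The Python below is my own numerical approximation of that idea. The function names, the midpoint integration and every parameter value are assumptions for illustration, not the authors' implementation or fitted values.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) of buffer activity, in seconds


def hemodynamic(t: float, exponent: float) -> float:
    """Gamma-shaped response t ** a * exp(-t); zero before onset."""
    return (t ** exponent) * math.exp(-t) if t > 0 else 0.0


def predicted_bold(t: float, active: Sequence[Interval], magnitude: float,
                   scale: float, exponent: float, dt: float = 0.01) -> float:
    """Approximate M * integral of i(x) * B((t - x) / s) dx up to time t.

    i(x) is 1 while the buffer is active and 0 otherwise, so the integral
    only runs over the active intervals (midpoint rule, step about dt).
    """
    total = 0.0
    for start, end in active:
        upper = min(end, t)
        if upper <= start:
            continue  # this interval has not begun by time t
        steps = max(1, round((upper - start) / dt))
        width = (upper - start) / steps
        for k in range(steps):
            x = start + (k + 0.5) * width
            total += hemodynamic((t - x) / scale, exponent) * width
    return magnitude * total


def peak_time(active: Sequence[Interval], magnitude: float, scale: float,
              exponent: float, horizon: float = 30.0,
              step: float = 0.1) -> float:
    """Time (on a coarse grid) at which the predicted response is largest."""
    grid = [k * step for k in range(int(horizon / step) + 1)]
    return max(grid, key=lambda t: predicted_bold(t, active, magnitude,
                                                  scale, exponent))
```

In this sketch, delaying the buffer's activity shifts the peak of the predicted response later, and lengthening the activity raises it. Those are the two properties (latency and duration) that the comparison with the imaging data relies on, while the three free parameters only shape and scale the curve.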
Participants were timed in their performance over five days of practice, and were imaged on the first and last days. The ACT predictions fit extremely well with the timing data, and results from the imaging experiment suggested that the model was (at the very least) on the right track. The three regions of interest were in the motor area, the posterior parietal lobe, and the prefrontal lobe on the left side. It was found that the effect of an increase in the complexity of an equation was to delay the BOLD function in the motor region, and the effect of practise was to move it forward; this was predicted by the ACT model. The model correctly predicted that the BOLD function in the parietal region would be little affected by practise but significantly affected by complexity, as the imaginal buffer must encode the equation in every condition, and the amount to be encoded changes only with complexity. The magnitude of the BOLD function in the prefrontal region decreased with practice, also as predicted.
The paper bemoans the evolution of modern psychology towards greater specialization, and that specialized psychologists don’t make the effort to understand the processes going on outside their tiny sphere of interest. The insinuation is that psychologists are simply ignoring each other’s work. While that argument may be valid for some (and only some — any adequate education involves at least a rudimentary explanation of every part of the brain), I find the comment rather strange, considering some of the constraints the authors have put on their own model. At the beginning of the paper, they write:
“Although there are good reasons for at least some of the proposals for specialized cognitive modules, there is something unsatisfactory about the result--an image of the mind as a disconnected set of mental specialties” (p. 1036).
Irony of ironies, what they propose is exactly what they decry: a model of the brain as a set of self-contained modules, disconnected from each other save for a single tie to a central production system.
My initial reaction towards the study was quite positive; most concepts seemed to follow logically from those before them, and comparisons with human behaviour were very close. When the paper began to delve deeper into the realm of brain imaging, however, I became quite sceptical.
The unfortunate thing about this subject is that complex cognition — even very basic cognition, let alone the integration of five different modules — is a very tricky subject when it comes to fMRI. Because so many things could be occurring at any given moment, it is very difficult to pinpoint the cause of this or that activation increase within parts of the brain. FMRI is still very young; some researchers still question its validity, and many more misinterpret its data. (Marco Iacoboni’s (2006) study of people’s responses to Superbowl ads, for example.) FMRI can be a useful tool to analyse the brain’s responses at a very basic level, but I would shy away from any study claiming to pinpoint, for example, the central executive by putting someone in a tube and telling him or her to plan a route to the grocery store. I suppose that is one of the issues that this paper hopes to remedy; to develop a model that is capable of taking every aspect of cognition into account, so as to better analyze the results of complex brain imaging studies. They never explain how they plan to do this, though, only that they plan to use it as such.
My scepticism may also derive from the disproportionate attention and hype given to fMRI studies. Say “fMRI proved this and that” to the average person, and they may not think as critically about it as they would if you said “I compared response times.” I can fall into the trap as well; I feel more comfortable with a process when I know where it occurs, and when I talk about a certain aspect of cognition I find myself unconsciously pointing to the area where it’s thought to take place. It adds an extra level of tangibility that makes it appear more valid than if it were just floating diffusely through the brain somewhere. Tell me that memory is made up of a network of nodes and I’ll accept it, but tell me it takes place in the ventromedial temporal lobe and I’ll have a picture in my head to attach to that concept. A recent (but as yet unpublished) study by Deena Skolnick of Yale University has documented this effect:
“[Skolnick] asked her subjects to judge different explanations of a psychological phenomenon. Some of these explanations were crafted to be awful. And people were good at noticing that they were awful--unless Skolnick inserted a few sentences of neuroscience. These were entirely irrelevant, basically stating that the phenomenon occurred in a certain part of the brain. But they did the trick: For both the novices and the experts (cognitive neuroscientists in the Yale psychology department), the presence of a bit of apparently hard science turned bad explanations into satisfactory ones” (Bloom, 2006, p. 1).
One of my biggest difficulties with this paper is that some of the methods the authors use to verify their results are still very approximate. For example, they have derived an equation to fit buffer activation to an fMRI blood oxygen-level dependent (BOLD) response, which requires the estimation of three parameters that each vary according to brain region. Although the buffer activation is described only by latency and duration of activation (both measures of time), this equation converts it to brain activity, described by latency, duration, and magnitude of activation. I find it dubious that each set of parameters has to be specifically tuned for each brain region. Was it not the intent of this paper to develop a set of standard parameters with which to predict data rather than postdict them? It is mentioned later on that magnitude of buffer activation should be included in order to account for activity in the basal ganglia, and I agree; not only would this bring the basal ganglia into the model, but it may also eliminate the need to invent new parameters for every new brain region under study.
An additional variable to take into account would have to be the vascular distribution of each brain region, and perhaps this is the reason why Anderson wishes to use varying parameters for each area. Harrison et al. (2002) determined a direct relationship between the capillary density and metabolic demand of brain areas, meaning that more active areas would be easier to detect with fMRI. Additionally, areas closer to the base of the frontal lobe (such as the orbitofrontal cortex) are more difficult to register because of their proximity to the nasal ducts. Areas where body tissue meets air cause gradients in the magnetic field that can result in distortion and interference of the BOLD signal (Li et al., 1995). These variations explain the need to add varying parameters to the model, but these parameters should be built into the conversion equation as being specific to the BOLD response, and completely unrelated to the activity of the buffers.
In future versions of ACT, I would like to see a theory of emotion included. Belavkin (2001) has found similarities in the properties of the ACT-R architecture to the activation theory of emotion. Marinier (2004) has discussed the possibility of incorporating a theory of emotion into integrated models like SOAR and ACT: chunks of information in declarative memory could be associated with certain emotions, resulting in phenomena like mood-congruent retrieval. Being in a certain mood would cause an increase in activation of chunks associated with that mood, allowing those chunks to be more easily retrieved.
Despite some of my difficulties with this model, it is the best and most comprehensive model of human cognition. It describes quite clearly how the modules in the brain are integrated, and although it has limits on the connective possibilities of the modules, there is enough flexibility in the model for that to be altered in future revisions. Even now, ACT-R has shown itself to be capable of quite impressive predictions in many distinct situations, such as symbol manipulation. Although I remain unconvinced of its predictions of the BOLD response, with a few tweaks of the conversion algorithm it could become quite powerful in that regard. This could open up many possibilities in the field of fMRI research, allowing cognitive neuroscientists to better explain the cause of activation in specific regions. With further development, it could live up to Newell’s dream of a unified theory of cognition.
References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4), 1036–1060.
Belavkin, R. V. (2001). The role of emotion in problem solving. Proceedings of the AISB’01 Symposium on Emotion, Cognition and Affective Computing, 49–57.
Bloom, P. (2006). Seduced by the Flickering Lights of the Brain. Seed Magazine, 5. Retrieved July 11, 2006, from http://www.seedmagazine.com/news/2006/06/seduced_by_the_flickering_ligh.php
Harrison, R. V., Harel, N., Panesar, J., & Mount, R. J. (2002). Blood capillary distribution correlates with hemodynamic-based functional imaging in cerebral cortex. Cerebral Cortex, 12(3), 225–233.
Iacoboni, M. (2006, March 13). Who really won the super bowl? Edge, 177, Article 4. Retrieved July 11, 2006, from http://www.edge.org/3rd_culture/iacoboni06/iacoboni06_index.html
Li, S., Williams, G. D., Frisk, T. A., Arnold, B. W., & Smith, M. B. (1995). A computer simulation of the static magnetic field distribution in the human head. Magnetic Resonance in Medicine, 34, 268–275.
Marinier, B. (2004). Towards a Unified Theory of Cognition and Emotion. Unpublished dissertation, University of Michigan.