Chapter Three:
Perception, Imagery and the Sensorimotor Loop

Perception is basically a controlled hallucination process.

- Ramesh Jain

 

Let us begin with the motor control centers of the central nervous system, and make some arguments, in the spirit of Kawato and Ito, that certain circuits in those centers act as emulators of musculoskeletal dynamics. I will then present some arguments to the effect that such emulators drive motor imagery. <9> Then, I will make the case that all imagery is best understood as emulational in nature. Finally, using some arguments compatible with and supported by considerations put forward by Llinas, I will argue that perception itself is best viewed as a sort of imagery, and hence as dependent, ultimately, on emulation.

3.1 Motor Control

This section will focus on some problems associated with motor control of the limbs. The problem, hinted at in the previous chapter, is that a broad class of motor functions requires feedback faster than the periphery can supply it, and this feedback delay can cause oscillations or instabilities. The movements in question are fast voluntary movements, as opposed to slow voluntary movements, or cyclical movements like walking or running, which might be largely controlled by central pattern generators in the spinal cord. One way that the nervous system can solve this problem is to use emulators of the appropriate system to provide much faster feedback, through processing of efferent copy signals, to the control centers.

3.1.1 Fast, Accurate Motor Control: The Problem

The brain faces several obstacles when trying to execute fast, accurate voluntary movements, many of them similar to the feedback-delay problems faced by the robot arm operator in the previous chapter. When the brain issues a motor command, it takes some time for the signal to traverse the spinal cord. In the most favorable case, there will be one synaptic relay where the efferent axon from primary motor cortex contacts a motor neuron, and then more delay as the motor neuron axon carries the signal to the muscle fiber. Then there is yet another synaptic transmission at the neuromuscular junction, and finally the muscle responds. For the return trip, the muscle stretch receptors and the Golgi tendon organs must relay proprioceptive information <10> from the limb back to the spinal cord, where it continues, again limited by axonal conduction velocities.

The exact temporal length of this feedback loop is not known, but there is good evidence to the effect that 500ms is an approximate lower bound for any proprioceptive information to be effectively available for feedback control. <11> That is, movements taking less than 500ms are executed (at least as far as proprioceptive feedback goes) open loop. Visual feedback is available much more quickly, but it is still subject to significant delays. Furthermore, there is good evidence that trajectory corrections are made in the absence of visual feedback, at latencies far below the proprioceptive feedback loop time (see below).



Figure 3.1: Pseudo-closed-loop control.

One way that greater accuracy could be attained in the execution of open loop control is to try to make the task pseudo-closed-loop by using an emulator to process efferent copy information, and using the emulator's output to make adjustments to the motor command. Figure 3.1 shows a normal closed loop feedback control schematic, where output from the plant is used by the controller (C) to adjust control signals. The lower schematic shows a pseudo-closed loop architecture, where efferent copy information is processed by an emulator, and its output is used by the controller to adjust the control signal. The necessity of such emulators of musculoskeletal dynamics for this and related purposes is recognized by a number of researchers in motor control. What is less agreed upon is the exact location and use of these emulators, though circuits involving the cerebellum and various brain stem and midbrain nuclei such as the red nucleus, pontine nuclei and the reticular nuclei seem to be the odds-on favorites. <12> The remainder of this section will outline the proposal made by Kawato for such an emulator. I choose this example because it is fairly well articulated, and it shows how such an emulator can be of use not only for the control of motor tasks, but also for motor learning (specifically, the acquisition of a good inverse model).
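The contrast between the two schematics can be made concrete with a toy simulation. The sketch below is only illustrative and is not drawn from Kawato's model: the plant is a hypothetical one-dimensional integrator, the controller is a simple proportional controller, the emulator is assumed to be a perfect copy of the plant, and the names simulate, gain, and delay_steps are invented for the example.

```python
def simulate(feedback, steps=300, dt=0.01, gain=10.0, delay_steps=10, target=1.0):
    """Drive a toy integrator plant (position changes by dt * command) to `target`.

    feedback="delayed":  the controller sees the plant's position as it was
                         several time steps ago (long-loop feedback).
    feedback="emulator": the controller sees the prediction of an emulator that
                         is driven by a copy of each command (pseudo-closed-loop).
    All parameter values are assumptions chosen only for illustration.
    Returns the list of plant positions over time.
    """
    position = 0.0                      # actual plant state
    predicted = 0.0                     # emulator's running prediction
    in_transit = [0.0] * delay_steps    # sensory signals still on their way back
    trajectory = []
    for _ in range(steps):
        seen = in_transit[0] if feedback == "delayed" else predicted
        command = gain * (target - seen)     # proportional controller
        position += dt * command             # plant responds to the command
        predicted += dt * command            # emulator processes the efferent copy
        in_transit.append(position)          # new reading starts its journey back
        in_transit.pop(0)                    # oldest reading has now arrived
        trajectory.append(position)
    return trajectory


delayed = simulate("delayed")
emulated = simulate("emulator")
```

With these assumed values the delayed loop overshoots the target before settling, while the emulator-driven loop approaches it without overshoot. The qualitative difference, not the particular numbers, is the point.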

3.1.2 Fast, Accurate Motor Control: A Solution

As Kawato et al. write:

The spinocerebellum (vermis and intermediate part of the hemisphere) - magnocellular part of the red nucleus system receives information about the results of the movement ... an afferent input from the proprioceptors, as well as an efference copy of the motor command. [That is, this system sees a copy of the motor command, as well as the proprioceptive information it leads to. Association of the first with the second amounts to learning the forward model. -RG] Within this spinocerebellum-magnocellular red nucleus system, an internal neural model of the musculoskeletal system is acquired. Once the internal model is formed by motor learning, it can provide an approximated prediction of the actual movement... A predicted possible movement error ... is transmitted to the motor cortex and to the muscles via the rubrospinal tract. Since the loop time of the cerebro-cerebellar communication loop is 10 - 20ms (Eccles 1979) and is shorter than that of the supraspinal loop, the performance of the feedforward control with the internal model and the internal feedback loop is better than that of long-loop sensory feedback.
(Kawato, Furukawa and Suzuki 1987)


Figure 3.2: Schematic of musculoskeletal emulator. Adapted from Kawato et al. (1987).

Kawato proposes that a specific circuit (see Figure 3.2) performs the function of emulating the musculoskeletal dynamics. According to this model, efferent copies from the cortico-spinal tract are sent to the red nucleus, where they enter a circuit that traverses the inferior olivary nucleus, then to the contralateral dentate and cerebellar cortex via climbing fibers, and returns to the red nucleus via the dentate. This loop embodies the emulation of musculoskeletal dynamics, and provides a predicted position on the basis of the efferent copy. There are two possible ways that this information can lead to an update of the motor signal. First, the desired movement (communicated from the association cortex) can be compared in the red nucleus with the predicted movement, and a correction signal sent via the rubrospinal tract. Second, the predicted movement can be communicated to the motor cortex via the ventral lateral thalamus, where a correction can be made.

Not only does this allow for better feedforward control, but it also aids the learning of motor skills in several ways. First, the emulator circuit provides very fast feedback concerning the effect of a given motor command, and it is plausible to suppose that, at least in some cases, synaptic plasticity is easier to regulate when feedback is immediate. Second, learning of the inverse model can take place in the absence of actual movement. An action can be mentally rehearsed over and over to provide the controller with feedback about the results of its actions. <13> Notice that these two benefits, greater control and enhanced learning, which are direct results of the use of emulators, have long been attributed to the cerebellum and associated motor nuclei. Neurophysiology texts and articles almost always gloss the function of these structures as 'playing a role in the control of fine movements and motor learning,' but usually do so without any theoretical motivation. Emulation gives us exactly this motivation.

At this point it would be nice to see whether there are any data to support the claim that Kawato's (or some similar) control structure is in fact used. In a series of rather clever experiments, <14> subjects were asked to make fast arm movements from a start location to a goal location. Subjects were allowed time to see the goal, and were not under pressure to initiate the movement quickly (they had time to think about the motion before execution), but were asked to make the movement quickly and smoothly once initiated, and to try to stop the hand as close to the goal as possible. Trials were conducted both with and without visual feedback.

Position, velocity and acceleration analysis showed that the first 70ms of such movements have a great degree of variance, i.e. there is little inter-trial correlation in the distance traveled up to 70ms. However, after 70ms, but before proprioceptive information is available, and in the absence of visual feedback, corrective adjustments are made such that the total distance travelled in the acceleration phase is closely correlated across trials. (It was shown that visual feedback made a significant contribution only in the deceleration phase of the movement.) These initial corrective adjustments during the acceleration phase of the motion, because they were executed in the absence of visual feedback and beneath the long-loop feedback threshold, had to have been made on the basis of a short-loop internal feedback signal, which requires an emulator. Notice, finally, that should anything happen to the emulator/short-loop feedback system, only long-loop feedback would be available for corrective movements, and we could predict that this would result in oscillations or instabilities, especially towards the termination of a movement. As Kawato notes, this in fact is exactly what occurs as a result of cerebellar dysfunction, and is known in the literature as tremor (or specifically, intention tremor).

As a final note, Ghez and Vicario (1978) and Ghez (1990) have reported neurons in the magnocellular red nucleus of the cat (one of the nuclei implicated by Kawato in the emulator loop) that modulate their activity as a function of the time derivative of force exerted on the limb, but do so before the limb is acted on. That is, the activity of these neurons predicts limb jerk. Ghez and Vicario assume that these neurons are motor neurons innervating the muscle of the limb in question, which is one plausible interpretation. However, another possibility is that these neurons are in fact predicting future limb jerk on the basis of efferent copy signals. <15> Supporting the second interpretation is the fact that the same authors found that these cells also responded to passive limb movement (recall that an emulator needs not only an efference copy, but information concerning the current dynamic state of the target system). Regardless of which interpretation is correct in this particular case, the ambiguity points out the importance of theory (or lack thereof) in interpreting experimental results.



3.2 Motor Imagery

If the previous section is right, then the brain has circuits that take as input information regarding the current state of the body, as well as a copy of an efferent command signal, and compute as output the state or configuration of the body that will result if those motor commands are successfully executed. In the simplest case this end-state specification will be given in proprioceptive terms. The prediction generated by the musculoskeletal emulator is a prediction of the proprioceptive (as opposed to visual, etc.) state of the limb. This raises an interesting possibility, which is that this very same mechanism can support motor or kinesthetic imagery. The only modifications necessary are first, that the motor command itself be inhibited from acting on the musculature; second, that the 'current state' specification of the system be provided via some sort of recurrent pathway; and finally, that the outputs of this emulator be made available to those centers that support normal proprioception.

This section will argue that there is evidence that motor imagery is supported by structures similar to those that support motor function. <16> The next section will build on this idea, and make the case that all imagery can be similarly explained as a sort of simulated perception. The final section of this chapter will show that if imagery really is simulated perception, then the distinction between perception and imagination begins to blur in some interesting ways. But first to motor imagery.

One might think of motor imagery as primarily a sensory phenomenon, since its result is imagined sensations. However, if it depends on the processing of motor commands that are inhibited from acting on the musculature, then one would expect motor areas to be active during motor imagery. This is exactly the case. A host of studies <17> have confirmed the following pattern: During many sorts of overt motor activity (mostly sorts involving some degree of attention), not only does primary motor cortex show increased metabolic activity, but so do the supplementary motor (SMA) and premotor areas. However, during mental simulation of the same movements, primary motor cortex shows no significant increase in activity, but SMA and premotor areas do show increased activity. This would suggest that the efferent copy used to compute the predicted movement originates in SMA or premotor cortex, <18> and that during imagery, the normal efferent pathway is inhibited at or before primary motor cortex.

Another line of evidence also suggests that motor behavior and imagery have a similar origin. It has been shown <19> that vegetative functions (such as respiration and heart rate) are affected, at least in part, by central motor commands, as opposed to being driven by, e.g., venous blood CO2 concentration. For example, heart rate and respiration increase very quickly (within a few seconds of the onset of strenuous activity), substantially before such activity could result in increased CO2 concentration. This suggests that such vegetative effects are set in motion by central motor commands, or copies thereof. It has also been shown (Decety et al. 1991, Wang and Morgan 1992) that imagined strenuous activity likewise increases heart rate and respiration, and that this increase is proportional to the intensity of the activity being imagined. Again, the suggestion is that overt motor activity and motor imagery are initiated by the same mechanism, but that in imagery overt activity is suppressed, and that this mechanism plays a role in modulating certain vegetative functions.

Additional evidence comes from the sports physiology literature, which has long confirmed <20> that imagined rehearsal of some physical activity facilitates its subsequent performance. On the assumption that normal motor performance improves with practice because feedback can be used to adjust the effector sequence, and on the further assumption that motor imagination is just like motor performance except that the emulator provides the feedback instead of the proprioceptors themselves, the benefits of imagined practice are unmysterious. To the degree that the emulator faithfully reproduces the feedback that the real system would produce, exactly the same benefits accrue.

What I have not provided is any detailed neuroanatomical or neurophysiological model of the mechanics of motor imagery. I have made some suggestions which support the contention that those mechanisms are driven by the same areas that drive motor activity, and that the mock sensations produced are the outputs of an emulator, perhaps the same emulator implicated in motor control in the previous section. <21> But the bottom line is that the arguments made here are inconclusive, and I will not pretend otherwise. However, I do think that the considerations are plausible, suggestive and interesting. Next, I plan to generalize the story of this section to cover all imagery, especially visual imagery.

3.3 Visual Imagery

The previous section focused on motor imagery, the imagination of the exertion of effort and the resultant proprioceptive sensations such effort would normally produce. But motor commands have effects not only on future proprioception, but on future visual perception as well. When I walk ahead the visual scene changes in interesting and repeatable ways. When I examine a coffee mug by turning it around in my hands, the projection of the mug onto my retina changes, again in at least partially predictable ways. In fact upon reflection there seem to be very few visual scenes that are not changing continually as a function (at least in part) of centrally generated motor commands, whether to the legs, the hands, the neck, or the muscles of the eyes. This suggests that it might be possible to build a visuomotor emulator, one which has as input a current retinotopic projection (and maybe some past retinal states as well) together with a current motor command, and predicts what the next visual scene (or retinal projection) will be.

I will begin this section by describing a fascinating connectionist simulation by Mel (1986), a simulation of a robot that learns to produce visual imagery, including rotation, zoom and pan, by constructing an emulator of its motor-visual loop. Before delving into the details of the model, let's look at why such a mapping really is an emulator. An emulator, recall, is a device that mimics the behavior of some target system, and there can be considerable latitude in what counts as a target. Target systems were defined in the previous chapter as the 'other' side of the control loop. In the case of visual perception, this includes everything between the motor effectors (innervating, e.g., the legs, neck, and eyes) and the visual input (on the retina, or area V1: the exact location is not important for the present conceptual point). Thus the target system in this case includes a great deal of the visual world, especially the statistical regularities and other invariances it displays. The important point is that there is at least a partial dependence of the next visual input on the current input and the current motor commands. For example, if I am foveating on a colored square, and I walk forward, the projection of the square on my retina will increase in size, ceteris paribus, and this dependence, and many others, can be learned and exploited.

Figure 3.3: Left and right eye views of a tetrahedron. Adapted from Mel (1986).

Mel's model does exactly this. The model (which is virtual) has two retinae and several motor effectors, including move forward, move back, move left, and move right. (I will assume here for simplicity that 'move left' and 'move right' refer to circular motion, such as moving around an object to get different views of it. My exposition of Mel's model will make some other simplifications as well; for greater detail, see Mel (1986).) Each of the two retinae is a 2-D array of processors, and each receives a projection of some 3-D wire-frame 'object,' such as a cube or tetrahedron (these left- and right-eye views are generated by a graphics package: see Figure 3.3). The model, a sort of virtual robot, can also 'issue' <22> any of a small number of motor commands, and the retinal projections are recalculated at each time step on that basis.


Figure 3.4: 'Contextron' processing unit. Adapted from Mel (1986).

The processing element, which is driven by external input in the normal 'visual' case, receives connections from neighboring processors (f1 - f5). These inputs are gated by 'motion context' inputs (m1 - m3), which gate on those inputs that act as good predictors during the sort of motion (backward, forward, etc.) that they represent.

During the learning phase, the robot simply moves itself around virtual 3-D objects, and learns to predict future 'retinal states' on the basis of the current retinal state and the motor command. The weights connecting the retinal cells to each other and to the motion context cells learn the forward model of the visual-motor transformations of 3-D objects. Each retinal unit projects to all neighboring units within a certain area, and in addition each retino-retinal connection is matched with 'motion context' connections which fire when and only when that motion is being executed (see Figure 3.4). These motion context connections have the effect (after training) of gating the appropriate intra-retinal projections that will act as good predictors in that motion context. For example, during motion directly toward an object which projects onto the retina, the resultant increase in projection size is manifested as radial motion away from the center of the retina (see Figure 3.5). Thus in the motion context 'move forward,' excitatory connections to units that lie in a straight line away from the retinal center will be gated on, while others will be gated off. This is exactly analogous to the bioreactor emulator (Figure 2.8). The emulator is trained by learning a mapping from current state and control signal to next state.

Figure 3.5: Effects of movement on retinal projection. Adapted from Mel (1986).
During 'forward motion' (A), the retinal projection of an object changes, typically as radial motion away from the center of the retina, provided the center of the retina is aligned with the direction of motion (B). Thus during learning, the connection from unit i to unit i+1 will be gated on during that motion context, as it predicts the future retinal state in that context (C).

The 'envisionment' phase of the model is pretty much the same thing (see Figure 3.6), only without the continuous external stimulation:

The second phase of [the model's] operation is the phase of internal simulation or envisionment, and runs concurrently with phase 1 -- but only becomes accurate after sufficient phase 1 learning. In phase 2, let us assume the array of [processors] is excited into some initial state of activation by the retina, when confronted by a novel 3-D object viewed from an arbitrary perspective. This internal visual state ... may be thought of as a "mental image" ... By issuing a motion-context ... and temporarily inhibiting the retinal pathway, [the model] can transform (e.g. rotate, zoom, or pan) this mental image through time in an approximation to the internal state sequence that would be driven by the retina, were [it] actually moving through its environment and "seeing" the changes. (Mel, 1986)
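The two phases can be sketched in miniature. What follows is a drastically simplified stand-in for Mel's model, not a reconstruction of it: the retina is assumed to be a circular one-dimensional array of binary units, there are only two hypothetical motion contexts (pan_left and pan_right), and learning is reduced to counting which connections have always been good predictors in a given context. All names and parameter values are invented for the illustration.

```python
import random

N = 12                                        # units on a circular 1-D "retina"
MOTIONS = {"pan_left": -1, "pan_right": +1}   # image shift caused by each command


def world_step(image, motion):
    """The real visual-motor loop: a motor command shifts the retinal image."""
    shift = MOTIONS[motion]
    return [image[(i - shift) % N] for i in range(N)]


def learn_gates(trials=400, seed=0):
    """Phase 1: watch random images move, then gate on each connection j -> i
    that has always predicted activity in unit i within a motion context."""
    rng = random.Random(seed)
    seen = {m: [0] * N for m in MOTIONS}                       # times unit j was active
    hits = {m: [[0] * N for _ in range(N)] for m in MOTIONS}   # ...and unit i fired next
    for _ in range(trials):
        image = [1 if rng.random() < 0.3 else 0 for _ in range(N)]
        motion = rng.choice(sorted(MOTIONS))
        nxt = world_step(image, motion)
        for j in range(N):
            if image[j]:
                seen[motion][j] += 1
                for i in range(N):
                    hits[motion][i][j] += nxt[i]
    return {m: [[seen[m][j] > 0 and hits[m][i][j] == seen[m][j]
                 for j in range(N)] for i in range(N)] for m in MOTIONS}


def envision(image, motion, gates, steps=1):
    """Phase 2: with retinal input inhibited, the current image is transformed
    by the gated connections alone."""
    for _ in range(steps):
        image = [int(any(gates[motion][i][j] and image[j] for j in range(N)))
                 for i in range(N)]
    return image
```

After learning, calling envision on an initial image with a motion context steps the 'mental image' through the same sequence that world_step would have produced, which is the sense in which the learned mapping emulates the motor-visual loop.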

This model has some attractive features. First, as Mel notes, it does not try to compute an explicit 3-D representation of the visual object (a typical goal of traditional computer vision). Rather, the three-dimensionality of the object is implicitly represented by the way its image transforms under rotation, etc. This is suggestive because it points to a way in which 3-D information can be represented on a 2-D sheet of processors, something which natural vision systems seem to be able to do. We can speculate that once trained, the model, even during 'normal' visual perception, sees the 2-D projections as 3-D <23> because of their implicit, inherent 'move-around-ability,' something that is not there before training simply because the model has not learned their 3-D characteristics. The converse of this is that had the model not been able to build the forward model of 3-D visual-motor transformations, it would not have 'normal' 3-D perception.

Figure 3.6: Real (top row) and imagined (bottom row) rotation. Adapted from Mel (1986).

As Mel points out, this is reminiscent of the Held and Hein (1963) result concerning the importance of motor-visual feedback in the development of normal vision. In this experiment, two kittens were raised in identical <24> sensory environments. This was done by putting the kittens in an apparatus which maintained their location and orientation in counterpart points of a radially symmetric visual environment (see Figure 3.7). The only difference was that one of the kittens moved itself around the environment, while the other was passively moved around visually identical scenes. The result was that, even though both kittens received very similar visual information during development, only the one that moved itself around developed normal 3-D vision. This suggests that normal perception, in addition to imagery, might be dependent on emulation (this point will be explored in the next section).


Figure 3.7: Sensorimotor feedback apparatus. Adapted from Held and Hein (1963).

Furthermore, it turns out that the 'visual imagery' of the model respects the isochrony principle, which holds that imaged events and their overt counterparts exhibit similar temporal profiles. For example, it will take the model the same time to 'mentally rotate' an object 30 degrees as it takes to move 30 degrees around that object. This is unsurprising since the model's imagery capacity was learned from its overt counterpart. That human imagery respects isochrony suggests that the mechanisms of its generation are emulators of overt counterparts as well. For example, Decety and Michel (1989) found that subjects took the same amount of time to write a text fragment and to imagine writing the same fragment. Furthermore, subjects took the same time to write fragments with their left (non-dominant) hands as when imagining writing with their left hands.

Even more suggestive is a recent result in Farah et al. (1992). The authors tested a patient on the following task before and after unilateral occipital lobectomy. She was asked to imagine familiar objects moving toward her slowly, and to determine the approximate distance at which those objects began to extend beyond the periphery of her mental 'visual field'. The finding was that the distance doubled after the lobectomy, suggesting a concomitant reduction in the width of the field of the mind's eye (as might be expected, this change was noticed only in the horizontal dimension). Upon reflection, this experiment provides startling evidence for the current hypothesis that imagery is perception emulation. The result suggests that visual imagery depends on exactly the same machinery that supports normal vision -- all the way back at the occipital lobe! If imagery were just somehow computed on the basis of memories, or via some translation of information in a propositional format (e.g. Pylyshyn (1984)), then it is hard to see why removal of the occipital lobe should have any effect whatsoever.

3.4 Perception and Closed-Loop Imagery

The most common view of perception seems to be that it is almost exclusively a bottom-up, data-driven process. These processes, on this view, are the sole providers of perceptual information to central mechanisms, meaning that top-down influences are minimal. This seems to contrast with the case of imagery, because in such cases there is some sort of information available, sensations or qualia of a sort, but there is no bottom information to work its way up. Imagery thus seems to be a case of top-down, or maybe sideways-in, perceptual processing. It may be illuminating to recall again the bioreactor emulator in section 2.4 (Figure 2.9). There it was explained that the emulator has two possible sources for its sensor information. The first is the sensors themselves, and the second is recurrent connections from its own outputs. We might think of the first case as a sort of pure perception, and the second as a sort of pure imagery. Though these are the two extreme possibilities, there is a continuum of cases here -- indeed a continuum along several dimensions.

First, one might not wish to ignore either the feedback connections or the sensor input, but to combine or average them in some way. Sensors can be very noisy, and the perceptual environment can make a lot of false promises as well. Were we slaves to all the deliverances of our senses, our cognitive lives would be pretty difficult, I think. An emulator could provide exactly the sort of statistical 'soft-focus' we would want to help us clean up the noise, and to do so intelligently on the basis of significant environmental regularities. On the other hand, sometimes the anomaly is exactly what we want to see. I don't want to speculate on how these mixtures might be achieved. If that is in fact what happens, then I take it to be an empirical issue how they interact.

Another dimension of variation is one of attention. In the bioreactor emulator case, the number of feedback connections is the same as the number of sensor inputs. But what if the possible space of sensor information were greater than could be handled at any given time through direct input? For example, what if it were possible to get a real external reading on only one of the three state variables at a given time (perhaps analogous to our limitation of foveating on only a small area at a given time)? In such a case one might feed the other inputs via recurrent connections, leaving only the one to run off of external sensors. It might even be possible for the emulator to determine which of its 'modalities' it has the least confidence in at a given time, <25> or which it thinks is most critical to maintaining an accurate model, and focus its attention on that input by gating it to the external sensors and the others to recurrent connections. Still, the entire sensory state of the emulator, its current inputs, would be a function of both the external sensors and feedback connections. <26>
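A toy version of this gating scheme may help fix ideas. The sketch below assumes a hypothetical three-variable target system, an emulator whose forward model has the right structure but a wrong constant term, and a crude stand-in for 'confidence': the input that has gone longest without an external reading is the one gated to the sensors. None of these choices comes from the bioreactor example itself.

```python
def world_step(state):
    """Hypothetical target system with three weakly coupled state variables."""
    a, b, c = state
    return (0.8 * a + 0.1 * b, 0.8 * b + 0.1 * c, 0.8 * c + 0.1 * a + 0.05)


def emulator_step(state):
    """Imperfect forward model: same structure, slightly wrong constant term."""
    a, b, c = state
    return (0.8 * a + 0.1 * b, 0.8 * b + 0.1 * c, 0.8 * c + 0.1 * a + 0.03)


def run(steps, attend):
    """Return, for each time step, the largest error in the emulator's inputs.

    attend=False: every input comes from the emulator's own recurrent output.
    attend=True:  one input per step is gated to the external sensor instead.
    """
    world = (1.0, 0.0, 0.5)
    belief = (0.0, 0.0, 0.0)          # emulator starts out ignorant
    staleness = [0, 0, 0]             # steps since each variable was last sensed
    errors = []
    for _ in range(steps):
        world = world_step(world)
        inputs = list(emulator_step(belief))      # recurrent connections
        if attend:
            k = staleness.index(max(staleness))   # least-trusted input
            inputs[k] = world[k]                  # gate it to the external sensor
            staleness = [s + 1 for s in staleness]
            staleness[k] = 0
        belief = tuple(inputs)
        errors.append(max(abs(w - x) for w, x in zip(world, belief)))
    return errors


attended = run(100, attend=True)
unattended = run(100, attend=False)
```

Even though only one variable is sensed at a time, the attended run keeps the whole input state much closer to the target system than the purely recurrent run does, because each external reading also corrects what the emulator predicts for the other variables.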

This suggests the following picture of perception. A subject's 'perceptual world' has the capacity to operate completely closed-loop, to ignore all external input and simply run off its own outputs under the assumption that its predictions are accurate (as with imagination or dreaming). But, to varying degrees and perhaps by various means, the system can constrain the trajectory of the 'perceptual world emulator' with sensory inputs, just as one might occasionally reach out to support a child who is learning to walk, or reset a watch that has fallen out of synch. The idea is captured beautifully in the quote at the head of this chapter by Jain. On this view, perception is indeed a controlled hallucination process, the controls being provided, to a greater or lesser degree, by the senses.
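One deliberately crude way to picture this continuum is with a single mixing weight, which is an assumption of the sketch and not a proposal about how the nervous system does it. In the toy below, a hypothetical world drifts steadily, the emulator's forward model slightly underestimates the drift, and the senses are noisy. A sensor_weight of 0 gives a perceptual world running entirely off its own outputs, 1 gives raw sensation, and intermediate values give a hallucination held in check by the senses.

```python
import random


def mean_error(sensor_weight, steps=500, seed=1):
    """Average distance between the toy world and the system's estimate of it.

    sensor_weight = 0.0: closed loop, the estimate runs off its own predictions.
    sensor_weight = 1.0: the estimate is just the latest noisy sensor reading.
    All numerical values are illustrative assumptions.
    """
    rng = random.Random(seed)
    world, estimate, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        world += 0.010                          # true drift of the world
        prediction = estimate + 0.008           # emulator's slightly wrong model
        sensed = world + rng.gauss(0.0, 0.5)    # noisy external reading
        estimate = (1 - sensor_weight) * prediction + sensor_weight * sensed
        total += abs(world - estimate)
    return total / steps
```

Under these assumptions a lightly constrained emulator (for instance mean_error(0.1)) tracks the world better than either extreme: the unconstrained run drifts away like a watch that is never reset, while the purely sensory run inherits all of the sensor noise.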

Rodolfo Llinas has recently advocated a similar position. <27> He argues that REM sleep and wakefulness are very similar in many important respects, the only significant difference being the capacity for external sensory stimulation to make a coherent contribution to the intrinsic activity. He argues that the thalamo-cortical loop which subserves perception and awareness is best regarded as a closed loop that is capable of sustained auto-stimulation. Llinas and Pare are worth quoting at length:

The thalamus is considered to be the functional and morphological gate to the forebrain. Indeed, with the exception of the olfactory system, all sensory messages reach the cerebral cortex through the thalamus. Yet, synapses established by specific thalamocortical fibers comprise a minority of cortical contacts. For example, in the primary somatosensory and visual cortices, the axons of the ventroposterior thalamic and dorsal LGN neurons account for, respectively, 28% and 20% of the synapses in layer IV and adjacent parts of layer III (where most thalamocortical axons project). Even in primary sensory cortical areas, most of the connectivity does not represent sensory input transmitted by the thalamus, but input from cortical and non-thalamic CNS nuclei. Indeed, cortico-striatal, corticocortical and corticothalamic pyramidal neurons receive, respectively, 0.3 - 0.9, 1.6 - 6.8, and 6.7 - 20% of their synapses from specific thalamocortical fibers, while less than 4% of the synaptic contacts on multipolar aspiny neurons in layer IV originate in the thalamus.
...the thalamocortical network appears to be a complex machine largely devoted to generating an internal representation of reality that may operate in the presence or absence of sensory input.

One way to think about this view of perception is that perceptual processes are subserved by mechanisms that execute trajectory completions, a generalization of the popular connectionist notion of vector completion. Normal vector completion approaches are best suited to static knowledge structures, and constitute a special, temporally flattened case of trajectory completion. Llinas' proposal is that the 'reality emulator' is a trajectory completer, or trajectory continuer, which can be more or less constrained (in the normal 'constraint satisfaction' sense of the term) by sensory input.
The possibility that our entire world is an internal emulation opens up before us.

 

3.5 Conclusion

Much ground has been covered in a short span in this whirlwind chapter, and I am painfully aware that there are many important issues and counter-positions that I have ignored. But my goal has been to discuss specific possibilities, rather than demonstrate necessities.

I hope that by now the notion of emulation is clear. In this chapter and the last we have discussed emulators of robot arms, bioreactors, human musculoskeletal mechanics, perceptual apparatus, and even perceptual reality itself. In each case I have tried to make the point that it is plausible to suppose that the sort of emulation envisioned actually occurs. Furthermore, within each domain discussed, there are well-respected researchers who agree. I hope that the clear application of emulation to enhance motor control makes the strategy's phylogenetic appearance plausible. Finally, I hope to have captured the reader's imagination and charity. No doubt there are plenty of gaps in the story to fuel critics' fires, but I think there is enough of interest, and sufficient potential, to justify optimism as well as a little patience.

 



Footnotes:

[9] I will use the term 'imagery' or 'mental imagery' as a blanket term to cover all sorts of imagery, including visual, auditory, motor, etc. Imagery is often equated with visual imagery, but I want to keep it as a generic term, and will modify it with 'visual' or 'motor', etc., when necessary.

[10] The stretch receptors and Golgi tendon organs are sensitive, roughly, to muscle length and to tension at the muscle-tendon junction, respectively. These values are in turn roughly correlated with joint angle and joint torque, respectively. In addition, depending on how fast the receptors adapt, some respond to the values of these parameters, while others, which adapt quickly, respond to changes in them, and thus can give an indication of their time derivatives.

[11] See Denier van der Gon (1988). Of course this does not imply that it takes a full 500ms for such information to reach central control areas. It may be there somewhat faster. The experiments cited modify peripheral signals to determine their effects on fast movements, with the finding that, with movements that last less than 500ms, such modification makes no change to the movement. Given this, changes to the motor signal that occur significantly before 500ms cannot be made on the basis of peripheral feedback.

[12] The best and most thorough discussion of these control issues I have seen is Ito (1984). Others who explicitly argue for the necessity of a musculoskeletal emulator are Tsukahara and Kawato (1982), Kawato et al. (1987), Kawato (1989, 1990), Houk (1988, 1990), and Arbib (1981).

[13] It has been shown that mental rehearsal of physical tasks facilitates and improves subsequent performance. See Feltz and Landers (1983) for a review of the extensive literature; see also Yue and Cole (1992) and Jeannerod (1994), as well as the discussion in the next section. On a more speculative note, it is tempting to wonder whether one of the functions of dreaming is to allow inverse models to train on emulators. As mentioned in the previous chapter, a benefit of training on an emulator is that it allows one to train for dangerous situations without assuming real risk, and to train on unusual situations that occur rarely enough to effectively negate any chance of experience with the real thing.

[14] See van der Meulen, Gooskens et al. (1990).

[15] To be speculative for a moment, it was pointed out earlier that inverse motor control mappings are ill-defined, and that motor control theorists add additional constraints such as minimum jerk or minimum torque change criteria to make inverse dynamics well-defined. Given that dF/dt is directly proportional to jerk, and closely related to torque change, cells predicting dF/dt could play an obvious role in dynamic profile determination, since a profile that minimizes the activity of such units would be a minimum jerk profile.
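The proportionality appealed to here can be spelled out in a short worked derivation. It is a sketch under simplifying assumptions that are mine rather than part of the argument above: a point mass m moving along a single axis x(t), and "activity" of the dF/dt-predicting units measured by the time integral of the squared signal over a movement of duration T.

```latex
% Assumptions: point mass m, one-dimensional motion x(t), constant m.
\begin{align*}
F(t) &= m\,\ddot{x}(t), \\
\frac{dF}{dt} &= m\,\dddot{x}(t) = m\,j(t), \qquad j(t) \equiv \dddot{x}(t), \\
\int_0^T \left(\frac{dF}{dt}\right)^{2} dt &= m^{2} \int_0^T j(t)^{2}\,dt .
\end{align*}
```

Since m is constant, a movement profile that minimizes the left-hand integral also minimizes the integrated squared jerk on the right, which is the quantity that minimum jerk criteria are standardly taken to minimize.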

[16] The argument structure here owes much to Jeannerod (1994), who also argues that motor control and motor imagery are supported by similar structures. However, Jeannerod does not recognize that emulators are a necessary part of this puzzle (as I pointed out in Grush (1994a)).

[17] Some such studies are Decety, Sjoholm et al. (1990); Roland, Larsen et al. (1980); Fox et al. (1987); Ingvar and Philipsson (1977). The study by Decety, Sjoholm et al. (1990) is particularly interesting because not only do they confirm the pattern of rCBF (regional cerebral blood flow) at issue, but they also implicate the cerebellum in imagined motor activity. If the hypothesis of Kawato (and Ito and Houk, etc.) that the cerebellum participates in an emulatory loop is correct, and if I am correct in assuming that such an emulator supports mental imagery, then this increase in cerebellar activity during motor imagination is to be expected.

[18] Jeannerod (1994) reaches the same conclusion, though for different reasons (he is interested not in emulation but in motor imagery). He argues that the signals used for imagery originate in premotor cortex or the basal ganglia.

[19] See Goodwin et al. (1972) and Requin et al. (1991). See also Jeannerod (1994).

[20] See footnote 13.

[21] It is of course possible that there is more than one musculoskeletal emulator: one used for control purposes, and one that supports imagery. While the Decety et al. (1990) finding that motor imagery increases cerebellar metabolism suggests that the two might be the same, I plan to remain neutral.

[22] It is not necessary that the robot 'willfully' act in any way. What is required is that the model be able to distinguish the different motor 'contexts', so that it can learn the forward mapping of those contexts. In animals, presumably, this information is available to the brain because the brain itself issues the motor commands.

[23] I don't think it implausible to assume that if a system can manipulate a 2-D visual representation in such a way as to preserve 3-D invariances, then it, at least implicitly, represents the object as 3-D.

[24] As close to identical as possible. See Held and Hein (1963) for a more detailed explication.

[25] For example, the outputs might include not only an estimated sensor value, but an estimated deviation as well, and a high estimated deviation would mean a low confidence in the estimate.

[26] Interestingly, the view of perception put forward here has the capacity to reconcile many of the conflicting intuitions regarding the top-down vs. bottom-up views of perceptual processing. If perception is sustained as part of an emulatory loop, then the influences of higher centers on the lower ones enter, as it were, at the bottom. Seeing the process as a loop, rather than a line, allows us to have our bottom-up cake and eat it too. We will be able to agree that perceptual processing goes from the bottom up, while also agreeing that higher mechanisms can influence it substantially.

[27] See Llinas and Pare (1991).