Sam Clarke (University of Oxford)
Abstract: This paper advances three claims concerning the cognitive processes that underpin human goal ascriptions. First, I propose that many of our leading theories of goal ascription hold, or seem committed to holding, that the goals of others’ actions can only be identified through a process of approximately rational, abductive reasoning (§1). Second, I argue that there is reason to question this commitment. Some goals appear to be identified by fast, inaccessible and informationally encapsulated cognitive processes. This suggests that they are identified by input systems—akin to those involved in speech and sensory perception—rather than the central systems that rational abduction paradigmatically involves (§2). Third, I suggest that there are independent reasons to take this latter proposal seriously and no obvious reasons to reject it (§3). This presents a challenge to the existing views of goal ascription discussed in §1 and raises a number of important questions for future research.
Keywords: goal ascription, modularity, abduction, teleological stance, theory of mind
Humans ascribe goals to observed actions. By this, I mean to say that humans can and do identify the outcomes to which others’ observed actions are directed. And this is a good thing. Infants come to identify the goals of others’ observed actions before they come to identify and fully understand the mental states of others. Thus, it is often said that goal ascriptions facilitate many of our earliest social interactions (Csibra and Gergely, 1998). Indeed, it is sometimes said that these early goal ascriptions bootstrap the subsequent development of a mature theory of mind, as found in adult humans (Gergely and Csibra, 2003), and that they underpin much cognitive, linguistic and social development (Woodward et al., 2009). For these reasons, it is important that we gain a proper understanding of the cognitive processes upon which these goal ascriptions depend.
In this paper, I aim to make a small contribution towards this large project by introducing and developing three claims. First, I observe that many of our leading theories of goal ascription hold, or seem committed to holding, that the goals of others’ actions can only be identified through a process of approximately rational, abductive reasoning (§1). Second, I argue that there is reason to question this commitment. Some goals appear to be identified by fast, inaccessible and informationally encapsulated cognitive processes. This suggests that they are identified by input systems—akin to those involved in speech and sensory perception—rather than the central systems that rational abduction paradigmatically involves (§2). Third, I suggest that there are independent reasons to take this latter proposal seriously and no obvious reasons to reject it (§3). This presents a challenge to the existing views of goal ascription discussed in §1 and raises a number of important questions for future research.
1. A Popular Stance
At one stage, it was common to stress the importance of mental state ascriptions in the everyday prediction and explanation of human behaviour. For instance, Daniel Dennett (1987) famously proposed that the “only” (p.21) way in which normal humans, going about their day-to-day lives, could predict the behaviour of other humans was by adopting the intentional stance: by assuming others to be rational agents striving to fulfil their desires in the light of the beliefs we take them to hold about the world. Relatedly, Dennett suggested that our only practical means of explaining such behaviour would be to invert this process, ascribing beliefs/desires to the agent in order to rationalise and make sense of it.
Much recent work on goal ascription downplays this emphasis on mental state ascription. But it is not hard to see the influence that proposals like Dennett’s continue to have on those theorising in this area. György Gergely and Gergely Csibra (2003) are an obvious case in point. They deny that mental state ascriptions are necessary for the reliable prediction and explanation of certain human behaviour. But they say this only because the inferences that Dennett claims to underpin the prediction and explanation of behaviour need not be applied to agents’ mental states themselves, but can, instead, be applied to the contents of these. For example, they propose that we might predict the way that Bob will reach for the biscuit on the assumption that this behaviour is being rationally directed towards the goal of an eaten biscuit, given the constraints that reality imposes upon this behaviour (e.g. obstacles in the way), without this goal or these constraints being represented as the contents of Bob’s beliefs and desires. Similarly, it is claimed that the goal of Bob’s action can be inferred by working backwards from the behaviour: by assuming that it is being, or has been, pursued rationally by Bob given the constraints of reality. In this way, these theorists downplay the importance of full-blown mindreading in action interpretation, but they hold onto the core of Dennett’s suggestion, proposing that the “inferential principle” of this so-called teleological strategy “is identical to the rationality principle of the mentalistic stance” which Dennett defends (Gergely and Csibra, 2003, p.287; see also Gergely et al., 1995, and Southgate et al., 2008).
For this reason, Gergely and Csibra’s account can be seen to rely on abductive reasoning. This much is widely acknowledged when it comes to the mental state reasoning that Dennett describes (Braddon-Mitchell and Jackson, 1996; Gopnik and Meltzoff, 1997; Gopnik and Wellman, 1992; Lewis, 1994; Nichols and Stich, 2001). Here, no finite number of beliefs and/or desires possessed by a subject will rationally necessitate that they act in any given way. For instance, Bob’s belief that there are biscuits in the tin, coupled with his desire for biscuits, will only cause him to rationally produce (and should only lead us to rationally predict) tin-opening behaviour if all else remains equal; further beliefs and desires could always rationally alter his behaviour and, thus, the predictions we would otherwise rationally make about it. So, short of grasping Bob’s mental life in its entirety—something that is beyond us as finite human observers—reliably predicting Bob’s behaviour via Dennett’s intentional strategy involves us identifying the most likely behaviour that Bob will perform from a fragmentary and defeasible evidential basis; it involves us making an inference to the best explanation.
Similar points apply to Gergely and Csibra’s teleological strategy. Just as further belief/desire ascriptions could always alter the behaviour one should predict of a rational agent on Dennett’s story, further information about the constraints that reality places on action could always alter the behaviour we should predict of an agent when adopting the teleological stance. For instance, if we identify Bob’s behaviour as directed towards the goal of eaten biscuits, we might predict him to reach into the conveniently placed tin containing biscuits—but perhaps not if the tin is locked (unless there is a large hole in its rear, etc.). Since further information of this sort could always alter the behaviour it would otherwise be reasonable to predict of a rational agent, applying the teleological strategy requires us to predict behaviour from a fragmentary and defeasible evidential basis, just as on Dennett’s account. It involves us reasoning abductively about the behaviour in question, flexibly taking into account any salient information available to us about the world and the goals to which the target behaviour is directed, just as Dennett’s intentional strategy involves us flexibly taking into account any relevant belief/desire ascriptions.
The same holds for goal ascriptions on Gergely and Csibra’s account. Here, the constraints of reality and the agent’s movements underdetermine the goal to which these movements are rationally directed. When Bob reaches and grasps the biscuits in the tin, say, we can ask whether the goal to which his action was rationally directed was to have grasped the biscuits, to have grasped whatever was in the tin, or some other end with no obvious relation to biscuit grabbing (perhaps this was an accident). In any case, the answer is underdetermined by the mere fact that Bob has grasped the biscuits and could always be affected by further information available to us about Bob’s movements and the constraints that reality imposes on these. Thus, its identification involves us making an inference to the best explanation that, at least, approximates a rational sensitivity to any salient and accessible information about Bob and the world he inhabits.
Is this true of all goal ascriptions? Gergely and Csibra seem to think so. Their suggestion seems to be that while adult humans can adopt a full-blown mentalistic stance when predicting and explaining behaviour, infants lack the necessary mindreading abilities and therefore have no choice but to adopt the teleological stance when predicting others’ behaviour and identifying the goals of their actions. As they put it:
to interpret such an event as a goal-directed action infants must establish a specific explanatory relation among three elements: the action, the goal state, and the constraints of physical reality. (Csibra and Gergely, 1998, p.255, my emphasis)
In this way, Gergely and Csibra suggest that if humans are to succeed in making accurate goal ascriptions they have no option but to adopt and utilise a rationalistic stance (either mentalistic, teleological, or perhaps some mixture of the two). Since adopting such a stance involves abductive reasoning, it would then seem that, according to Gergely and Csibra, humans must engage in abductive reasoning if they are to identify the goals of others’ actions.
This is a bold claim, but it is not uncommon. Gergely and Csibra are often seen to provide the single best account of goal ascription currently available. But even among theorists who remain agnostic about, or critical of, the details or import of the account, we often find strong hints that goal ascription is taken to require some form of abductive reasoning very much like that which Gergely and Csibra propose. For instance, Amanda Woodward—another leading authority on goal ascription—frames her discussions in terms of subjects’ capacity for “reasoning” (1998, p.1) about the goals of observed actions, flexibly taking into account things they believe about agents, actions and the environment (see also Phillips and Wellman, 2005). Indeed, like Csibra and Gergely, Woodward cites Dennett when claiming that it is this early capacity for ‘reasoning’ about the goals of others that facilitates the subsequent development of full-blown mental-state reasoning (p.2). This suggests that the picture of goal ascription and development that she has in mind is similar to that developed by Gergely and Csibra, and that it is underpinned by similar cognitive processes.
None of this is to suggest that there is absolutely no resistance to this general trend. Some have held that certain goals are ascribed via simple, observable cues, rather than processes of full-blown, rational abduction—cues like self-propulsion (Premack, 1990) and direction of gaze or movement (Baron-Cohen, 1994). This is a view that, as we shall see, I have some sympathy with. But it is often seen to be problematic (Csibra et al., 1999; Gergely, 2002). Moreover, such accounts have sometimes been motivated by the thought that early goal ascriptions should be interpreted in richer terms than theorists like Gergely and Csibra advise; that they reflect the fast and dirty heuristics of systems capable of full-blown, rational and abductive, mental state reasoning. Consequently, the assumption that identifying the goals of others’ actions requires some kind of rational abduction—or at least implicates systems capable of rational abduction—seems to be widespread, and deeply entrenched, in much contemporary theorising about goal ascription. It is this assumption that I want to question.
2. Troubles With the Popular Stance
The proposal that all goal ascriptions rely on rational abduction—or, at least, on systems capable of rational abduction (as many seem to suggest)—is empirically tractable. To see this, consider Fodor’s famous distinction between the mind’s modular and non-modular parts (1983). For Fodor, abduction of the above kind depends upon non-modular, or central, cognitive systems (1983; 2000). This is because rational abduction is a paradigmatically unencapsulated process: one that is (or can be) directly affected, at least in principle, by any of one’s beliefs—e.g. by anything one believes about the constraints of reality. For Fodor, modular input systems are not unencapsulated in this way; rather, they are largely unaffected by what their subject knows or believes (about, say, others’ mental states or the relevant constraints of reality), even when these beliefs are salient and reflected upon. Thus, within a Fodorian framework, accounts that take all goal ascriptions to involve rational abduction (or systems capable of rational abduction) seem committed to holding that all goal ascriptions depend upon non-modular central processes. This is noteworthy, since non-modular central systems are said to display a number of properties to a striking extent when compared with modular input systems. Thus, if it is true that all goal ascriptions depend upon abductive reasoning (and, thus, central resources), the Fodorian should expect the processes involved to display these properties.
I happen to endorse an essentially Fodorian picture of the above sort. That being said, it is important to acknowledge that the details of Fodor’s purported distinction between modular and non-modular systems are controversial, if only to note that much of this controversy is irrelevant for our purposes. One apparent challenge comes from the Massive Modularity Hypothesis. Proponents of Massive Modularity have argued that the mind is entirely (or almost entirely) composed of modular systems, even at its most central parts. This may appear to undermine the Fodorian distinction just introduced. However, in reality, proponents of Massive Modularity are often careful to note that when they describe central systems as ‘modular’ they do not mean to suggest that they are ‘modular’ in the sense that Fodorian input modules are (Carruthers, 2006, p.12; Pinker, 2005). Indeed, these theorists will typically remain open to an essentially Fodorian distinction between input and central systems and the existence of distinctive properties manifested by each (Deroy, 2014). In this way, they simply change the subject (Prinz, 2006).
Perhaps more troublingly, various theorists have denied that any cognitive systems qualify as modular in Fodor’s strict sense. But even these theorists are happy to treat the properties that Fodor identifies as indicative of a modular or non-modular process as indicative of a process being relatively perceptual or relatively cognitive (for example, compare Briscoe 2010 and 2015). For while these theorists deny that input systems possess the relevant properties to the full extent that Fodor proposes, they typically acknowledge that the operations of input systems involved in speech and sensory perception manifest those properties to a markedly greater degree than abductive reasoning and rational thought do (e.g. Clark and Lupyan, 2015). So, if we were to find that certain goal ascriptions displayed the properties that Fodor takes to be indicative of modularity, to the extent that input systems manifest these properties, then this would still provide good evidence against the suggestion that all goal ascriptions are the product of rational abduction.
With this in mind, I will now introduce three reasons for thinking that the processes underpinning certain simple goal ascriptions do not possess the properties we would expect of them if they were the result of rational abduction. This will not prove that rational abduction is uninvolved, and, in each case, I will try to show how proponents of rational abduction’s indispensability for goal ascription might respond. That said, when these reasons are considered collectively, I will tentatively propose that some goal ascriptions look to be the achievement of input systems, akin to those involved in low-level speech and sensory perception; a conjecture that I will suggest we have independent reason to take seriously in §3.
One reason to question the idea that rational abduction (and, therefore, central processing) is involved in all cases of goal ascription concerns the fact that humans, and even young infants, are apparently able to identify and react to the goals of certain observed actions very quickly. For instance, a recent study by Reddy et al. (2013) measured three-month-old infants’ postural changes in response to a caretaker’s actions directed at picking them up to hold. It found that selective shifts in the infants’ posture (e.g. the straightening or stiffening of the legs and the widening or raising of the arms in response to the caretaker’s behaviour) were evident “immediately after the onset of (the caretaker’s) approach” (p.1) or, perhaps on a more cautious assessment, within the first 100ms of approach onset (p.5).
This is amazingly fast. Reacting to the caretaker’s behaviour in these cases would seem to necessitate the prior perceptual recognition of the caretaker and their movements. But, often, perceptual input systems will only produce their outputs within a comparable timeframe. For instance, by Potter’s (1975) estimate, the identification of phonemes in others’ speech—among the fastest of all cognitive processes (Fodor, 1983, p.61)—takes between 125 and 167ms. Thus, if this is correct, there simply seems to have been no time for the infants to have reacted to the caretaker’s movements on the basis of post-perceptual reasoning: i.e. for their reactions to have involved perceptual identification of the caretaker and their movements plus further cognitive work (e.g. rational inferences carried out by post-perceptual, central systems).
It is then striking that the infants in this study were plausibly tracking and responding to the goals of their caretakers’ actions. This is evinced by the fact that the infants were selectively responding, in appropriate ways, to actions with a certain goal (holding me), despite apparent variability in the kinematic structure of actions directed towards this end (see Reddy et al., 2013, p.4, and Fantasia et al., 2016). This provides evidence that the infants were tracking and responding to a goal of their caretakers’ actions, within the above timeframe, rather than some more local feature (or features) of the caretakers’ behaviour. And, if correct, this provides suggestive evidence that the infants’ timely behavioural responses were genuinely the result of a goal ascription, carried out by perceptual input systems themselves, as opposed to those central systems upon which rational abduction depends.
Of course, we should avoid placing too much weight on the findings of a single experiment. However, similar results can also be found with more complex goal ascriptions. For instance, Shim et al. (2005) found that experienced tennis players will react to the goal trajectory of opponents’ serves within 127ms of movement onset. And, similarly, Ambrosini et al. (2013) found that humans anticipate the goals of simple reaching actions by taking into account kinematic variables, such as grip aperture, from 6 months of age. This was implied by subjects’ anticipatory eye movements towards a target object, as much as 800ms prior (p.5) to the completion of observed reaching actions taking between 1720 and 2280ms (p.3); a significant finding since further studies carried out by the same group have suggested that much of the relevant kinematic information being utilised in these studies is not available until 60% of movement time has elapsed (Ansuini et al., 2015, p.8, p.11). Thus, it is plausible that some subjects were performing their anticipatory looks within 112ms of observing the kinematic cues utilised in this study (for a 2280ms action, the relevant kinematic information becomes available only after roughly 1368ms, whereas a look made 800ms before completion occurs at 1480ms: a gap of just 112ms).
Admittedly, the interpretation of all these results is complicated (perhaps subjects guessed before all of the information was in) and involves divining the difference between the representation of goals in others’ actions and our mere sensitivity to more local cues in their behaviour (a point where critics might wish to resist the above suggestions) or to actions of a certain kind, irrespective of any goal ascriptions. Nevertheless, the above studies are at least suggestive; taken at face value they suggest that certain goal ascriptions are performed faster than we would expect if they were the result of the post-perceptual, central processes that are paradigmatically involved in rational abduction.
A second reason to doubt that all goal ascriptions involve rational abduction pertains to the apparent inaccessibility of the information and/or processes involved in some such ascriptions. As Fodor observes, the cognitive states that function as inputs or outputs of rational thought are typically accessible for central monitoring by their subject, in a way that the information utilised by modular input systems is not. For instance, I am unable to introspectively access the information utilised by my early visual systems. But, by contrast, I can typically identify the beliefs and desires that rationally guide my abductive reasoning, at least under the right conditions (Fodor, 1983). Consequently, if/when goal ascriptions are the result of abductive inferences, carried out by central systems, we might expect the information utilised to be relatively accessible for central monitoring.
There is reason to doubt that such information is always accessible in this way. To see this, note that information about the kinematics of action—e.g. subtleties in wrist velocity and grip aperture during reaching—reliably covaries with the goals of surprisingly complex actions; e.g. whether an agent is going to pick up an object to eat, throw, or give away (see Becchio et al., 2012). Moreover, note that various studies have suggested that humans actually use such kinematic information to identify the goals of others’ actions, at least under certain circumstances (see Manera et al., 2010; 2011a; 2011b; 2011c). Such information is, plausibly, not accessible for central monitoring. Admittedly, this has not been formally tested. Nevertheless, the evidence is, I think, suggestive.
For a start, experimenters consistently report that adult subjects tested in these studies appear to be unaware of their sensitivity to the aforementioned kinematic cues, even during debriefings conducted immediately after tests demonstrating their sensitivity to them and their capacity to use these in making goal ascriptions. That is to say, these subjects appear to have been both unaware of how they were identifying the goals of actions observed in the studies they participated in and unaware that the kinematic cues they utilised were even available to be utilised (C. Becchio, pers. comm.).
In and of itself, this lack of awareness does not show that such information was inaccessible for central monitoring. After all, much of our day-to-day decision-making appears to be influenced by factors that are beyond our ken. For instance, there is a wealth of evidence indicating that even educated, liberal hiring committees are prone to various biases (e.g. Steinpreis et al., 1999). Such findings can surprise and scare, but while hiring committees may be unaware, and even shocked to discover, that factors such as the gender of an applicant affect their decisions about which staff to hire, this information is not inaccessible to them. Typically, they are, or can be made, aware of a given applicant’s gender and—in the knowledge that such information unfairly affects their decision-making routines—they can choose to rethink their snap judgements. But I think there is at least anecdotal evidence to think that this is not so when it comes to the kinematic cues that Becchio, Manera and others reveal to underwrite certain goal ascriptions. It is this: I’m someone who spends an inordinate amount of time in a busy café on Cowley Road, thinking about the aforementioned kinematic cues. But even in the knowledge that human grip aperture is smaller and peak grip-closing velocity slower when my fellow café-goers grasp mugs to pass them to a barista than when they grasp these with the goal of relocating them at a table (Becchio et al., 2008), etc., this is something I seem unable to identify in others’ actions despite repeated attempts. Admittedly, this is far from laboratory conditions. Nevertheless, it is suggestive that such cues may be inaccessible to me—a non-autistic, enculturated subject who apparently utilises such cues when ascribing goals to certain observed actions. And this, in turn, is at least suggestive that these cues—which are used to make certain goal ascriptions—may be processed by systems that are not engaged in rational abduction, e.g. input systems, like those involved in speech and sensory perception.
A third reason to question the idea that abductive reasoning underwrites all human goal ascriptions concerns the apparent levels of informational encapsulation that the processes underpinning certain goal ascriptions plausibly display. This is suggestive since, as has been mentioned, striking levels of informational unencapsulation are often taken to be paradigmatic of the central processes that make rational abduction possible. As Fodor (1983; 2000) reminds us throughout his work, the conclusions of one’s abductive inferences can be affected in arbitrarily complex ways by any salient proposition(s) that the subject believes, at least in principle. This shows that central resources must have access, at least in principle, to everything that the subject believes, and it makes central processes quite unlike those of modular input systems, which are taken to be largely insensitive to much that their subject believes at any given moment. Consequently, if there were evidence that the processes underpinning certain goal ascriptions were systematically insensitive to much that their subject believes, in the way that systems involved in speech and sensory perception are, then this would speak against the hypothesis that all goal ascriptions are the product of rational abduction.
Plausibly, some such evidence exists. In a study conducted by Southgate, Johnson and Csibra (2008), 6- to 8-month-old infants were habituated to a variety of well-formed reach-and-grasp actions that required the agent to first move a box out of the way. These were all directed towards a common object, and the well-formed nature of these goal-directed actions led the infants to identify the goal of these actions as contact with the common object. Infants were then tested in one of two conditions. In the first condition, infants were shown a reach-and-grasp action that was similar to those that they had been habituated to, except that it required the agent to first move a further box out of the way in order to reach the target object. In the second condition, reaching and grasping occurred in the same situation as the first, but here the agent neglected to move any obstacles out of the way and instead performed a biomechanically impossible snaking movement to reach the target.
Looking times suggested that infants were more surprised by the first condition. This was taken by the authors of the study to support the teleological stance hypothesis because it suggested that the infants only cared about how well-formed the test action was given its apparent goal and the constraints of reality—in their terms, it suggested that the infants only cared how ‘rationally’ or ‘efficiently’ the action was performed given the constraints of external reality. This is because, apparently, they did not consider how they would themselves perform the action (pace Woodward, 1998). Nor did they take into account the (albeit, limited) knowledge they would apparently have had about the biomechanical constraints on other humans’ actions (see Bertenthal et al., 1984) and, in particular, limb movements (Bertenthal et al., 1987). But this is a puzzling finding if the infants were abductively reasoning about the agent’s behaviour. Central resources involved in even approximately rational abduction may well be subject to biases and heuristics that govern the kind of information that gets considered when making snap decisions (Tversky and Kahneman, 1974). But typically, these biases and heuristics cause subjects to place undue emphasis on the salient features of a stimulus when reasoning about it (Maheswaran, Mackie, & Chaiken, 1992; Coulter & Coulter, 2005; Thorndike, Sonnenberg, Riis, Barraclough, & Levy, 2012; Mitchell et al., 1996; Birch and Bloom, 2007).
This makes these biases quite unlike the heuristics governing infants’ disregard for biomechanics in the above study since, here, infants were disregarding a highly salient and unfamiliar feature of the stimulus (the unnatural bending of the forearm); a finding made all the more surprising by the fact that, as we have already seen, infants at this age are able to use far less salient kinematic information—such as subtleties in grip aperture that are, plausibly, inaccessible to the subject—when ascribing goals to actions (see Ambrosini et al., 2013). Tentatively, I would then like to propose that—when taken at face value—this study plausibly suggests interesting levels of informational encapsulation in the processes responsible for certain goal ascriptions. [Further studies suggesting that infants process only certain kinds of information (regardless of its apparent salience) when reasoning about the goals of observed actions include: Gergely et al. (1995), Kamewari et al. (2005), Phillips and Wellman (2005), Csibra (2008), Southgate and Csibra (2009), Hernik and Southgate (2012), and Feiman, Carey and Cushman (2015).]
Admittedly, this example is not perfect. One might have methodological concerns about the looking time paradigm employed (see Aslin, 2008). Moreover, the belief-independence I have suggested this study to reveal is only evidenced indirectly. In this respect, it is unlike the classic illustrations of informational encapsulation employed by Fodor (1983) and Pylyshyn (2000), which involve subjects explicitly reflecting on beliefs about a stimulus that conflict with its appearance; e.g. cases where the lines of the Müller-Lyer illusion continue to look different lengths even when subjects know and reflect on the fact that this is not so. But studies could be run to test the belief-independence of goal ascriptions in much the same way. One way in which this could be done would be by exploiting the findings of existing studies that suggest the automaticity of certain goal ascriptions (e.g. Scholl and Gao, 2013). While such studies have typically demonstrated that the effects of goal ascriptions on behaviour are apparent even when these are irrelevant and detrimental to the subject’s current task, studies could be run to test the effects of goal ascriptions on behaviour even when the subject’s explicit beliefs contradict these ascriptions: for instance, where explicit knowledge about the action’s goal contradicts that suggested by kinematic cues, like wrist velocity and grip aperture, as discussed in the previous subsection. If the effects of such (mistaken) goal ascriptions were evident in subjects’ behaviour, even when the subject explicitly reflects on her true and conflicting beliefs about an agent’s goals, this would provide more direct evidence of the belief-independence of certain ascriptions.
In the meantime, however, I will content myself with three points: that some studies may indicate the encapsulation of certain simple goal ascriptions; that this is surprising if these ascriptions depend on abductive reasoning; and that future empirical work could be used to assess matters further.
2.4 A Tentative Suggestion
I have now introduced three reasons to question the idea that all goal ascriptions are the result of rational and abductive reasoning. I have suggested that we take seriously the idea that some goal ascriptions are:
(1) made within a similar time-frame to perceptual input processes;
(2) driven by inaccessible cues; and
(3) the result of encapsulated processes.
These are properties that are uncharacteristic of the central systems that rational abduction is seen to involve, even among proponents of Massive Modularity (Carruthers, 2006, p.12) and even among critics of modularity more generally (e.g. Clark and Lupyan, 2015). Thus, to the extent that (1), (2) and (3) are plausible, the idea that rational abduction is responsible for all goal ascriptions should be called into question.
Admittedly, I have noted that there are ways in which one could resist (1), (2) and (3). But, I take it that the considerations discussed remain suggestive. From the perspective of a neutral onlooker, the above findings do not naturally look to be a product of central systems, performing abductive inferences. Instead, they look to indicate the workings of input systems, akin to those involved in speech and sensory perception. Why? Because findings of the sort discussed throughout this section would not only be accommodated (via ad hoc auxiliary hypotheses), but actually predicted on a view which deemed the goal ascriptions under consideration to be the product of such systems. This is because input systems of this sort are widely noted to be fast, inaccessible and encapsulated in much the same way—a fact that, arguably, requires an explanation by appeal to the kind of system they are (Butterfill, 2007). So, while tentative, the most natural thing to say in the light of (1), (2) and (3) is, I think, that some goal ascriptions look to be made by input systems, akin to those involved in speech and sensory perception.
3. Independent Motivations
Not everyone will be convinced. As we saw in §1, a significant number of theorists hold that all goal ascriptions are the result of rational abduction and, therefore, cannot plausibly be carried out by the input systems that are characteristic of speech and sensory perception. Indeed, I can foresee critics claiming that parsimony favours such one-size-fits-all approaches to goal ascription; that since humans can and do sometimes reason abductively about the goals of certain actions, it would be more parsimonious to suppose that all goal ascriptions are underpinned by the systems these inferences involve and that we should, therefore, seek to accommodate (1), (2) and (3) within such a framework. I will now provide reasons to resist such an argument. There are, I propose, general considerations that speak in favour of the idea that some goal ascriptions will be made by input systems, akin to those involved in speech and sensory perception. Consequently, the kind of tentative findings made in §2 are not wild and outlandish, but highly plausible and deserve to be taken seriously.
To begin to see this, note that action understanding and speech comprehension are underpinned by similar processes—something that is, perhaps, unsurprising given that speech just is one kind of action (Browman and Goldstein, 1992; Clark, 1997; Liberman and Whalen, 2000). As is widely agreed, speech comprehension involves perceptual systems parsing relevant sensory inputs into useful chunks, suitable for semantic analysis—e.g. discrete phonetic units and, from these, words (Saffran et al., 2008) and clauses (Soderstrom et al., 2005)—and similar points apply to the comprehension of observed action. Here too, observers must first parse observed behaviour, and recognise individual actions as bounded units, in order to identify goals in this behaviour and to provide rationalising explanations for it (Baldwin and Baird, 2001). And there is reason to think that this parsing is underwritten by processes that are analogous to those which underwrite the parsing of speech (Newtson et al., 1977). As with speech perception, the processes involved in action parsing operate independently of any semantic knowledge possessed by their subject (Samuel, 1981; Saylor et al., 2007), and it has been suggested that domain specific, generative knowledge must be at work in either case so as to enable observers to parse and identify novel words and actions (Baldwin and Baird, 2001, p.176). Similarly, in both cases, the systems involved utilise perceptible cues, like gaze direction, body posture, gestures and the like (ibid. p.173). These considerations point to often-overlooked similarities between the input systems involved in speech perception and the processes that are involved in the interpretation of observed physical behaviour.
There is, of course, a question of just how similar the processes are. But, given that close similarities have been suggested, it is natural to consider what we might learn about one case from the other. This is pertinent for our purposes since the systems involved in speech perception manifestly perform something importantly similar to the kinds of goal ascription we have been discussing throughout this paper; they operate to identify the outcomes to which others’ speech acts are directed by abstracting away from idiosyncrasies in the ways that these outcomes are brought about (e.g. individual differences in movements produced by the speaker’s vocal tract and mouth, and differences in their accent, pitch and timbre). For instance, when a speaker intentionally produces a noise registered by observers as belonging to the phone class /ba/, the realisation of the relevant noise is an outcome, distinct both from the behaviour leading up to its realisation and from the speaker’s intention; it is a kind of goal. In this way, the phonemes that input systems identify and categorise can be thought of as a kind of goal, contributing to wider goals, like the production of words, which contribute to the realisation of bigger goals still, like the production of clauses.
Might something similar be true of the systems involved in the parsing of observed action, more generally? There is some reason to think so. While physical actions are parsed at different scales, they are parsed at goal boundaries, specifically. For instance, when Bob intentionally reaches, grabs, and eats his biscuit he realises a relatively large-scale goal (an ingested biscuit) by realising various sub-goals (e.g. contact with the biscuit, biscuit located in mouth, etc.). This much seems to be reflected in the operations of systems involved in parsing observed action. For instance, action parsing operates at various scales; identifying sub-actions at intention boundaries, and wider actions that these contribute towards, again at intention boundaries (Newtson, 1973; Newtson et al., 1976; Zacks and Tversky, 2001). This, itself, suggests that these systems are sensitive to the goals of observed actions—i.e. to the points at which the outcomes of intentional actions are realised. But, depending on how closely these systems are to be modelled on analogous systems involved in speech perception, it is possible that this is all that they care about. Just as speech perception involves input systems abstracting away from idiosyncrasies in the production of speech, and categorising the phonemes that the speaker intends to produce, systems involved in action parsing, more generally, might abstract away from idiosyncrasies in the realisation of goals and simply function to identify and categorise these.
One reason for taking this latter suggestion seriously is that, according to various theories of speech perception, the categorisation of phonemes and allophones in others’ speech involves perceptual systems identifying the processes by which these articulatory gestures are performed (these views include so-called motor theories of speech perception—e.g. Liberman and Mattingly, 1985; Galantucci et al., 2006—and direct realist theories of speech perception—e.g. Fowler and Rosenblum, 1991—see also: Luria, 1966; Alajouanine et al., 1964). On such views, categorising a perceived /ba/ as a /ba/ involves input systems abstracting away from the idiosyncrasies of visual and acoustic stimuli and identifying underlying motor processes involved in the subject’s production of a /ba/ gesture. While the details of such processes remain controversial, much of this controversy concerns the role that this process plays in our subsequent understanding of speech (e.g. Hickok, 2009), and the broad family of views that endorse such a suggestion have enjoyed a surge in popularity following the discovery of mirror neurons in areas like the ventral premotor cortex of primates, and specifically the parieto-frontal action observation action execution circuit (the PFC) (Gallese et al., 1996, p.607; Fadiga and Craighero, 2006, p.489). These are neurons that are activated both by the production and perception of action (Rizzolatti and Sinigaglia, 2010) and are active in similar ways during the perception and production of speech (Watkins et al., 2003; Wilson et al., 2004; Wilson and Iacoboni, 2006), suggesting a common coding of action in either case.
As such, it is interesting to note that these neurons are often considered to encode the goals of observed actions. Reasons for thinking this include a series of fMRI studies suggesting that isomorphic mirror neuron activation occurs during subjects’ observation of a televised grasping action, regardless of whether this action is performed by a human hand, a robotic hand, or a tool (Peeters et al., 2009). Meanwhile, other studies show the opposite effect—they show that isomorphic movements will be encoded differently by neurons in the PFC when these are directed towards different ends (Ferrari et al., 2005). This suggests that PFC activation does not merely represent mirrored muscle movements but, quite specifically, the goals of actions perceived. And, in this way, there is some reason to think that the categorical perception of outcomes to which speech acts are directed is underpinned by mechanisms that encode the goals of observed actions, more generally.
We should, of course, be cautious when moving from findings about the neural underpinnings of a cognitive process to theories about the cognitive architecture of the process itself. Thus, we should not conclude from the fact (if it is a fact) that the categorisation of perceived speech involves neural mechanisms also involved in the encoding of goals in observed action that speech perception and goal ascription are alike in the relevant cognitive respects. That said, these findings are suggestive when considered within the context of this wider paper. It is not that certain goal ascriptions must be speech-perception-like in their cognitive underpinnings because these involve common neural mechanisms. Rather, it is that if goal ascriptions and speech perception are underpinned by common neural mechanisms it would be unsurprising to discover that they were alike with regard to their cognitive underpinnings (e.g. their modularity). So, if used tentatively, these findings give further credence to the suggestion that some goal ascriptions are underpinned by input systems akin, or common, to those involved in speech perception.
This suggestion is not a foregone conclusion. The thought that goal ascription requires central cognition is rarely made explicit. But one reason for thinking this might be that human behaviour is hugely variable and context sensitive. As such, it might seem that identifying the goals of others’ behaviour requires similarly flexible and context-sensitive reasoning; that identifying the goals of others’ actions requires the central cognitive resources involved in understanding the ways in which these actions are brought about, with access to information about any of the indefinitely many factors that might affect this behaviour. But to apply such a line of reasoning across the board is, it seems to me, to underestimate the resources that a speech-perception-like module could have available to it when identifying the goals of certain observed actions. As we saw in §2.2, perceptible subtleties in the kinematics of action, such as wrist velocity and grip aperture, can be used to reliably anticipate the goals of surprisingly complex actions even prior to their realisation. For instance, they can be used to predict whether an arm movement is directed towards picking up an apple to eat, give away or throw (see Becchio et al., 2012, for a review of these and related findings). Likewise, Baldwin and Baird (2001) note the presence of perceptible regularities marking the boundaries between individual intentional actions. This suggests that simple principles, implicit in the operations of an informationally encapsulated module, could be used for the bottom-up identification (and even anticipation) of many goals, without relying on central cognitive resources.
Admittedly, these simple principles are, alone, unable to explain certain goal ascriptions that, plausibly, conform to the pattern of results discussed in §2. For instance, various studies indicate that young infants will inflexibly anticipate the goals of simple geometric shapes’ movements by taking into account information about these shapes’ previous behaviour (e.g. Hernik and Southgate, 2012). This shows that such goal ascriptions cannot rely solely on kinematic cues, such as grip aperture and wrist velocity, and that they must draw on endogenously stored information about the agent’s previous behaviour and the goals of its previous actions. But it is not problematic (in any obvious way) to think that the goals of these ‘actions’ might also be identified by our hypothesised input module. Why? Because input modules need not rely solely on information processed from the bottom-up. Rather, they may also succumb to top-down effects.
To see this, consider the phonemic restoration effect, in which a single phoneme of a heard sentence is replaced with a cough or white noise. In such cases, most subjects report hearing the entire sentence, intact—they do not notice the missing phoneme (Warren, 1970). For proponents of modularity, like Fodor, this effect provides reason to think that there are top-down effects involved in the identification and categorisation of phonemes (1983, p.77); that the phone identification system has access to information about the way phonemes are combined and that it utilises this information to constrain its predictions when identifying the missing sound. However, these theorists are careful to distinguish this from the idea that the phone identification system is informationally unencapsulated. This is because the phonemic restoration effect is judgement independent. Subjects tested report hearing the entire sentence (with a cough/white noise ‘in the background’) even when they explicitly know that there is a missing phoneme in it (and even when they reflect on this fact). Consequently, the Fodorian proposes that while speech perception systems have access to information that is specified at the levels of representation they compute—e.g. typical combinations of phonemes—they lack generalized access to what the subject knows or believes—e.g. his or her beliefs about the interlocutor’s beliefs, desires and intentions (ibid.). Thus, while these systems are informationally encapsulated in a way that central systems are not, they are prone to certain top-down effects.
Returning to the goals that infants ascribe to the ‘actions’ of simple geometric shapes, it then becomes possible to see how these might be the product of analogous processes. Just as the identification of phonemes is affected both by the bottom-up processing of information relevant to the phonemes perceived and by top-down associations formed between phonemes identified in the past, we can hypothesise that analogous modules involved in the identification of observed goals might draw both on sensory information, processed bottom-up according to simple principles (e.g. the statistical regularities discussed above), and on top-down associations formed between goals and behaviour identified previously. Provided that there is sufficient statistical information to identify the shape’s goals in habituation trials—which is plausible in Hernik and Southgate’s (2012) study, given that the shape stopped and paused at the target object—subsequent ascriptions could well be the result of modules that are sensitive to associations formed between behaviour and goals perceived previously. Therefore, it remains possible that such goal ascriptions are made by input systems, akin to those involved in speech perception.
Is this suggestion also plausible? One reason for scepticism may concern the neural underpinnings appealed to before; mirror neurons in the PFC that encode the goals of observed and produced actions in a common vocabulary. One might doubt that the goals of a simple geometric shape’s actions could be encoded in such a vocabulary given that we share no obvious motor processes with geometric shapes. After all, I’m a human with arms and legs; something lacking in your average circle. So, how could I mirror the movements of a circle? Is this even possible? One reason to think so is this: mirror neurons appear to involve the translation of goals perceived into one’s own motor vocabulary. For instance, aplasic individuals, born without arms or hands, show the same neural activation patterns in their PFC when they perform grasping actions with their feet as when they observe isomorphic actions, performed by normal humans, with their hands (Gazzola et al., 2007). This suggests that the mirroring process involves the translation of observed actions into a code, common to one’s own motor actions. So, provided that the goal of the observed physical action can be produced by the human observer, there is no obvious reason why it could not be encoded within their PFC by mirroring processes.
Perhaps a deeper concern stems from studies that have been taken to suggest the cognitive penetrability of the PFC, thereby implying significant levels of unencapsulation in the cognitive processes it realises. For instance, Iacoboni et al. (2005) compared the PFC activity of subjects observing grasping actions performed in and out of context; e.g. the PFC activity elicited when a subject reached for a full cup of tea to drink, situated beside biscuits and a teapot, against that elicited when a subject reached for an identical, but empty, teacup located on its own. They found that PFC activity was greater in the first condition, thereby suggesting that the process was affected by the subjects’ beliefs about the situation.
There are a number of reasons to reserve judgement on this conclusion, however. Firstly, the fullness of the cup was observable (p.539). As such, it is possible that this was an observable cue, processed bottom-up by a module when performing a goal ascription. Indeed, this is actually what we should expect if, as I have suggested we take seriously, the system draws on kinematic cues, like grip aperture and wrist velocity. Why? Because these kinematic variables must be taken into account relative to the size and shape of the object being grasped (Becchio et al., 2012). Since the milky tea in the cup used in this study was opaque, it effectively changed the shape of the perceived object being reached for. Given that this sort of information would have to be available to the module, as I have envisaged it, and given that full cups are more likely to be drunk from than empty cups, it is possible that this provided further information with which to make the relevant goal ascription. This is particularly plausible given that, as we have seen, input systems do not only utilise incoming sensory information, but are also sensitive to the effects of endogenously stored associations between the pieces of information they compute.
Secondly, contextual effects of the kind observed by Iacoboni and colleagues affect human categorical perception quite generally, despite the fact that categorical perception is just about as plausible a candidate for informational encapsulation as any process. For instance, the categorical perception of an angry facial expression will be encoded as a disgusted facial expression in certain contexts (e.g. when attached to a body holding a rotten fish at arm’s length) independently of the observers’ beliefs about the stimuli (Aviezer et al., 2008). Similarly, it is plausible to suppose that the categorisation of phonemes is sensitive to factors, such as whether or not the observed agent has a pen in her mouth. Given that Iacoboni et al. did not test the effects of subjects’ explicit beliefs on PFC activation directly, and did not examine the judgement independence of this activation, it is not obvious that their study did not simply reveal a contextual effect, typical of input systems quite generally. Certainly, it does not reveal the PFC to be more prone to cognitive effects than better-understood input processes, like those involved in the categorisation of phonemes, and this is what a study would have to show to undermine the idea that input systems identify goals in the way they identify phonemes.
Tentatively, we can then draw two conclusions from our discussion. Firstly, we should take seriously the idea that some goal ascriptions are underpinned by input systems, akin (or perhaps even common) to those involved in the categorisation of phonemes in perceived speech. One preliminary reason for this is that some goal ascriptions are underpinned by processes that, plausibly, display properties that are distinctive of input systems quite generally (such as those involved in the categorisation of phonemes). A second reason for taking this seriously is that phonemic categorisation is importantly like the goal ascriptions that have been our concern in that it involves the parsing and categorisation of outcomes to which observable speech acts are directed; processes that are, plausibly, underpinned by common mechanisms. Finally, we have considered a number of obvious objections to these suggestions and shown that these can be resisted. Thus, there is considerable reason to think that some goal ascriptions might be performed by modular input systems and no obvious reason to reject this suggestion.
At various points throughout this paper, I have considered ways in which these suggestions could be further adjudicated. If they are to be taken seriously, however, we can draw a second (tentative) conclusion: that humans possess distinct kinds of system that perform goal ascriptions. Since humans can and do reason rationally and abductively about the goals of others’ actions some of the time, it cannot be the case that all goal ascriptions are the result of input systems. So, at best, the style of account I have sought to motivate throughout this paper will only apply to some goal ascriptions, and this raises a number of interesting questions. For instance, we can ask what the limits of these distinct kinds of system are, the contexts in which they are/are not recruited and their relationship to one another. These are important questions, but they are questions for another day.
- Alajouanine, T., Lhermitte, F., Ledoux, M., Renaud, D., Vignolo, L.A. (1964) Les composantes phonémiques et sémantiques de la jargonaphasie. Revue Neurologique 110, 5–20.
- Ambrosini, E., Reddy, V., de Looper, A., Costantini, M., Lopez, B., and Sinigaglia, C. (2013). Looking Ahead: Anticipatory Gaze and Motor Ability in Infancy. PLoS ONE [first published online July 4, 2013].
- Ansuini, C., Cavallo, A., Koul, A., Jacono, M., Yang, Y., and Becchio, C. (2015). Predicting Object Size from Hand Kinematics: A Temporal Perspective. PLoS ONE, 10(3).
- Aslin, R.N. (2008). What’s in a look? Developmental Science. 10(1): 48-53.
- Aviezer, H., Hassin, R., Ryan, J., Grady, C., Susskind, J., Anderson, A., Moscovitch, M., & Bentin, S. (2008). Angry, disgusted or afraid? Studies on the malleability of emotion perception. Psychological Science, 19, 724-732.
- Baldwin, D. and Baird, J. (2001). Discerning intentions in dynamic human action. Trends in Cognitive Sciences, 5(4), pp.171-178.
- Baron-Cohen, S. (1994). How to build a baby that can read minds: cognitive mechanisms in mindreading. Curr. Psychol. Cogn. 13: 1-40.
- Becchio, C., Sartori, L., Bulgheroni, M., and Castiello, U. (2008). Both your intention and mine are reflected in the kinematics of my reach-to-grasp movement. Cognition. 106: 894-912.
- Becchio, C., Sartori, L., Bulgheroni, M., & Castiello, U. (2008a). The case of Dr. Jekyll and Mr. Hyde: A kinematic study on social intention. Consciousness and Cognition, 17, 557–564.
- Bertenthal, B., Proffitt, D. & Cutting, J. (1984). Infants’ sensitivity to figural coherence in biomechanical motions, 213-230.
- Bertenthal, B. I., Proffitt, D. R., Kramer, S. J. & Spetner, N. B. (1987). Infants’ encoding of kinetic displays varying in relative coherence, 171-178.
- Birch, S.A.J and Bloom, P. (2007). The curse of knowledge in reasoning about false beliefs. Psychological Science, 18(5): 382-386.
- Braddon-Mitchell, D. and Jackson, F. (1996). Philosophy of Mind and Cognition. Oxford: Blackwell.
- Browman, C. and Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica, 49(3-4), pp. 155-80
- Butterfill, S. (2007) ‘What are Modules and What is Their Role in Development?’, in Mind and Language, 22 (4), pp.450-473.
- Caggiano, V., Fogassi, L., Rizzolatti, G., Casile, A., Giese, M. A., and Thier, P. (2012). Mirror neurons encode the subjective value of an observed action. Proc. Natl. Acad. Sci. U.S.A. 109, 11848–11853.
- Candidi, M., Urgesi, C., Ionta, S., & Aglioti, S.M. (2008). Virtual lesion of ventral pre-motor cortex impairs visual perception of biomechanically possible but not impossible actions. Social Neuroscience, 3(3–4), 388–400.
- Carruthers, P. (2006) The Architecture of the Mind, Oxford: Oxford University Press.
- Carruthers, P. (2015). Mindreading in adults: evaluating two-systems views. Synthese, p.1-16 (Online 23rd June 2015).
- Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge: MIT Press.
- Clark, A. and Lupyan, G. (2015). Words and the World: Predictive Coding and the Language-Perception-Cognition Interface. Current Directions in Psychological Science, 24(4) 279–284.
- Costantini, M., Ambrosini, E., Cardellicchio, P., & Sinigaglia, C. (2013). How your hand drives my eyes. Social Cognitive and Affective Neuroscience, Advance Access.
- Coulter, K.S. and Coulter, R.A. (2005). Size Does Matter: The Effects of Magnitude Representation Congruency on Price Perceptions and Purchase Likelihood. Journal of Consumer Psychology, 15(1), 64-76.
- Csibra, G. and Gergely, G. (1998). The teleological origins of mentalistic action explanations: a developmental hypothesis. Developmental Science. 1, pp.255–259.
- Csibra, G., Gergely, G., Biro, S., Koos, O., and Brockbank, M. (1999) ‘Goal Attribution Without Agency Cues: The Perception of Pure Reason in Infancy’, in Cognition, 72, pp.237-267.
- Dennett, D. (1987). The Intentional Stance. Cambridge: MIT Press.
- Deroy, O. (2014). ‘Modularity’, in M. Matthen (ed) Oxford Handbook of Philosophy of Perception, Oxford: Oxford University Press.
- Fadiga, L. and Craighero, L. (2006). Hand actions and speech representation in Broca’s area. Cortex, 42(4), pp.486-490.
- Fantasia, V., Markova, G., Fasulo, A., Costall, A., & Reddy, V. (2016). Not just being lifted: infants are sensitive to delay during a pick-up routine. Frontiers in Psychology, 6.
- Feiman, R., Carey, S., & Cushman, F. A. (2015). Infants’ representations of others’ goals: Representing approach over avoidance. Cognition, 136, 204-214.
- Ferrari, P., Rozzi, S., and Fogassi, L. (2005). Mirror neurons responding to observation of actions made with tools in the monkey ventral premotor cortex. Journal of Cognitive Neuroscience, 17(2), pp.212-226.
- Fodor, J. (1983). The Modularity of Mind. Cambridge: MIT Press.
- Fodor, J. (1989). Psychosemantics: the Problem of Meaning in the Philosophy of Mind. Cambridge: MIT Press.
- Fodor, J. (2000). The Mind Doesn’t Work That Way. Cambridge: MIT Press.
- Fowler, C.A., Rosenblum, L.D., 1991. The perception of phonetic gestures. In: Mattingly, I.G., Studdert-Kennedy, M. (Eds.), Modularity and the Motor Theory of Speech Perception. Lawrence Erlbaum, Hillsdale, NJ, pp. 33–59.
- Galantucci, B., Fowler, C. and Turvey, M. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13: pp. 361-377.
- Gallese, V., Fadiga, L., Fogassi, L. & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
- Gazzola, V., Rizzolatti, G. Wicker, B. & Keysers, C. (2007). The anthropomorphic brain: The mirror neuron system responds to human and robotic actions. NeuroImage, 35, 1674–1684.
- Csibra, G. (2003). Teleological and referential understanding of action in infancy. Philosophical Transactions of the Royal Society B: Biological Sciences, 358, 447–458.
- Gergely, G., Nadasdy, Z., Csibra, G., and Biro, S. (1995). Taking the Intentional Stance at 12months of Age. Cognition, 56, pp.165-193.
- Gergely, G. and Csibra, G. (2003). Teleological reasoning in infancy: the naïve theory of rational action. Trends in Cognitive Science, 7(7), pp.287-292.
- Gopnik, A. and Wellman, H. (1992). Why the child’s theory of mind really is a theory. Mind and Language. 7(1): pp.145-171.
- Gopnik, A. and Meltzoff, A. (1997). Words, Thoughts and Theories. Cambridge: MIT Press.
- Hernik, M. and Southgate, V. (2012). Nine-months-old infants do not need to know what the agent prefers in order to reason about its goals: on the role of preference and persistence in infants’ goal-attribution. Developmental Science, 15(5), pp.714-722.
- Hickok, G. (2009). Eight Problems for the Mirror Neuron Theory of Action Understanding in Monkeys and Humans. Journal of Cognitive Neuroscience. 21(7): 1229-1243.
- Hohwy, J. (2013). The Predictive Mind. Oxford: Oxford University Press.
- Iacoboni, M. (2008). The Role of the Premotor Cortex in Speech Perception: Evidence from fMRI and rTMS. Journal of Physiology, 102: pp.31-34.
- Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., and Mazziotta, J.C. (2005). Grasping the intentions of others with one’s own mirror neuron system. PLOS Biology. 3(3): 529-535.
- Jacob, P. and Jeannerod, M. (2005). The Motor Theory of Social Cognition: a critique. Trends in Cognitive Science, 9: pp.21-25.
- Kahneman, D. (2011). Thinking, fast and slow. London: Penguin.
- Kamewari, K., Kato, M., Kanda, T., Ishiguro, H., & Hiraki, K. (2005). Six-and-a-half-month-old children positively attribute goals to human action and to humanoid-robot motion. Cognitive Development, 20, pp.303–320.
- Lewis, D. (1994). Reduction of Mind. In S. Guttenplan (ed.), A Companion to the Philosophy of Mind. Oxford: Blackwell, pp. 412–431.
- Liberman, A. and Mattingly, I. (1985). The Motor Theory of Speech Perception Revised. Cognition, 21(1): pp. 1-36.
- Liberman, A. and Whalen, D. (2000). On the Relation of Speech to Language. Trends in Cognitive Sciences, 4(5), pp.187-196.
- Luria, A.R., 1966. Higher Cortical Functions in Man. Basic Books, New York.
- Maheswaran, D., Mackie, D., & Chaiken, S. (1992). Brand Name as a Heuristic Cue: The Effects of Task Importance and Expectancy Confirmation on Consumer Judgments. Journal of Consumer Psychology, 1(4), 317-336.
- Mandelbaum, E. (2015). The Automatic and the Ballistic: Modularity beyond perceptual processes. Philosophical Psychology, 28(8): 1147-1157.
- Manera, V., Schouten, B., Becchio, C., Bara, B. & Verfaillie, K. (2010). Inferring intentions from biological motion: A stimulus set of point-light communicative interactions. Behaviour Research Methods, 42, 168-178.
- Manera, V., Becchio, C., Schouten, B., Bara, B., and Verfaillie, K. (2011a). Communicative interactions improve visual detection of biological motion. PLoS ONE, 6: e14594.
- Manera, V., Del Giudice, M., Bara, B., Verfaillie, K., and Becchio, C. (2011b). The second-agent effect: communicative gestures increase the likelihood of perceiving a second agent. PLoS ONE, 6: e22650.
- Manera, V., Becchio, C., Cavallo, A., Sartori, L., and Castiello, U. (2011c). Cooperation or competition? Discriminating between social intentions by observing prehensile movements. Experimental Brain Research, 211, pp.547–556.
- Michael, J., Sandberg, K., Skewes, J., Wolf, T., Blicher, J., Overgaard, M., & Frith, C.D. (2014). Continuous theta-burst stimulation demonstrates a causal role of premotor homunculus in action understanding. Psychological Science, 0956797613520608.
- Mitchell, P., Robinson, E.J., Isaacs, E.J. and Nye, R.M. (1996). Contamination in Reasoning about False Beliefs: an instance of realist bias in adults but not children. Cognition, 59, pp.1-21.
- Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28(1), pp.28-38.
- Newtson, D. and Engquist, G. (1976). The Perceptual Organization of Ongoing Behavior. Journal of Experimental Social Psychology, 12(5), pp.436-450.
- Newtson, D., Engquist, G. and Bois, J. (1977). The objective basis of behavior units. Journal of Personality and Social Psychology, 35(12), pp.847-862.
- Nichols, S. and Stich, S. (2001). Mindreading. Oxford: OUP.
- Peeters, R., Simone, L., Nelissen, K., Fabbri-Destro, M., Vanduffel, W., Rizzolatti, G. & Orban, G.A. (2009). The representation of tool use in humans and monkeys: common and unique human features. Journal of Neuroscience, 29, 11523–11539.
- Phillips, A., & Wellman, H. (2005). Infants’ understanding of object-directed action. Cognition, 98, pp.137–155.
- Pinker, S. (2005). So How Does the Mind Work? Mind and Language, 20(1), pp.1-24.
- Pobric, G. and Hamilton, A. (2006). Action understanding requires the left inferior frontal cortex. Current Biology, 16(5), pp.524-529.
- Potter, M. (1975). Meaning in Visual Search. Science, 187, pp.965-966.
- Premack, D. (1990). The infant’s theory of self-propelled objects. Cognition, 36, pp.1-16.
- Prinz, J.J. (2006). Is the mind really modular? In R. Stainton (ed.), Contemporary Debates in Cognitive Science, pp.22-36. Oxford: Blackwell.
- Pylyshyn, Z. (2000). Is Vision Continuous with Cognition? The case for cognitive impenetrability of visual perception. Behavioral and Brain Sciences, 22(3), pp.341-365.
- Reddy, V., Markova, G. and Wallot, S. (2013). Anticipatory Adjustments to Being Picked up in Infancy. PLoS One, 8(6).
- Reid, V., Belsky, J., & Johnson, M. (2005). Infant perception of human action: Toward a developmental cognitive neuroscience of individual differences. Cognition, Brain, Behavior, 9(2), pp.35–52.
- Rizzolatti, G. and Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nature Reviews Neuroscience, 11, pp.264-274.
- Saffran, J., Hauser, M., Seibel, R., Kapfhamer, J., Tsao, F. and Cushman, F. (2008). Grammatical pattern learning by human infants and cotton-top tamarin monkeys. Cognition, 107(2), pp.479-500.
- Samuel, A. (1981). Phonemic Restoration: insights from a new methodology. Journal of Experimental Psychology: General, 110, pp.474-494.
- Samuels, R. (2002). Nativism in Cognitive Science. Mind & Language, 17, pp.233-265.
- Saylor, M., Baldwin, D., Baird, J. and LaBounty, J. (2007). Infants’ On-line Segmentation of Dynamic Human Action. Journal of Cognition and Development, 8(1), pp.113-128.
- Scholl, B. J., & Gao, T. (2013). Perceiving animacy and intentionality: Visual processing or higher-level judgment? In M. D. Rutherford & V. A. Kuhlmeier (Eds.), Social perception: Detection and interpretation of animacy, agency, and intention (pp. 197-230). Cambridge, MA: MIT Press.
- Shim, J., Carlton, L., Chow, J., & Chae, W. (2005). The use of anticipatory visual cues by highly skilled tennis players. Journal of Motor Behavior, 37, pp.164-175.
- Soderstrom, M., Kemler Nelson, D. and Jusczyk, P. (2005). Six-month-olds recognize clauses embedded in different passages of fluent speech. Infant Behavior and Development, 28(1), pp.87-94.
- Southgate, V., Johnson, M., & Csibra, G. (2008). Infants attribute goals even to biomechanically impossible actions. Cognition, 107, pp.1059-1069
- Southgate, V., & Csibra, G. (2009). Inferring the outcome of an ongoing novel action at 13 months. Developmental Psychology, 45, pp.1794–1798.
- Spaulding, S. (forthcoming). On Whether We Can See Intentions. Pacific Philosophical Quarterly. (Published online: 19 November, 2015.)
- Spelke, E. (1994). Initial knowledge: Six suggestions. Cognition, 50, pp.435-445.
- Steinpreis, R., Anders, K. and Ritzke, D. (1999). The Impact of Gender on the Review of the Curricula Vitae of Job Applicants and Tenure Candidates: A National Empirical Study. Sex Roles, 41(7/8), pp.509-528.
- Stromswold, K. (1999). Cognitive and neural aspects of language acquisition. In E. Lepore and Z. Pylyshyn (eds.), What Is Cognitive Science?, pp.356-400. Oxford: Blackwell.
- Thorndike, A.N., Sonnenberg, L., Riis, J., Barraclough, S. and Levy, D.E. (2012). A 2-Phase Labeling and Choice Architecture Intervention to Improve Healthy Food and Beverage Choices. American Journal of Public Health, 102(3), pp.527-533.
- Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185(4157), pp.1124-1131.
- Urgesi, C., Candidi, M., Ionta, S., & Aglioti, S.M. (2007). Representation of body identity and body actions in extrastriate body area and ventral premotor cortex. Nature Neuroscience, 10(1), 30–31.
- Warren, R. (1970). Perceptual Restoration of Missing Speech Sounds. Science, 167, pp.392-393.
- Watkins, K.E., Strafella, A.P. and Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, pp.989-994.
- Wilson, S.M. and Iacoboni, M. (2006). Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. NeuroImage, 33(1), pp.316-325.
- Wilson, S.M., Saygin, A.P., Sereno, M.I. and Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, pp.701-702.
- Woodward, A. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition. 69 (1) pp.1-34.
- Woodward, A., Sommerville, J., Gerson, S., Henderson, A. and Buresh, J. (2009). ‘The emergence of intention attribution in infancy’, in B. Ross (ed.), The Psychology of Learning and Motivation, Vol. 51, pp.187-222. Waltham, MA: Academic Press.
- Zacks, J., Tversky, B. and Iyer, G. (2001). Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology: General, 130(1), pp.29-58.
 Use of the word ‘goal’ should therefore be distinguished from cases where the label refers to an agent’s pro-attitudes (e.g. Underwood’s desire to be president). A goal, as I am using the term, is an outcome, there in the world, rather than a subject’s mental state or drive. Clearly, humans can and do ascribe mental states and drives to others, but this will not be my concern in the present treatment.
 Note that, even if Csibra and Gergely were to retract such a claim under pressure, the point remains: these theorists have not endorsed any alternative proposals in their extensive work on goal ascription. Consequently, if it is true to say that some goal ascriptions do not involve abductive reasoning, an alternative story will still need to be found to account for them.
 This leaves open the possibility that input systems are not as encapsulated as Fodor suggests. Even if Fodor is wrong to suggest that input systems are entirely unaffected by information located outside the system (e.g. the subject’s beliefs), it is, I take it, undeniable that input systems are relatively unaffected by such information when compared with the processes involved in rational abduction and thought. For instance, proponents of predictive coding sometimes hold that any of a subject’s beliefs and expectations can affect perceptual processing (Clark, 2013; Hohwy, 2013). However, in order to accommodate the apparent judgement independence of visual illusions, etc., these theorists posit that such effects are Bayes-optimised over long timescales (Clark and Lupyan, 2015). This is quite unlike rational thought, where effects can be more or less immediate. As a result, it remains true to say that one’s salient and occurrent beliefs do not have the immediate effects on perceptual processing that they appear to have on belief fixation and rational judgement, and that this difference—which I will continue to call ‘informational encapsulation’—is telling.
 Incidentally, automaticity is another property that Fodor takes to be distinctive of modular processes. This is controversial, however, which is why I have avoided placing weight on it here. For instance, Carruthers (2006) suggests that all cognitive processes are automatic in some sense (but see Mandelbaum, 2015).
 Other properties of input modules that Carruthers takes to be uncharacteristic of central systems include significant innateness and shallow outputs (2006, p.12). I take it to be an open question whether systems performing simple goal ascriptions possess these properties in addition to (1), (2) and (3). There certainly seems to be no obvious reason why goal ascription would need to involve particularly deep outputs, but offering a more definitive answer is difficult, in part, due to controversies concerning what exactly shallowness is (compare Fodor, 1983, p.87, with Butterfill, 2007, pp.462-468). Similar concerns may arise with regard to the innate development of goal reasoning in infants (Samuels, 2002); however, there is at least suggestive evidence that goal reasoning develops according to a characteristic pattern (Woodward et al., 2009). In order to assess this further, cross-cultural studies would need to be run to determine whether this pattern of development is universal in the way that the development of speech perception (Stromswold, 1999) or visual processing (Spelke, 1994) is thought to be.
 Peter Carruthers has endorsed such an argument in response to dual system accounts of belief reasoning; see his 2015.
 Jacob and Jeannerod (2005) and Spaulding (forthcoming) appear to make suggestions of this sort.
 This point applies to other studies that may be taken to suggest top-down influences on PFC activity: e.g. Cagiano et al. (2012).