
Goal Ascription for the A-rational

Sam Clarke (University of Oxford)



Abstract: This paper advances three claims concerning the cognitive processes that underpin human goal ascriptions. First, I propose that many of our leading theories of goal ascription hold, or seem committed to holding, that the goals of others’ actions can only be identified through a process of approximately rational, abductive reasoning (§1). Second, I argue that there is reason to question this commitment. Some goals appear to be identified by fast, inaccessible and informationally encapsulated cognitive processes. This suggests that they are identified by input systems—akin to those involved in speech and sensory perception—rather than the central systems that rational abduction paradigmatically involves (§2). Third, I suggest that there are independent reasons to take this latter proposal seriously and no obvious reasons to reject it (§3). This presents a challenge to the existing views of goal ascription discussed in §1 and raises a number of important questions for future research.

Keywords: goal ascription, modularity, abduction, teleological stance, theory of mind

Humans ascribe goals to observed actions. By this, I mean to say that humans can and do identify the outcomes to which others’ observed actions are directed.[1] And this is a good thing. Infants come to identify the goals of others’ observed actions before they come to identify and fully understand the mental states of others. Thus, it is often said that goal ascriptions facilitate many of our earliest social interactions (Csibra and Gergely, 1998). Indeed, it is sometimes said that these early goal ascriptions bootstrap the subsequent development of a mature theory of mind, as found in adult humans (Gergely and Csibra, 2003), and that they underpin much cognitive, linguistic and social development (Woodward et al., 2009). For these reasons, it is important that we gain a proper understanding of the cognitive processes upon which these goal ascriptions depend.


In this paper, I aim to make a small contribution towards this large project by introducing and developing three claims. First, I observe that many of our leading theories of goal ascription hold, or seem committed to holding, that the goals of others’ actions can only be identified through a process of approximately rational, abductive reasoning (§1). Second, I argue that there is reason to question this commitment. Some goals appear to be identified by fast, inaccessible and informationally encapsulated cognitive processes. This suggests that they are identified by input systems—akin to those involved in speech and sensory perception—rather than the central systems that rational abduction paradigmatically involves (§2). Third, I suggest that there are independent reasons to take this latter proposal seriously and no obvious reasons to reject it (§3). This presents a challenge to the existing views of goal ascription discussed in §1 and raises a number of important questions for future research.


1. A Popular Stance

At one stage, it was common to stress the importance of mental state ascriptions in the everyday prediction and explanation of human behaviour. For instance, Daniel Dennett (1987) famously proposed that the “only” (p.21) way in which normal humans, going about their day-to-day lives, could predict the behaviour of other humans was by adopting the intentional stance: by assuming others to be rational agents striving to fulfil their desires in the light of the beliefs we take them to hold about the world. Relatedly, Dennett suggested that our only practical means of explaining such behaviour would be by inverting this process and ascribing beliefs/desires to the agent in order to rationalise and make sense of it.

Much recent work on goal ascription downplays this emphasis on mental state ascription. But it is not hard to see the influence that proposals like Dennett’s continue to have on those theorising in this area. György Gergely and Gergely Csibra (2003) are an obvious case in point. They deny that mental state ascriptions are necessary for the reliable prediction and explanation of certain human behaviour. But they say this because the inferences that Dennett claims to underpin the prediction and explanation of behaviour need not be applied to agents’ mental states themselves but can, instead, be applied to the contents of these. For example, they propose that we might predict the way that Bob would reach for the biscuit, on the assumption that this behaviour is being rationally directed towards the goal of an eaten biscuit, given the constraints of reality imposed upon this behaviour (e.g. obstacles in the way), without this goal or these constraints of reality being represented as the contents of Bob’s beliefs and desires. Similarly, it is claimed that the goal of Bob’s action can be inferred by working backwards from the behaviour; by assuming that this is being, or has been, pursued rationally by Bob given the constraints of reality. In this way, these theorists downplay the importance of full-blown mindreading in action interpretation, but they hold onto the core of Dennett’s suggestion, proposing that the “inferential principle” of this so-called teleological strategy “is identical to the rationality principle of the mentalistic stance” which Dennett defends (Gergely and Csibra, 2003, p.287; see also Gergely et al., 1995, and Southgate et al., 2008).

For this reason, Gergely and Csibra’s account can be seen to rely on abductive reasoning. This much is widely acknowledged when it comes to the mental state reasoning that Dennett describes (Braddon-Mitchell and Jackson, 1996; Gopnik and Meltzoff, 1997; Gopnik and Wellman, 1992; Lewis, 1994; Nichols and Stich, 2001). Here, no finite number of beliefs and/or desires possessed by a subject will rationally necessitate that they act in any given way. For instance, Bob’s belief that there are biscuits in the tin, coupled with his desire for biscuits, will only cause him to rationally produce (and should only lead us to rationally predict) tin-opening behaviour if all else remains equal; further beliefs and desires could always rationally alter his behaviour and, thus, the predictions we would otherwise rationally make about his behaviour. So, short of grasping Bob’s mental life in its entirety as Bob’s mental life in its entirety—something that is beyond us as finite human observers—reliably predicting Bob’s behaviour via Dennett’s intentional strategy involves us identifying the most likely behaviour that Bob will perform from a fragmentary and defeasible evidential basis; it involves us making an inference to the best explanation.

Similar points apply to Gergely and Csibra’s teleological strategy. Just as further belief/desire ascriptions could always alter the behaviour one should predict of a rational agent on Dennett’s story, further information about the constraints that reality places on action could always alter the behaviour we should predict of an agent when adopting the teleological stance. For instance, if we identify Bob’s behaviour as directed towards the goal of eaten biscuits we might predict him to reach into the conveniently placed tin containing biscuits, but, perhaps not, if the tin is locked (unless there is a large hole in its rear, etc.). Since further information of this sort could always alter the behaviour it would otherwise be reasonable to predict of a rational agent when adopting the teleological strategy, the application of such a strategy requires us to predict behaviour on the basis of a fragmentary and defeasible evidential basis in the same way as on Dennett’s account; it involves us reasoning abductively about the behaviour in question, flexibly taking into account any salient information available to us about the world and the goals to which the target behaviour is directed in the same way that Dennett’s intentional strategy involves us flexibly taking into account any relevant belief/desire ascriptions.

The same holds for goal ascriptions on Gergely and Csibra’s account. Here, the constraints of reality and the agent’s movements underdetermine the goal to which these movements are rationally directed. When Bob reaches and grasps the biscuits in the tin, say, we can ask whether the goal to which his action was rationally directed was to have grasped the biscuits, to have grasped whatever was in the tin, or some other end with no obvious relation to biscuit grabbing (perhaps this was an accident). In any case, the answer is underdetermined by the mere fact that Bob has grasped the biscuits and could always be affected by further information available to us about Bob’s movements and the constraints that reality imposes on these. Thus, its identification involves us making an inference to the best explanation that, at least, approximates a rational sensitivity to any salient and accessible information about Bob and the world he inhabits.

Is this true of all goal ascriptions? Gergely and Csibra seem to think so. Their suggestion seems to be that while adult humans can adopt a full-blown mentalistic stance when predicting and explaining behaviour, infants lack the necessary mindreading abilities and therefore have no choice but to adopt the teleological stance when predicting others’ behaviour and identifying the goals of their actions. As they put it:

to interpret such an event as a goal-directed action infants must establish a specific explanatory relation among three elements: the action, the goal state, and the constraints of physical reality. (Csibra and Gergely, 1998, p.255, my emphasis)

In this way, Gergely and Csibra suggest that if humans are to succeed in making accurate goal ascriptions they have no option but to adopt and utilise a rationalistic stance (either mentalistic, teleological, or perhaps some mixture of the two). Since adopting such a stance involves abductive reasoning, it would then seem that, according to Gergely and Csibra, humans must engage in abductive reasoning if they are to identify the goals of others’ actions.[2]

This is a bold claim, but it is not uncommon. Gergely and Csibra are often seen to provide the single best account of goal ascription currently available. But, even among theorists who remain agnostic or critical of the details or import of the account, we often find strong hints that goal ascription is taken to require some form of abductive reasoning very much like that which Gergely and Csibra propose. For instance, Amanda Woodward—another leading authority on goal ascription—frames her discussions in terms of subjects’ capacity for “reasoning” (1998, p.1) about the goals of observed actions, flexibly taking into account things they believe about agents, actions and the environment (see also Phillips and Wellman, 2005). Indeed, like Csibra and Gergely, Woodward cites Dennett when claiming that it is this early capacity for ‘reasoning’ about the goals of others that facilitates the subsequent development of full-blown mental-state reasoning (p.2). This suggests that the picture of goal ascription and development that she has in mind is similar to that developed by Gergely and Csibra and that it is underpinned by similar cognitive processes.

None of this is to suggest that there is absolutely no resistance to this general trend. Some have held that certain goals are ascribed via simple, observable cues, rather than processes of full-blown, rational abduction—cues like self-propulsion (Premack, 1990) and direction of gaze or movement (Baron-Cohen, 1994). This is a view that, as we shall see, I have some sympathy with. But it is often seen to be problematic (Csibra et al., 1999; Gergely, 2002). Moreover, such accounts have sometimes been motivated by the thought that early goal ascriptions should be interpreted in richer terms than theorists like Gergely and Csibra advise; that they reflect the fast and dirty heuristics of systems capable of full-blown, rational and abductive, mental state reasoning. Consequently, the assumption that identifying the goals of others’ actions requires some kind of rational abduction—or at least implicates systems capable of rational abduction—seems to be widespread, and deeply entrenched, in much contemporary theorising about goal ascription. It is this assumption that I want to question.

2. Troubles With the Popular Stance

The proposal that all goal ascriptions rely on rational abduction—or, at least, systems capable of rational abduction (as many seem to suggest)—is empirically tractable. To see this, consider Fodor’s infamous distinction between the mind’s modular and non-modular parts (1983). For Fodor, abduction of the above kind depends upon non-modular, or central, cognitive systems (1983; 2000). This is because rational abduction is a paradigmatically unencapsulated process: one that is (or can be) directly affected, at least in principle, by any of one’s beliefs—e.g. by anything one believes about the constraints of reality. For Fodor, modular input systems are not unencapsulated in this way; rather, they are largely unaffected by what their subject knows/believes (about, say, others’ mental states or the relevant constraints of reality) even when these beliefs are salient and reflected upon. Thus, within a Fodorian framework, accounts that take all goal ascriptions to involve rational abduction (or systems capable of rational abduction) seem committed to holding that all goal ascriptions depend upon non-modular central processes. This is noteworthy, since non-modular central systems are said to display a number of properties to a striking extent when compared with modular input systems. Thus, if it is true that all goal ascriptions depend upon abductive reasoning (and, thus, central resources), the Fodorian should expect the processes involved to display these properties.

I happen to endorse an essentially Fodorian picture of the above sort. That being said, it is important to acknowledge that the details of Fodor’s purported distinction between modular and non-modular systems are controversial, if only to note that much of this controversy is irrelevant for our purposes. One apparent challenge comes from the Massive Modularity Hypothesis. Proponents of Massive Modularity have argued that the mind is entirely (or almost entirely) composed of modular systems, even at its most central parts. This may appear to undermine the Fodorian distinction just introduced. However, in reality, proponents of Massive Modularity are often careful to note that when they describe central systems as ‘modular’ they do not mean to suggest that they are ‘modular’ in the sense that Fodorian input modules are (Carruthers, 2006, p.12; Pinker, 2005). Indeed, these theorists will typically remain open to an essentially Fodorian distinction between input and central systems and the existence of distinctive properties manifested by each (Deroy, 2014). In this way, they simply change the subject (Prinz, 2006).

Perhaps more troublingly, various theorists have denied that any cognitive systems qualify as modular in Fodor’s strict sense. But even these theorists are happy to treat the properties that Fodor identifies as indicative of a modular process as marks of a process being relatively perceptual rather than relatively cognitive (for example, compare Briscoe 2010 and 2015). This is because, while these theorists deny that input systems possess the properties Fodor associates with modularity to the extent that Fodor proposes, they will typically acknowledge that abductive reasoning and rational thought manifest those properties to a far lesser degree than the operations of input systems involved in speech and sensory perception (e.g. Clark and Lupyan, 2015). So, if we were to find that certain goal ascriptions displayed the properties that Fodor takes to be indicative of modularity, to the extent that input systems manifest these properties, then this would still provide good evidence against the suggestion that all goal ascriptions are the product of rational abduction.

With this in mind, I will now introduce three reasons for thinking that the processes underpinning certain simple goal ascriptions do not possess the properties we would expect of them if they were the result of rational abduction. This will not be to prove that rational abduction is not involved and, in each case, I will try to show how proponents of rational abduction’s indispensability for goal ascription might respond. That said, when these reasons are considered collectively, I will tentatively propose that some goal ascriptions look to be the achievement of input systems, akin to those involved in low-level speech and sensory perception; a conjecture that I will suggest we have independent reason to take seriously in §3.

2.1 Speed

One reason to question the idea that rational abduction (and, therefore, central processing) is involved in all cases of goal ascription concerns the fact that humans, and even young infants, are apparently able to identify and react to the goals of certain observed actions very quickly. For instance, a recent study by Reddy et al. (2013) measured three-month-old infants’ postural changes in response to a caretaker’s actions directed at picking them up to hold. It found that selective shifts in the infants’ posture (e.g. the straightening or stiffening of the legs and widening or raising of the arms in response to the caretaker’s behaviour) were evident “immediately after the onset of (the caretaker’s) approach” (p.1) or, perhaps on a more cautious assessment, within the first 100ms of approach onset (p.5).

This is amazingly fast. Reacting to the caretaker’s behaviour in these cases would seem to necessitate the prior perceptual recognition of the caretaker and their movements. But, often, perceptual input systems will only produce their outputs within a comparable timeframe. For instance, by Potter’s (1975) estimate, the identification of phonemes in others’ speech—among the fastest of all cognitive processes (Fodor, 1983, p.61)—takes between 125 and 167ms. Thus, if this is correct, there simply seems to have been no time for the infants to have reacted to the caretaker’s movements on the basis of post-perceptual reasoning: i.e. for their reactions to have involved perceptual identification of the caretaker and their movements plus further cognitive work (e.g. rational inferences carried out by post-perceptual, central systems).

It is then striking that the infants in this study were plausibly tracking and responding to the goals of their caretakers’ actions. This is evinced by the fact that they were selectively responding, in appropriate ways, to actions with a certain goal (holding me), despite apparent variability in the kinematic structure of actions directed towards this end (see Reddy et al., 2013, p.4, and Fantasia et al., 2016). This provides evidence that the infants were tracking and responding to a goal of their caretakers’ actions, within the above timeframe, rather than some more local feature (or features) of the caretakers’ behaviour. And, if correct, this provides suggestive evidence that the infants’ timely behavioural responses were genuinely the result of a goal ascription, carried out by perceptual input systems themselves, as opposed to those central systems upon which rational abduction depends.

Of course, we should avoid placing too much weight on the findings of a single experiment. However, similar results can be found with more complex goal ascriptions. For instance, Shim et al. (2005) found that experienced tennis players will react to the goal trajectory of opponents’ serves within 127ms of movement onset. And, similarly, Ambrosini et al. (2013) found that humans anticipate the goals of simple reaching actions by taking into account kinematic variables, such as grip aperture, from 6 months of age. This was implied by subjects’ anticipatory eye movements towards a target object, as much as 800ms prior (p.5) to the completion of observed reaching actions taking between 1720 and 2280ms (p.3); a significant finding since further studies carried out by the same group have suggested that much of the relevant kinematic information being utilised in these studies is not available until 60% of movement time has elapsed (Ansuini et al., 2015, p.8, p.11). Thus, it is plausible that some subjects were performing their anticipatory looks within 112ms of observing the kinematic cues utilised in this study.
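The 112ms estimate falls out of simple arithmetic on the figures just reported. A minimal sketch, using the upper-bound values cited above (the variable names are mine, introduced only for illustration):

```python
# Reconstructing the ~112ms latency from the upper-bound figures reported above:
# the slowest observed reach (2280ms), an anticipatory look made up to 800ms
# before action completion, and kinematic cues becoming usable only after
# roughly 60% of movement time has elapsed (Ansuini et al., 2015).
action_duration_ms = 2280
look_lead_ms = 800             # anticipatory look precedes action completion
cue_available_fraction = 0.60  # fraction of movement time before cues are usable

cue_onset_ms = cue_available_fraction * action_duration_ms   # 1368.0ms in
look_time_ms = action_duration_ms - look_lead_ms             # look at 1480ms
latency_ms = look_time_ms - cue_onset_ms                     # cue-to-look gap

print(latency_ms)  # 112.0
```

Note that running the same arithmetic on the fastest reaches (1720ms) would place the anticipatory look before the cues even become usable, which lends some weight to the worry, noted below, that subjects may sometimes have guessed before all of the information was in.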

Admittedly, the interpretation of all these results is complicated (perhaps subjects guessed before all of the information was in) and involves divining the difference between the representation of goals in others’ actions and our mere sensitivity to more local cues in their behaviour (a point where critics might wish to resist the above suggestions) or to actions of a certain kind, irrespective of any goal ascriptions. Nevertheless, the above studies are at least suggestive; taken at face value they suggest that certain goal ascriptions are performed faster than we would expect if they were the result of the post-perceptual, central processes that are paradigmatically involved in rational abduction.

2.2 Accessibility

A second reason to doubt that all goal ascriptions involve rational abduction pertains to the apparent inaccessibility of the information and/or processes involved in some such ascriptions. As Fodor observes, the cognitive states that function as inputs or outputs of rational thought are typically accessible for central monitoring by their subject, in a way that the information utilised by modular input systems is not. For instance, I am unable to introspectively access the information utilised by my early visual systems. But, by contrast, I can typically identify the beliefs and desires that rationally guide my abductive reasoning, at least under the right conditions (Fodor, 1983). Consequently, if/when goal ascriptions are the result of abductive inferences, carried out by central systems, we might expect the information utilised to be relatively accessible for central monitoring.

There is reason to doubt that such information is always accessible in this way. To see this, note that information about the kinematics of action—e.g. subtleties in wrist velocity and grip aperture during reaching—reliably covaries with the goals of surprisingly complex actions; e.g. whether an agent is going to pick up an object to eat, throw, or give away (see Becchio et al., 2012). Moreover, note that various studies have suggested that humans actually use such kinematic information to identify the goals of others’ actions, at least under certain circumstances (see Manera et al., 2010; 2011a; 2011b; 2011c). Such information is, plausibly, not accessible for central monitoring. Admittedly, this has not been formally tested. Nevertheless, the evidence is, I think, suggestive.

For a start, experimenters consistently report that adult subjects tested in these studies appear to be unaware of their sensitivity to the aforementioned kinematic cues, even during debriefings conducted immediately after tests demonstrating their sensitivity to them and their capacity to use these in making goal ascriptions. That is to say, these subjects appear to have been both unaware of how they were identifying the goals of actions observed in the studies they participated in and unaware that the kinematic cues they utilised were even available to be utilised (C. Becchio, pers. comm.).

In and of itself, this lack of awareness does not show that such information was inaccessible for central monitoring. After all, much of our day-to-day decision-making appears to be influenced by factors that are beyond our ken. For instance, there is a wealth of evidence indicating that even educated, liberal hiring committees are prone to various biases (e.g. Steinpreis et al., 1999). Such findings can surprise and scare, but while hiring committees may be unaware, and even shocked to discover, that factors such as the gender of an applicant affect their decisions about which staff to hire, this information is not inaccessible to them. Typically, they are, or can be made, aware of a given applicant’s gender and—in the knowledge that such information unfairly affects their decision-making routines—they can choose to rethink their snap judgements. But I think there is at least anecdotal reason to think that this is not so when it comes to the kinematic cues that Becchio, Manera and others show to underwrite certain goal ascriptions. It is this: I’m someone who spends an inordinate amount of time in a busy café on Cowley Road, thinking about the aforementioned kinematic cues. But even in the knowledge that humans’ grip aperture is smaller and peak grip closing velocity slower when my fellow café-goers grasp mugs to pass them to a barista than when they grasp them with the goal of relocating them at a table (Becchio et al., 2008), this is something I seem unable to identify in others’ actions despite repeated attempts. Admittedly, this is far from laboratory conditions. Nevertheless, it is suggestive that such cues may be inaccessible to me—a non-autistic, enculturated subject who apparently utilises such cues when ascribing goals to certain observed actions. And this, in turn, is at least suggestive that these cues—which are used to make certain goal ascriptions—may be processed by systems that are not engaged in rational abduction, e.g. input systems, like those involved in speech and sensory perception.

2.3 Encapsulation

A third reason to question the idea that abductive reasoning underwrites all human goal ascriptions concerns the apparent levels of informational encapsulation that the processes underpinning certain goal ascriptions plausibly display. This is suggestive since, as has been mentioned, striking levels of informational unencapsulation are often taken to be paradigmatic of the central processes that make rational abduction possible. As Fodor (1983; 2000) reminds us throughout his work, the conclusions of one’s abductive inferences can be affected in arbitrarily complex ways by any salient proposition(s) that the subject believes, at least in principle. This shows that central resources must have access to everything that the subject believes, at least in principle, and makes these central processes quite unlike those of modular input systems and the like that are taken to be largely insensitive to much that their subject believes at any given moment.[3] Consequently, if there were to be evidence that the processes underpinning certain goal ascriptions were systematically insensitive to much that their subject believes in the way that systems involved in speech and sensory perception are, then this would speak against the hypothesis that all goal ascriptions are the product of rational abduction.

Plausibly, some such evidence exists. In a study conducted by Southgate, Johnson and Csibra (2008), 6- to 8-month-old infants were habituated to a variety of well-formed reach-and-grasp actions that required the agent to first move a box out of the way. These were all directed towards a common object, and the well-formed nature of these goal-directed actions led the infants to identify the goal of these actions as contact with the common object. Infants were then tested on one of two conditions. In the first condition, infants were shown a reach-and-grasp action that was similar to those they had been habituated to, except that it required the agent to first move a further box out of the way in order to reach the target object. In the second condition, reaching and grasping occurred in the same situation as the first, but here the agent neglected to move any obstacles out of the way and instead performed a biomechanically impossible snaking movement to reach the target.

Looking times suggested that infants were more surprised by the first condition. This was taken by the authors of the study to support the teleological stance hypothesis because it suggested that the infants only cared about how well-formed the test action was given its apparent goal and the constraints of reality—in their terms, it suggested that the infants only cared how ‘rationally’ or ‘efficiently’ the action was performed given the constraints of external reality. This is because, apparently, they did not consider how they would themselves perform the action (pace Woodward, 1998). Nor did they take into account the (albeit limited) knowledge they would apparently have had about the biomechanical constraints on other humans’ actions (see Bertenthal et al., 1984) and, in particular, limb movements (Bertenthal et al., 1987). But this is a puzzling finding if the infants were abductively reasoning about the agent’s behaviour. Central resources involved in even approximately rational abduction may well be subject to biases and heuristics that govern the kind of information that gets considered when making snap decisions (Tversky and Kahneman, 1974). But typically, these biases and heuristics cause subjects to place undue emphasis on the salient features of a stimulus when reasoning about it (Maheswaran, Mackie, & Chaiken, 1992; Coulter & Coulter, 2005; Thorndike, Sonnenberg, Riis, Barraclough, & Levy, 2012; Mitchell et al., 1996; Birch and Bloom, 2007). This makes these biases quite unlike the heuristics governing infants’ disregard for biomechanics in the above study since, here, infants were disregarding a highly salient and unfamiliar feature of the stimulus (the unnatural bending of the forearm); a finding made all the more surprising by the fact that, as we have already seen, infants at this age are able to use far less salient kinematic information—such as subtleties in grip aperture that are, plausibly, inaccessible to the subject—when ascribing goals to actions (see Ambrosini et al., 2013). Tentatively, I would then like to propose that—when taken at face value—this study plausibly suggests interesting levels of informational encapsulation in the processes responsible for certain goal ascriptions. [Further studies suggesting that infants process only certain kinds of information (regardless of its apparent salience) when reasoning about the goals of observed actions include: Gergely et al. (1995), Kamewari et al. (2005), Phillips and Wellman (2005), Csibra (2008), Southgate and Csibra (2009), Hernik and Southgate (2012), and Feiman, Carey and Cushman (2015).]

Admittedly, this example is not perfect. One might have methodological concerns about the looking time paradigm employed (see Aslin, 2008). Moreover, the belief-independence I have suggested this study to reveal is only evidenced indirectly. In this respect, it is unlike the classic illustrations of informational encapsulation employed by Fodor (1983) and Pylyshyn (2000) that involve subjects explicitly reflecting on their beliefs about a stimuli that are in conflict with its appearance; e.g. cases where the lines of the Muller-Lyer illusion continue to look different lengths even when subjects know and reflect on the fact that this is not so. But, studies could be run to test the belief-independence of goal ascriptions in much the same way. One way in which this could be done would be by exploiting the findings of existing studies that suggest the automaticity of certain goal ascriptions (e.g. Scholl and Gao, 2013).[4] While such studies have typically demonstrated that the effects of goal ascriptions on behaviour are apparent, even when these are irrelevant and detrimental to the subject’s current task, studies could be run to test the effects of goal ascriptions on behaviour even when the subject’s explicit beliefs contradict these ascriptions. For instance, where explicit knowledge about the action’s goal contradicts that suggested by kinematic cues, like wrist velocity and grip aperture, as discussed in the previous subsection. If the effects of such (mistaken) goal ascriptions were evident on subjects’ behaviour, even when the subject explicitly reflects on her true and conflicting beliefs about an agent’s goals, this would provide more direct evidence of the belief-independence of certain ascriptions. 
In the meantime, however, I will content myself with three observations: some studies may suggest the encapsulation of certain simple goal ascriptions; this is surprising if those ascriptions depend on abductive reasoning; and future empirical work could be used to assess matters further.

2.4 A Tentative Suggestion

I have now introduced three reasons to question the idea that all goal ascriptions are the result of rational and abductive reasoning. I have suggested that we take seriously the idea that some goal ascriptions are:

  1. made within a similar time-frame to perceptual input processes
  2. driven by inaccessible cues
  3. the result of encapsulated processes

These are properties that are uncharacteristic of the central systems that rational abduction is seen to involve, even among proponents of Massive Modularity (Carruthers, 2006, p.12) and even among critics of modularity more generally (e.g. Clark and Lupyan, 2015).[5] Thus, to the extent that (1), (2) and (3) are plausible, the idea that rational abduction is responsible for all goal ascriptions should be called into question.

Admittedly, I have noted that there are ways in which one could resist (1), (2) and (3). But, I take it that the considerations discussed remain suggestive. From the perspective of a neutral onlooker, the above findings do not naturally look to be a product of central systems performing abductive inferences. Instead, they look to indicate the workings of input systems, akin to those involved in speech and sensory perception. Why? Because findings of the sort discussed throughout this section would not merely be accommodated (via ad hoc auxiliary hypotheses), but actually predicted, by a view which deems the goal ascriptions under consideration to be the product of such systems. This is because input systems of this sort are widely noted to be fast, inaccessible and encapsulated in much the same way—a fact that, arguably, requires an explanation by appeal to the kind of system they are (Butterfill, 2007). So, while tentative, the most natural thing to say in the light of (1), (2) and (3) is, I think, that some goal ascriptions look to be made by input systems, akin to those involved in speech and sensory perception.

3. Independent Motivations

Not everyone will be convinced. As we saw in §1, a significant number of theorists hold that all goal ascriptions are the result of rational abduction and, therefore, cannot plausibly be carried out by the input systems that are characteristic of speech and sensory perception. Indeed, I can foresee critics claiming that parsimony favours such one-size-fits-all approaches to goal ascription; that since humans can and do sometimes reason abductively about the goals of certain actions, it would be more parsimonious to suppose that all goal ascriptions are underpinned by the systems these inferences involve and that we should, therefore, seek to accommodate (1), (2) and (3) within such a framework.[6] I will now provide reasons to resist such an argument. There are, I propose, general considerations that speak in favour of the idea that some goal ascriptions will be made by input systems, akin to those involved in speech and sensory perception. Consequently, the kind of tentative findings made in §2 are not wild and outlandish, but highly plausible and deserve to be taken seriously.

To begin to see this, note that action understanding and speech comprehension are underpinned by similar processes—something that is, perhaps, unsurprising given that speech just is one kind of action (Browman and Goldstein, 1992; Clark, 1997; Liberman and Whalen, 2000). As is widely agreed, speech comprehension involves perceptual systems parsing relevant sensory inputs into useful chunks, suitable for semantic analysis—e.g. discrete phonetic units and, from these, words (Saffran et al., 2008) and clauses (Soderstrom et al., 2005)—and similar points apply to the comprehension of observed action. Here too, observers must first parse observed behaviour, and recognise individual actions as bounded units, in order to identify goals in this behaviour and to provide rationalising explanations for it (Baldwin and Baird, 2001). And there is reason to think that this parsing is underwritten by processes that are analogous to those which underwrite the parsing of speech (Newtson et al., 1977). As with speech perception, the processes involved in action parsing operate independently of any semantic knowledge possessed by their subject (Samuel, 1981; Saylor et al., 2007), and it has been suggested that domain specific, generative knowledge must be at work in either case so as to enable observers to parse and identify novel words and actions (Baldwin and Baird, 2001, p.176). Similarly, in both cases, the systems involved utilise perceptible cues, like gaze direction, body posture, gestures and the like (ibid. p.173). These considerations point to often-overlooked similarities between the input systems involved in speech perception and the processes that are involved in the interpretation of observed physical behaviour.

There is, of course, a question of just how similar the processes are. But, given that close similarities have been suggested, it is natural to consider what we might learn about one case from the other. This is pertinent for our purposes since the systems involved in speech perception manifestly perform something importantly similar to the kinds of goal ascription we have been discussing throughout this paper; they operate to identify the outcomes to which others' speech acts are directed by abstracting away from idiosyncrasies in the ways that these outcomes are brought about (e.g. individual differences in movements produced by the speaker's vocal tract and mouth, and differences in their accent, pitch and timbre). For instance, when a speaker intentionally produces a noise registered by observers as belonging to the phone class /ba/, the realisation of the relevant noise is an outcome distinct both from the behaviour leading up to it and from the speaker's intention; it is a kind of goal. In this way, the phonemes that input systems identify and categorise can be thought of as a kind of goal, contributing to wider goals, like the production of words, which contribute to the realisation of bigger goals still, like the production of clauses.

Might something similar be true of the systems involved in the parsing of observed action, more generally? There is some reason to think so. While physical actions are parsed at different scales, they are parsed at goal boundaries, specifically. For instance, when Bob intentionally reaches, grabs, and eats his biscuit he realises a relatively large-scale goal (an ingested biscuit) by realising various sub-goals (e.g. contact with the biscuit, biscuit located in mouth, etc.). This much seems to be reflected in the operations of systems involved in parsing observed action. For instance, action parsing operates at various scales; identifying sub-actions at intention boundaries, and wider actions that these contribute towards, again at intention boundaries (Newtson, 1973; Newtson et al., 1976; Zacks and Tversky, 2001). This, itself, suggests that these systems are sensitive to the goals of observed actions—i.e. to the outcomes that intentional actions realise. But, depending on how closely these systems are to be modelled on analogous systems involved in speech perception, it is possible that this is all that they care about. Just as speech perception involves input systems abstracting away from idiosyncrasies in the production of speech, and categorising the phonemes that the speaker intends to produce, systems involved in action parsing, more generally, might abstract away from idiosyncrasies in the realisation of goals and simply function to identify and categorise these.

One reason for taking this latter suggestion seriously is that, according to various theories of speech perception, the categorisation of phonemes and allophones in others' speech involves perceptual systems identifying the processes by which these articulatory gestures are performed (these views include so-called motor theories of speech perception—e.g. Liberman and Mattingly, 1985; Galantucci et al., 2006—and direct realist theories of speech perception—e.g. Fowler and Rosenblum, 1991—see also: Luria, 1966; Alajouanine et al., 1964). On such views, categorising a perceived /ba/ as a /ba/ involves input systems abstracting away from the idiosyncrasies of visual and acoustic stimuli and identifying underlying motor processes involved in the subject's production of a /ba/ gesture. While the details of such processes remain controversial, much of this controversy concerns the role that this process plays in our subsequent understanding of speech (e.g. Hickok, 2009), and the broad family of views that endorse such a suggestion has enjoyed a surge in popularity following the discovery of mirror neurons in areas like the ventral premotor cortex of primates, and specifically the parieto-frontal action observation/action execution circuit (the PFC) (Gallese et al., 1996, p.607; Fadiga and Craighero, 2006, p.489). These are neurons that are activated both by the production and perception of action (Rizzolatti and Sinigaglia, 2010) and are active in similar ways during the perception and production of speech (Watkins et al., 2003; Wilson et al., 2004; Wilson and Iacoboni, 2006), suggesting a common coding of action in either case.

As such, it is interesting to note that these neurons are often considered to encode the goals of observed actions. Reasons for thinking this include a series of fMRI studies suggesting that isomorphic mirror neuron activation occurs during subjects' observation of a televised grasping action, regardless of whether this action is performed by a human hand, a robotic hand, or a tool (Peeters et al., 2009). Meanwhile, other studies show the opposite effect—they show that isomorphic movements will be encoded differently by neurons in the PFC when these are directed towards different ends (Ferrari et al., 2005). This suggests that PFC activation does not merely represent mirrored muscle movements but, quite specifically, the goals of the actions perceived. And, in this way, there is some reason to think that the categorical perception of outcomes to which speech acts are directed is underpinned by mechanisms that encode the goals of observed actions, more generally.

We should, of course, be cautious when moving from findings about the neural underpinnings of a cognitive process to theories about the cognitive architecture of the process itself. Thus, we should not conclude, from the fact (if it is a fact) that the categorisation of perceived speech involves neural mechanisms also involved in the encoding of goals in observed action, that speech perception and goal ascription are alike in the relevant cognitive respects. That said, these findings are suggestive when considered within the context of this wider paper. It is not that certain goal ascriptions must be speech-perception-like in their cognitive underpinnings because these involve common neural mechanisms. Rather, it is that if goal ascriptions and speech perception are underpinned by common neural mechanisms, it would be unsurprising to discover that they were alike with regard to their cognitive underpinnings (e.g. their modularity). So, if used tentatively, these findings give further credence to the suggestion that some goal ascriptions are underpinned by input systems akin, or common, to those involved in speech perception.

This suggestion is not a foregone conclusion. The thought that goal ascription requires central cognition is rarely made explicit. But one reason for thinking this might be that human behaviour is hugely variable and context sensitive.[7] As such, it might seem that identifying the goals of others' behaviour requires similarly flexible and context sensitive reasoning; that it requires central cognitive resources which understand something about the ways in which goals are brought about, and which have access to information about any of the indefinitely many factors that might affect the behaviour in question. But to apply such a line of reasoning across the board is, it seems to me, to underestimate the resources that a speech-perception-like module could have available to it when identifying the goals of certain observed actions. As we saw in §2.2, perceptible subtleties in the kinematics of action, such as wrist velocity and grip aperture, can be used to reliably anticipate the goals of surprisingly complex actions even prior to their realisation. For instance, they can be used to predict whether an arm movement is directed towards picking up an apple to eat, give away or throw (see Becchio et al., 2012, for a review of these and related findings). Likewise, Baldwin and Baird (2001) note the presence of perceptible regularities marking the boundaries between individual intentional actions. This suggests that simple principles, implicit in the operations of an informationally encapsulated module, could be used for the bottom-up identification (and even anticipation) of many goals, without relying on central cognitive resources.
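To fix ideas, the proposal can be given a toy illustration. The following sketch is my own and is not a model drawn from the literature: the cue values, cutoffs and goal categories are invented stand-ins, meant only to show how fixed principles operating over kinematic cues could anticipate a goal bottom-up, without consulting central cognition.

```python
# A toy, invented illustration of bottom-up goal anticipation from
# kinematic cues. The thresholds and units are arbitrary stand-ins,
# not empirical values from Becchio et al. (2012).

def anticipate_goal(wrist_velocity, grip_aperture):
    """Map two kinematic cues onto a coarse goal category.

    The point of the sketch: the mapping is fixed and shallow, and the
    observer's beliefs are simply not among its inputs.
    """
    if wrist_velocity > 1.2:      # fast approach: ballistic action
        return "throw"
    if grip_aperture < 0.05:      # small, precise grip
        return "eat"
    return "give"

# The 'module' commits to a goal from the cues alone, prior to the
# action's completion.
print(anticipate_goal(wrist_velocity=1.5, grip_aperture=0.08))  # throw
print(anticipate_goal(wrist_velocity=0.6, grip_aperture=0.03))  # eat
```

Nothing hangs on the particular cutoffs; the sketch merely shows that such a mapping can be computed quickly and encapsulated from what the observer believes.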

Admittedly, these simple principles are, alone, unable to explain certain goal ascriptions that, plausibly, conform to the pattern of results discussed in §2. For instance, various studies indicate that young infants will inflexibly anticipate the goals of simple geometric shapes' movements by taking into account information about these shapes' previous behaviour (e.g. Hernik and Southgate, 2012). This shows that such goal ascriptions cannot rely solely on kinematic cues, such as grip aperture and wrist velocity, and that they must draw on endogenously stored information about the agent's previous behaviour and the goals of its previous actions. But, it is not problematic (in any obvious way) to think that the goals of these ‘actions’ might also be identified by our hypothesised input module. Why? Because input modules need not rely solely on information processed from the bottom-up. Rather, they may also succumb to top-down effects.

To see this, consider the phonemic restoration effect where a single phoneme of a heard sentence is replaced with a cough or white noise. In such cases, most subjects report hearing the entire sentence, intact—they do not notice the missing phoneme (Warren, 1970). For proponents of modularity, like Fodor, this effect provides reason to think that there are top-down effects involved in the identification and categorisation of phonemes (1983, p.77); that the phone identification system has access to information about the way phonemes are combined and that it utilises this information to constrain its predictions when identifying the missing sound. However, these theorists are careful to distinguish this from the idea that the phone identification system is informationally unencapsulated. This is because the phonemic restoration effect is judgement independent. Subjects tested report hearing the entire sentence (with a cough/white noise ‘in the background’) even when they explicitly know that there is a missing phoneme in it (and even when they reflect on this fact). Consequently, the Fodorian proposes that while speech perception systems have access to information that is specified at the levels of representation they compute—e.g. typical combinations of phonemes—they lack generalised access to what the subject knows or believes—e.g. his or her beliefs about the interlocutor’s beliefs, desires and intentions (ibid.). Thus, while these systems are informationally encapsulated in a way that central systems are not, they are prone to certain top-down effects.

Returning to the goals that infants ascribe to the ‘actions’ of simple geometric shapes, it becomes possible to see how these might be the product of analogous processes. Just as the identification of phonemes is affected both by the bottom-up processing of information relevant to the phonemes perceived and by top-down associations formed between phonemes identified in the past, we can hypothesise that analogous modules involved in the identification of observed goals might draw on both sensory information, processed bottom-up according to simple principles (e.g. the statistical regularities discussed above), and top-down associations formed between goals and behaviour identified previously. Provided that there is sufficient statistical information to identify the shape’s goals in habituation trials—which is plausible in Hernik and Southgate’s (2012) study, given that the shape stopped and paused at the target object—subsequent ascriptions could well be the result of modules that are sensitive to associations formed between behaviour and goals perceived previously. Therefore, it remains possible that such goal ascriptions are made by input systems, akin to those involved in speech perception.
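Again, a toy sketch may help. The following illustration is mine, not a model from the literature; the agent names, goal labels and numbers are invented. It shows how a module could combine bottom-up cue evidence with associations laid down on earlier trials, while taking no input at all from the observer's explicit beliefs—encapsulation with top-down effects of the permitted, within-module kind.

```python
# A toy, invented sketch of a goal-ascription 'module' that is open to
# top-down effects (stored associations) yet encapsulated from the
# observer's explicit beliefs, which never enter the computation.

from collections import Counter

class GoalModule:
    def __init__(self):
        self.associations = Counter()   # agent-goal pairings seen before

    def observe(self, agent, goal):
        """Habituation trial: record an association, as when a shape
        repeatedly stops and pauses at a target object."""
        self.associations[(agent, goal)] += 1

    def ascribe(self, agent, cue_evidence):
        """Combine bottom-up cue evidence with stored associations.

        `cue_evidence` maps candidate goals to bottom-up support (e.g.
        from statistical regularities in the observed movement).
        """
        scores = {
            goal: support + self.associations[(agent, goal)]
            for goal, support in cue_evidence.items()
        }
        return max(scores, key=scores.get)

module = GoalModule()
for _ in range(3):                 # habituation: circle approaches the box
    module.observe("circle", "reach-box")

# At test, ambiguous bottom-up evidence is settled by prior association.
print(module.ascribe("circle", {"reach-box": 1.0, "reach-ball": 1.0}))
```

The design choice worth noting is simply what the `ascribe` function lacks: there is no parameter through which the observer's beliefs about the agent could influence the verdict, which is the sense of encapsulation at issue.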

Is this suggestion also plausible? One reason for scepticism may concern the neural underpinnings appealed to before; mirror neurons in the PFC that encode the goals of observed and produced actions in a common vocabulary. One might doubt that the goals of a simple geometric shape’s actions could be encoded in such a vocabulary given that we share no obvious motor processes with geometric shapes. After all, I’m a human with arms and legs; something lacking in your average circle. So, how could I mirror the movements of a circle? Is this even possible? One reason to think so is this: mirror neurons appear to involve the translation of goals perceived into one’s own motor vocabulary. For instance, aplasic individuals, born without arms or hands, show the same neural activation patterns in their PFC when they perform grasping actions with their feet as when they observe isomorphic actions, performed by normal humans, with their hands (Gazzola et al., 2007). This suggests that the mirroring process involves the translation of observed actions into a code, common to one’s own motor actions. So, provided that the goal of the observed physical action can be produced by the human observer, there is no obvious reason why it could not be encoded within their PFC by mirroring processes.

Perhaps a deeper concern stems from studies that have been taken to suggest the cognitive penetrability of the PFC, thereby implying significant levels of unencapsulation in the cognitive processes it realises. For instance, Iacoboni et al. (2005) compared the PFC activity of subjects observing grasping actions performed in and out of context: the activity elicited when subjects observed a reach for a full cup of tea (to drink), situated beside biscuits and a teapot, was compared with that elicited when they observed a reach for an identical, but empty, teacup located on its own. They found that PFC activity was greater during the first condition, thereby suggesting that the process was affected by the subjects’ beliefs about the situation.

There are a number of reasons to reserve judgement on this conclusion, however. Firstly, the fullness of the cup was observable (p.539). As such, it is possible that this was an observable cue, processed by a module bottom-up in performing a goal ascription. Indeed, this is actually what we should expect if, as I have suggested we take seriously, the system draws on kinematic cues, like grip aperture and wrist velocity. Why? Because these kinematic variables must be taken into account relative to the size and shape of the object being grasped (Becchio et al., 2012). Since the milky tea in the cup used in this study was opaque, it effectively changed the shape of the perceived object being reached for. Given that this sort of information would have to be available to the module, as I have envisaged it, and given that full cups are more likely to be drunk from than empty cups, it is possible that this provided further information with which to make the relevant goal ascription. This is particularly plausible given that, as we have seen, input systems do not only utilise incoming sensory information, but are also sensitive to the effects of endogenously stored associations between the pieces of information they compute.

Secondly, contextual effects of the kind observed by Iacoboni and colleagues affect human categorical perception quite generally, despite the fact that categorical perception is just about as plausible a candidate for informational encapsulation as any process. For instance, the categorical perception of an angry facial expression will be encoded as a disgusted facial expression in certain contexts (e.g. when attached to a body holding a rotten fish at arm’s length) independently of the observers’ beliefs about the stimuli (Aviezer et al., 2008). Similarly, it is plausible to suppose that the categorisation of phonemes is sensitive to factors such as whether or not the observed agent has a pen in her mouth. Given that Iacoboni et al. did not test the effects of subjects’ explicit beliefs on PFC activation directly, and did not examine the judgement independence of this activation, it is not obvious that their study did not simply reveal a contextual effect, typical of input systems quite generally. Certainly, it does not reveal the PFC to be more prone to cognitive effects than better-understood input processes, like those involved in the categorisation of phonemes, and this is what a study would have to show to undermine the idea that input systems identify goals in the way they identify phonemes.[8]

4. Conclusion

Tentatively, we can then draw two conclusions from our discussion. Firstly, we should take seriously the idea that some goal ascriptions are underpinned by input systems, akin (or perhaps even common) to those involved in the categorisation of phonemes in perceived speech. One preliminary reason for this is that some goal ascriptions are underpinned by processes that, plausibly, display properties that are distinctive of input systems quite generally (such as those involved in the categorisation of phonemes). A second reason for taking this seriously is that phonemic categorisation is importantly like the goal ascriptions that have been our concern in that it involves the parsing and categorisation of outcomes to which observable speech acts are directed; processes that are, plausibly, underpinned by common mechanisms. Finally, we have considered a number of obvious objections to these suggestions and shown that these can be resisted. Thus, there is considerable reason to think that some goal ascriptions might be performed by modular input systems and no obvious reason to reject this suggestion.

At various points throughout this paper, I have considered ways in which these suggestions could be further adjudicated. If they are to be taken seriously, however, we can draw a second (tentative) conclusion: that humans possess distinct kinds of system that perform goal ascriptions. Since humans can and do reason rationally and abductively about the goals of others’ actions some of the time, it cannot be the case that all goal ascriptions are the result of input systems. So, at best, the style of account I have sought to motivate throughout this paper will only apply to some goal ascriptions, and this raises a number of interesting questions. For instance, we can ask what the limits of these distinct kinds of system are, the contexts in which they are/are not recruited and their relationship to one another. These are important questions, but they are questions for another day.




  1. Alajouanine, T., Lhermitte, F., Ledoux, M., Renaud, D., Vignolo, L.A. (1964) Les composantes phonémiques et sémantiques de la jargonaphasie. Revue Neurologique 110, 5–20.
  2. Ambrosini, E., Reddy, V., de Looper, A., Costantini, M., Lopez, B., and Sinigaglia, C. (2013). Looking Ahead: Anticipatory Gaze and Motor Ability in Infancy. PLOS ONE [first published online July 4, 2013].
  3. Ansuini, C., Cavallo, A., Koul, A., Jacono, M., Yang, Y., and Becchio, C. (2015). Predicting Object Size from Hand Kinematics: A Temporal Perspective. PLOS One. 10(3).
  4. Aslin, R.N. (2008). What’s in a look? Developmental Science. 10(1): 48-53.
  5. Aviezer, H., Hassin, R., Ryan, J., Grady, C., Susskind, J., Anderson, A., Moscovitch, M., & Bentin, S. (2008a). Angry, disgusted or afraid? Studies on the malleability of emotion perception. Psychological Science, 19, 724-732.
  6. Baron-Cohen, S. (1994). How to build a baby that can read minds: cognitive mechanisms in mindreading. Curr. Psychol. Cogn. 13: 1-40.
  7. Baldwin, D. and Baird, J. (2001). Discerning intentions in dynamic human action. Trends in Cognitive Sciences, 5(4), pp.171-178.
  8. Becchio, C., Sartori, L., Bulgheroni, M., and Castiello, U. (2008). Both your intention and mine are reflected in the kinematics of my reach-to-grasp movement. Cognition. 106: 894-912.
  9. Becchio, C., Sartori, L., Bulgheroni, M., & Castiello, U. (2008a). The case of Dr. Jekyll and Mr. Hyde: A kinematic study on social intention. Consciousness and Cognition, 17, 557–564.
  10. Bertenthal, B., Proffitt, D. & Cutting, J. (1984). Infants' sensitivity to figural coherence in biomechanical motions, 213-230.
  11. Bertenthal, B. I., Proffitt, D. R., Kramer, S. J. & Spetner, N. B. (1987). Infants' encoding of kinetic displays varying in relative coherence, 171-178.
  12. Birch, S.A.J and Bloom, P. (2007). The curse of knowledge in reasoning about false beliefs. Psychological Science, 18(5): 382-386.
  13. Braddon-Mitchell, D. and Jackson, F. (1996). Philosophy of Mind and Cognition. Oxford: Blackwell.
  14. Browman, C. and Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica, 49(3-4), pp. 155-80
  15. Butterfill, S. (2007) ‘What are Modules and What is Their Role in Development?’, in Mind and Language, 22 (4), pp.450-473.
  16. Caggiano, V., Fogassi, L., Rizzolatti, G., Casile, A., Giese, M. A., and Thier, P. (2012). Mirror neurons encode the subjective value of an observed action. Proc. Natl. Acad. Sci. U.S.A. 109, 11848–11853.
  17. Candidi, M., Urgesi, C., Ionta, S., & Aglioti, S.M. (2008). Virtual lesion of ventral pre-motor cortex impairs visual perception of biomechanically possible but not impossible actions. Social Neuroscience, 3(3–4), 388–400.
  18. Carruthers, P. (2006) The Architecture of the Mind, Oxford: Oxford University Press.
  19. Carruthers, P. (2015). Mindreading in adults: evaluating two-systems views. Synthese, p.1-16 (Online 23rd June 2015).
  20. Clark, A. (1997). Being There: Putting Brain, Body and World Together Again. Cambridge: MIT Press.
  21. Clark, A. and Lupyan, G. (2015). Words and the World: Predictive Coding and the Language-Perception-Cognition Interface. Current Directions in Psychological Science, 24(4) 279–284.
  22. Costantini, M., Ambrosini, E., Cardellicchio, P., & Sinigaglia, C. (2013). How your hand drives my eyes. Social Cognitive and Affective Neuroscience, Advance Access.
  23. Coulter, K.S. and Coulter, R.A. (2005), “Size Does Matter: The Effects of Magnitude Representation Congruency on Price Perceptions and Purchase Likelihood,” Journal of Consumer Psychology, 15 (1), 64-76.
  24. Csibra, G. and Gergely, G. (1998). The teleological origins of mentalistic action explanations: a developmental hypothesis. Developmental Science. 1, pp.255–259.
  25. Csibra, G., Gergely, G., Biro, S., Koos, O., and Brockbank, M. (1999) ‘Goal Attribution Without Agency Cues: The Perception of Pure Reason in Infancy’, in Cognition, 72, pp.237-267.
  26. Dennett, D. (1987). The Intentional Stance. Cambridge: MIT Press.
  27. Deroy, O. (2014). ‘Modularity’, in M. Matthen (ed) Oxford Handbook of Philosophy of Perception, Oxford: Oxford University Press.
  28. Fadiga, L. and Craighero, L. (2006). Hand actions and speech representation in Broca's area. Cortex, 42(4): 486-490.
  29. Fantasia, V., Markova, G., Fasulo, A., Costall, A., & Reddy, V. (2016). Not just being lifted: infants are sensitive to delay during a pick-up routine. Frontiers in Psychology , 6, [2065].
  30. Ferrari, P., Rozzi, S., and Fogassi, L. (2005). Mirror neurons responding to observation of actions made with tools in the monkey ventral premotor cortex. Journal of Cognitive Neuroscience, 17(2); pp. 212-226.
  31. Feiman, R., Carey, S., & Cushman, F. A. (2015). Infants’ representations of others’ goals: Representing approach over avoidance. Cognition, 136, 204-214.
  32. Fodor, J. (1983). The Modularity of Mind. Cambridge: MIT Press.
  33. Fodor, J. (1989). Psychosemantics: the Problem of Meaning in the Philosophy of Mind. Cambridge: MIT Press.
  34. Fodor, J. (2000). The Mind Doesn’t Work That Way. Cambridge: MIT Press.
  35. Fowler, C.A., Rosenblum, L.D., 1991. The perception of phonetic gestures. In: Mattingly, I.G., Studdert-Kennedy, M. (Eds.), Modularity and the Motor Theory of Speech Perception. Lawrence Erlbaum, Hillsdale, NJ, pp. 33–59.
  36. Galantucci, B., Fowler, C. and Turvey, M. (2006). The motor theory of speech perception reviewed. Psychonomic Bulletin & Review, 13: pp. 361-377.
  37. Gallese, V., Fadiga, L., Fogassi, L. & Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain, 119, 593–609.
  38. Gazzola, V., Rizzolatti, G. Wicker, B. & Keysers, C. (2007). The anthropomorphic brain: The mirror neuron system responds to human and robotic actions. NeuroImage, 35, 1674–1684.
  39. Csibra, G. (2003). Teleological and referential understanding of action in infancy. Philos. Trans. R. Soc. B Biol. Sci. 358, 447–458.
  40. Gergely, G., Nadasdy, Z., Csibra, G., and Biro, S. (1995). Taking the Intentional Stance at 12months of Age. Cognition, 56, pp.165-193.
  41. Gergely, G. and Csibra, G. (2003). Teleological reasoning in infancy: the naïve theory of rational action. Trends in Cognitive Science, 7(7), pp.287-292.
  42. Gopnik, A. and Wellman, H. (1992). Why the child’s theory of mind really is a theory. Mind and Language. 7(1): pp.145-171.
  43. Gopnik, A. and Meltzoff, A. (1997). Words, Thoughts and Theories. Cambridge: MIT Press.
  44. Hernik, M. and Southgate, V. (2012). Nine-months-old infants do not need to know what the agent prefers in order to reason about its goals: on the role of preference and persistence in infants' goal-attribution. Developmental Science, 15(5), pp.714-722.
  45. Hickok, G. (2009). Eight Problems for the Mirror Neuron Theory of Action Understanding in Monkeys and Humans. Journal of Cognitive Neuroscience. 21(7): 1229-1243.
  46. Hohwy, J. (2013). The Predictive Mind. Oxford: Oxford University Press.
  47. Iacoboni, M. (2008). The Role of the Premotor Cortex in Speech Perception: Evidence from fMRI and rTMS. Journal of Physiology, 102: pp.31-34.
  48. Iacoboni, M., Molnar-Szakacs, I., Gallese, V., Buccino, G., and Mazziotta, J.C. (2005). Grasping the intentions of others with one’s own mirror neuron system. PLOS Biology. 3(3): 529-535.
  49. Jacob, P. and Jeannerod, M. (2005). The Motor Theory of Social Cognition: a critique. Trends in Cognitive Science, 9: pp.21-25.
  50. Kahneman, D. (2011). Thinking, fast and slow. London: Penguin.
  51. Kamewari, K., Kato, M., Kanda, T., Ishiguro, H., & Hiraki, K. (2005). Six-and-a-half-month-old children positively attribute goals to human action and to humanoid-robot motion. Cognitive Development, 20, pp.303–320.
  52. Lewis, D. (1994). Reduction of Mind. Samuel Guttenplan (ed.), A Companion to Philosophy of Mind, Oxford: Blackwell Publishers, pp. 412–431.
  53. Liberman, A. and Mattingly, I. (1985). The Motor Theory of Speech Perception Revised. Cognition, 21(1): pp. 1-36.
  54. Liberman, A. and Whalen, D. (2000). On the Relation of Speech to Language. Trends in Cognitive Sciences. 4(5)pp. 187-96.
  55. Luria, A.R., 1966. Higher Cortical Functions in Man. Basic Books, New York.
  56. Mandelbaum, E. (2015). The Automatic and the Ballistic: Modularity beyond perceptual processes. Philosophical Psychology, 28(8), pp.1147-1157.
  57. Manera, V., Schouten, B., Becchio, C., Bara, B. and Verfaillie, K. (2010). Inferring intentions from biological motion: A stimulus set of point-light communicative interactions. Behavior Research Methods, 42, pp.168-178.
  58. Manera, V., Becchio, C., Schouten, B., Bara, B., and Verfaillie, K. (2011a). Communicative interactions improve visual detection of biological motion. PLoS One 6:e14594.
  59. Manera, V., Del Giudice, M., Bara, B., Verfaillie, K., and Becchio, C. (2011b). The second-agent effect: communicative gestures increase the likelihood of perceiving a second agent. PLoS One 6:e22650.
  60. Manera, V., Becchio, C., Cavallo, A., Sartori, L., and Castiello, U. (2011c). Cooperation or competition? Discriminating between social intentions by observing prehensile movements. Experimental Brain Research, 211, pp.547–556.
  61. Maheswaran, D., Mackie, D. and Chaiken, S. (1992). Brand Name as a Heuristic Cue: The Effects of Task Importance and Expectancy Confirmation on Consumer Judgments. Journal of Consumer Psychology, 1(4), pp.317-336.
  62. Michael, J., Sandberg, K., Skewes, J., Wolf, T., Blicher, J., Overgaard, M., & Frith, C.D. (2014). Continuous theta-burst stimulation demonstrates a causal role of premotor homunculus in action understanding. Psychological Science, 0956797613520608.
  63. Mitchell, P., Robinson, E.J., Isaacs, E.J., and Nye, R.M. (1996). Contamination in Reasoning about False Beliefs: an instance of realist bias in adults but not children. Cognition, 59, pp.1-21.
  64. Newtson, D. (1973). Attribution and the unit of perception of ongoing behavior. Journal of Personality and Social Psychology, 28(1), pp.28-38.
  65. Newtson, D. and Engquist, G. (1976). The Perceptual Organization of Ongoing Behavior. Journal of Experimental Social Psychology, 12(5), pp.436-450.
  66. Newtson, D., Engquist, G. and Bois, J. (1977). The objective basis of behavior units. Journal of Personality and Social Psychology, 35(12), pp.847-862.
  67. Nichols, S. and Stich, S. (2001) Mindreading. Oxford: OUP.
  68. Peeters, R., Simone, L., Nelissen, K., Fabbri-Destro, M., Vanduffel, W., Rizzolatti, G. & Orban, G.A. (2009). The representation of tool use in humans and monkeys: common and unique human features. Journal of Neuroscience, 29, 11523–11539.
  69. Phillips, A., & Wellman, H. (2005). Infants’ understanding of object-directed action. Cognition, 98, pp.137–155.
  70. Pinker, S. (2005). So How Does the Mind Work? Mind and Language, 20(1), pp.1-24.
  71. Pobric, G. & Hamilton, A. (2006). Action understanding requires the left inferior frontal cortex. Current Biology, 16(5), 524–529.
  72. Potter, M. (1975). Meaning in Visual Search. Science, 187, pp.965-966.
  73. Premack, D. (1990). The Infant’s theory of self-propelled objects. Cognition, 36, pp.1-16.
  74. Prinz, J. J. (2006) ‘Is the mind really modular?’, in R. Stainton (ed.), Contemporary Debates in Cognitive Science, pp. 22–36, Oxford: Blackwell.
  75. Pylyshyn, Z. (1999). Is vision continuous with cognition? The case for cognitive impenetrability of visual perception. Behavioral and Brain Sciences, 22(3), pp.341-365.
  76. Reddy, V., Markova, G., and Wallot, S. (2013). Anticipatory Adjustments to Being Picked up in Infancy. PLoS One, 8(6).
  77. Reid, V., Belsky, J., & Johnson, M. (2005). Infant perception of human action: Toward a developmental cognitive neuroscience of individual differences. Cognition, Brain, Behavior, 9(2), pp.35–52.
  78. Rizzolatti, G. and Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nature Reviews Neuroscience, 11, pp.264-274.
  79. Saffran, J., Hauser, M., Seibel, R., Kapfhamer, J., Tsao, F. and Cushman, F. (2008). Grammatical pattern learning by human infants and cotton-top tamarin monkeys. Cognition, 107(2), pp.479-500.
  80. Samuels, R. (2002), Nativism in Cognitive Science. Mind & Language, 17: 233–265.
  81. Samuel, A. (1981). Phonemic Restoration: insights from a new methodology. Journal of Experimental Psychology: General, 110, pp.474-494.
  82. Saylor, M., Baldwin, D., Baird, J., and LaBounty, J. (2007). Infants’ On-line Segmentation of Dynamic Human Action. Journal of Cognition and Development, 8(1), pp.113-113.
  83. Scholl, B. J., & Gao, T. (2013). Perceiving animacy and intentionality: Visual processing or higher-level judgment? In M. D. Rutherford & V. A. Kuhlmeier (Eds.), Social perception: Detection and interpretation of animacy, agency, and intention (pp. 197-230). Cambridge, MA: MIT Press.
  84. Shim, J., Carlton, L., Chow, J., & Chae, W. (2005). The use of anticipatory visual cues by highly skilled tennis players. Journal of Motor Behavior, 37, pp.164-175.
  85. Soderstrom, M., Kemler Nelson, D. and Jusczyk, P. (2005). Six-month-olds recognize clauses embedded in different passages of fluent speech. Infant Behavior and Development, 28(1), pp.87-94.
  86. Southgate, V., Johnson, M., & Csibra, G. (2008). Infants attribute goals even to biomechanically impossible actions. Cognition, 107, pp.1059-1069
  87. Southgate, V., & Csibra, G. (2009). Inferring the outcome of an ongoing novel action at 13 months. Developmental Psychology, 45, pp.1794–1798.
  88. Spaulding, S. (forthcoming). On whether we can see intentions. Pacific Philosophical Quarterly. (published online: 19 November, 2015)
  89. Spelke, E. (1994). Initial knowledge: Six suggestions. Cognition, 50, pp.435-445.
  90. Steinpreis, R., Anders, K. and Ritzke, D. (1999). The Impact of Gender on the Review of the Curricula Vitae of Job Applicants and Tenure Candidates: A National Empirical Study. Sex Roles, 41(7/8), pp.509-528.
  91. Stromswold, K. (1999). Cognitive and neural aspects of language acquisition. In E. Lepore and Z. Pylyshyn (eds.), What Is Cognitive Science? Oxford: Blackwell, pp.356-400.
  92. Thorndike, A.N., Sonnenberg, L., Riis, J., Barraclough, S. and Levy, D.E. (2012). A 2-Phase Labeling and Choice Architecture Intervention to Improve Healthy Food and Beverage Choices. American Journal of Public Health, 102(3), pp.527-533.
  93. Tversky, A. and Kahneman, D. (1974). Judgement under uncertainty: heuristics and biases. Science, 185(4157), pp.1124-1131.
  94. Urgesi, C., Candidi, M., Ionta, S., & Aglioti, S.M. (2007). Representation of body identity and body actions in extrastriate body area and ventral premotor cortex. Nature Neuroscience, 10(1), 30–31.
  95. Watkins, K.E., Strafella, A.P. and Paus, T. (2003). Seeing and hearing speech excites the motor system involved in speech production. Neuropsychologia, 41, pp.989-994.
  96. Warren, R. (1970). Perceptual Restoration of Missing Speech Sounds. Science, 167, pp.392-393.
  97. Wilson, S.M. and Iacoboni, M. (2006). Neural responses to non-native phonemes varying in producibility: evidence for the sensorimotor nature of speech perception. NeuroImage, 33(1), pp.316-325.
  98. Wilson, S.M., Saygin, A.P., Sereno, M.I. and Iacoboni, M. (2004). Listening to speech activates motor areas involved in speech production. Nature Neuroscience, 7, pp.701-702.
  99. Woodward, A. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition, 69(1), pp.1-34.
  100. Woodward, A., Sommerville, J., Gerson, S., Henderson, A. and Buresh, J. (2009). The emergence of intention attribution in infancy. In B. Ross (ed.), The Psychology of Learning and Motivation, Vol. 51, pp.187-222. Waltham, MA: Academic Press.
  101. Zacks, J., Tversky, B. and Iyer, G. (2001). Perceiving, remembering, and communicating structure in events. Journal of Experimental Psychology: General, 130(1), pp.29-58.



[1] Use of the word ‘goal’ should therefore be distinguished from cases where the label refers to an agent’s pro-attitudes (e.g. Underwood’s desire to be president). A goal, as I am using the term, refers to an outcome, there in the world, rather than a subject’s mental states or drives. Clearly, humans can and do ascribe mental states and drives to others, but this will not be my concern in the present treatment.

[2] Note that, even if Csibra and Gergely were to retract such a claim under pressure, the point remains: these theorists have not endorsed any alternative proposals in their extensive work on goal ascription. Consequently, if some goal ascriptions do not involve abductive reasoning, an alternative story will still need to be found to account for them.

[3] This leaves open the possibility that input systems are not as encapsulated as Fodor suggests. Even if Fodor is wrong to suggest that input systems are entirely unaffected by information located outside the system (e.g. the subject’s beliefs), it is, I take it, undeniable that input systems are relatively unaffected by information located outside the system when compared with the processes involved in rational abduction and thought. For instance, proponents of predictive coding sometimes hold that any of a subject’s beliefs and expectations can affect perceptual processing (Clark, 2013; Hohwy, 2013). However, in order to accommodate the apparent judgement independence of visual illusions, etc., these theorists posit that such effects are Bayes optimised over long periods of time (Clark and Lupyan, 2015). This is quite unlike rational thought, where effects can be more or less immediate. As a result, it remains true to say that one’s salient and occurrent beliefs do not have the immediate effects on perceptual processing that they appear to have on belief fixation and rational judgement, and that this difference—which I will continue to call ‘informational encapsulation’—is telling.

[4] Incidentally, automaticity is another property that Fodor takes to be distinctive of modular processes. This is controversial, however; hence why I have avoided placing weight on it here. For instance, Carruthers (2006) suggests that all cognitive processes are automatic in some sense (but see Mandelbaum, 2015).

[5] Other properties of input modules that Carruthers takes to be uncharacteristic of central systems include significant innateness and shallow outputs (2006, p.12). I take it to be an open question whether systems performing simple goal ascriptions possess these properties in addition to (1), (2) and (3). There certainly seems to be no obvious reason why goal ascription would need to involve particularly deep outputs, but offering a more definitive answer is difficult, in part, due to controversies concerning quite what shallowness is (compare Fodor, 1983, p.87, with Butterfill, 2007, pp.462-468). Similar concerns may arise with regard to the innate development of goal reasoning in infants (Samuels, 2002); however, there is at least suggestive evidence that goal reasoning develops according to a pattern (Woodward et al., 2009). In order to assess this further, cross-cultural studies would need to be run to determine whether this pattern of development is universal in the way that the development of speech perception (Stromswold, 1999) or visual processing (Spelke, 1994) is thought to be.

[6] Peter Carruthers has endorsed such an argument in response to dual system accounts of belief reasoning; see his 2015.

[7] Jacob and Jeannerod (2005) and Spaulding (forthcoming) appear to make suggestions of this sort.

[8] This point applies to other studies that may be taken to suggest top-down influences on PFC activity: e.g. Cagiano et al. (2012).

9 thoughts on “Goal Ascription for the A-rational”

  1. Hi Sam, thanks for the paper! I have three thoughts, one friendly to your case, one less friendly, and one perhaps orthogonal.

    The friendly comment is that it seems like you might be able to strengthen your case for encapsulation by identifying cases where we seem to be susceptible to robust illusions concerning goal-ascription. One such case might be when watching actors, or when we know someone is trying to deceive us: we might still ‘see them as’ acting for some goal which we know they don’t really have.

    The less friendly comment concerns accessibility: while the bases of goal-ascription do often seem to be less accessible than those of canonical rational inferences, they also seem to be more accessible than canonically modular processes. In some forms of early perceptual processing, in particular, it seems like I have no access at all to the original inputs – for instance, when my brain calculates the distance from me of an object, based on binocular disparities, I don’t see the two images at all. But generally when I, say, see a goal-directed arm movement, I am aware of the physical movements of the arm as something that could, potentially, be seen as not-goal-directed. Moreover, even when I can’t initially report the features of the movement that make it seem goal-directed, there is often the following sort of imperfect access: reflection on the question of why I see it as goal-directed, and comparison with other actions, will often guide my attention to particular aspects of the movement (‘it was the way they were leaning into the motion’, or ‘it was what they were doing with their fingers’) that seem to be the important ones. I’m not sure how much of a threat this is to your case – perhaps I’m just asking whether you think there are meaningful categories of accessibility in between ‘accessible’ and ‘inaccessible’.

    The final comment is just a question about what content goal-ascriptions have (this is related to several things that Kristin Andrews and Joulia Smortchkova say above). The standard line, which you follow, seems to be that this is all a matter of us perceiving actions as being actually directed at a certain goal. But I wonder if we have direct evidence against an alternative construal of some or all goal-ascriptions, namely that they involve us perceiving actions as being suitable for achieving a certain goal – as ‘a good way to’ rather than as ‘an attempt to’. That would be a much less mentalistic way of reading their content, and might make a difference to whether we count particular goal-ascriptions as true/false/veridical/illusory. Would that affect the case you’re making here?

    1. Hi Luke, thanks for the comments!

      Starting with the friendly comment, I’m obviously tempted by the thought. Gabriel Segal makes a similar suggestion in a paper about the modularity of mindreading. While I don’t have the paper to hand (it is, I believe, his paper in Carruthers’ and Stone’s collection ‘Theories of Theory of Mind’) he suggests that certain mental state ascriptions appear to be encapsulated because, when we watch an actor acting sad, they can continue to look sad even when we know this isn’t the case. There is a worry with the suggestion, however. Namely, that the actor’s ‘looking’ or ‘appearing’ sad is actually judgement dependent in a strong sense, but that what we’re judging is that the actor is depicting a sad person. I’m not sure how to avoid this worry (very open to suggestions!) and a similar concern would presumably apply to the cases you have in mind. This isn’t to say that they should be disregarded; I take it that no single case is going to demonstrate the judgement independence (or encapsulation, more generally) of certain goal ascriptions. So perhaps the best strategy for me to pursue would be to look for a bunch of cases (including ones like those you mention) and to just try and build a general picture on which the encapsulation of these processes looks increasingly likely. Thanks!

      Concerning the less friendly comment, the worry is well taken given how I frame my discussion. However, in light of your comment, I’d want to deny that all inputs to modular processes are strictly inaccessible in the way that the inputs to low-level visual processes seem to be. To give an example, consider phenomena like the McGurk effect, which suggest that phoneme categorisation is performed by a (largely) encapsulated, modular process drawing upon both auditory and visual cues—I take it that these cues would only enjoy the kind of weak inaccessibility characteristic of the kinematic cues discussed in the case of goal ascription? Does that seem right?

      On the third comment, I’m curious. I take it that what you’re suggesting would require the system to both project/track a goal/outcome and identify the preceding behaviour leading to the realisation of this as, in some sense, suitable to this end. Would this not just be a stronger suggestion than the one I’m making?

      1. On the ‘illusions’, I suppose it doesn’t seem to me that my seeing someone as acting for some goal I know they don’t have is dependent on thinking that they’re actively simulating it – that’s just the most likely explanation for them seeming strongly to have it. A case where there was a behavioural ‘false friend’ across two different cultures or species – i.e. a very similar movement with completely different goals, like similar words in different languages with completely different meanings – might provide an example of this sort. But it’s harder to say.

        I’m not sure what to say about the analogy to the McGurk effect – it does seem that with more attention we can try to isolate the two inputs, though having gone and watched a demonstration on youtube my main feeling is that I’m corrupted by knowing what to look for.

        On goals, I guess my thought was that suitability for a goal is a weaker conclusion to draw than directedness towards a goal, since an action can be suitable for many goals but (with caveats) directed only to one. But maybe I should step back and ask a more basic question: in your reply to comments you say that you don’t intend ‘goals’ to be mentalistic – that an action is directed at a goal is not a matter of any mental state of the agent’s. Do you think of representations of goals in this non-mentalistic sense as having truth-conditions? If so, when are they true?

        1. As I’m inclined to see it, the veridicality of the system’s goal ascription would be akin to the veridicality of a perceptual representation, like an object file, or the speech system’s correct/incorrect categorisation of a phoneme.

          There are going to be difficult questions about how we would decide for or against the weaker hypothesis you’re considering, concerning suitability instead of directedness. Perhaps I should look back over Gergely and Csibra’s stuff on natural pedagogy; on the basis of all that, you might think that what infants really need to be able to do is identify when a goal is being suitably pursued. But I think my inclination would still be that this is best served by a goal ascribing system and that perceived inappropriateness stems from a mismatch between goals anticipated and goals actualised, or something like that. In any case, it’s something I should think more about. Thanks!

  2. In these comments on Clarke’s very interesting paper, I’ll begin by attempting to situate Clarke’s contribution in the larger interdisciplinary literature on social cognition. Doing so will help me to identify some questions and concerns I have about the commitments of his account of goal ascription, the evidence in its favor, and how to distinguish it from accounts that represent goals as mental states.

    1. Situating Clarke’s Contribution

    Clarke frames his account of goal ascription as challenging the “popular stance” that goal ascription is the outcome of rational, abductive reasoning. Clarke persuasively exposes this assumption in Gergely & Csibra’s popular “teleological” theory of goal ascription. To help recognize the unique contribution of Clarke’s account and evaluate its merits, I will situate it within a few related trends in the interdisciplinary literature on human and nonhuman social cognition.

    Over the last three decades or so, much of the literature on human social cognition has focused on mindreading, particularly the development toward its mature, full-blown form involving the ascription of propositional attitudes. Recently, a confluence of factors has led to a spate of research offering a more fine-grained investigation of the cognitive basis of social cognition in all of its forms. I’ll emphasize two related trends within this research. One is articulating forms of social cognition that lie between (a) low-level representations of mere physical stimuli, for example, “behavior reading” that represents what physical behaviors tend to occur in what environmental conditions, and (b) full-blown mindreading involving the ascription of propositional attitudes. Another trend involves using the methods of cognitive psychology and neuroscience to clarify the cognitive architecture and processing characteristics of these different forms of social cognition, such as their speed, automaticity, and flexibility.

    A prominent and illustrative example of these two trends is the work of Ian Apperly and Stephen Butterfill (Apperly, 2011; Apperly & Butterfill, 2009; Butterfill & Apperly, 2013; 2016) positing that humans possess two distinct mindreading systems. These separate mindreading systems are offered to explain two distinct types of mindreading processes: ones that are slow, flexible, and resource intensive, versus ones that are fast, automatic, and cognitively efficient. Their background assumption is that our cognitive architecture exhibits a tradeoff between efficiency and flexibility, such that a single system could not handle both types of mindreading processes. The flexible mindreading episodes are said to be explained by a system capable of engaging in full-blown mindreading. They hypothesize that we possess a separate “minimal mindreading” system to enable fast, automatic mindreading. It is “minimal” in the sense of using a simpler model of people’s minds that is not merely a form of behavior-reading, but falls short of the flexible system’s full-blown ascription of propositional attitudes. Apperly & Butterfill believe the minimal mindreading system develops early in infancy, while the flexible system is later-developing, with adults possessing both systems.

    With regard to both these trends, Clarke’s account of goal ascription has clear affinities to this two-systems mindreading account. On Clarke’s account, goals are conceived of teleologically as the outcomes toward which our actions are directed, rather than as mental states or merely physical, non-teleological, non-intentional states. Butterfill & Apperly (2016) have recently posited that their efficient, minimal mindreading system ascribes goals in just this sense. Further, Clarke’s account of goal ascription emphasizes similar processing characteristics as their minimal mindreading system: fast speed, inaccessible/unconscious operation, and (relative) informational encapsulation—though Apperly & Butterfill also add automatic operation to the list while Clarke does not (see p. 10 note 4).

    Given that two-systems theorists have largely focused on the efficient system’s tracking of agents’ epistemic states, Clarke provides a novel contribution by focusing on the empirical evidence supporting the existence of an efficient form of goal ascription. It remains an open question, however, whether Clarke wants to subscribe to Apperly & Butterfill’s other commitments. For example, their minimal mindreading system departs from Gergely and Csibra’s non-mentalistic teleological account by also operating with representations of belief-like states they call “registrations.” Clarke does not explicitly take up this issue of how goal ascription relates to other states ascribed to observed agents, but it seems an important topic for future development of the view (see the exchange between Michael & Christensen, 2016, and Butterfill & Apperly, 2016).

    Another distinctive feature of Clarke’s account is his claims about the cognitive basis of some goal ascription. Clarke challenges popular accounts that treat all goal ascription as the product of rational abduction, which is characterized as a resource intensive, flexible form of inference characteristic of “central” cognition. He instead defends the idea that some goal ascription is enabled by “input systems” akin to those enabling speech and sensory perception. In this respect Clarke’s account calls to mind two prominent views in the mindreading literature that draw on sensory perception to characterize some mental state attribution. One is Susan Carey’s (2009) proposal that our understanding of intentional agency is a form of “core cognition” enabled by innate, modular “perceptual input analyzers” for generating representations of agents’ intentional mental states, including their goals and perceptual states. This evolution of Elizabeth Spelke’s (2000) “core knowledge” account of human cognition and cognitive development is certainly a prominent one in the literature not mentioned by Clarke. Clarke’s analogies to perception also call to mind accounts of “direct social perception” (DSP), which claim that some genuine mental state ascription can be perceptual in nature, rather than the product of non-perceptual reasoning or inference (see, e.g., papers in Michael & De Bruin, 2015). Admittedly, Clarke’s account does not share Carey’s and DSP accounts’ focus on the ascription of mental states. But Carey and Clarke both posit a modular or roughly modular input system for understanding actions. Unlike DSP accounts, Clarke does not explicitly tackle the issue of whether his form of goal ascription is genuinely perceptual in nature, or simply similar to perceptual processes. But Clarke shares a common space in the literature with Carey’s and DSP accounts by advocating socio-cognitive abilities more robust than behavior-reading and whose cognitive basis is more like sensory perception than reflective reasoning.

    In light of the above context, I have a few questions about, and challenges for, Clarke’s account.

    2. A Second Goal Ascription System Across the Lifespan?

    Clarke ends his paper with the “(tentative) conclusion: that humans possess distinct kinds of system that perform goal ascriptions” (p. 17). Another way of putting it is that Clarke advocates a two-systems account of goal ascription: in addition to a central system using rational abduction, Clarke defends the existence of an input system for goal ascription, which is fast in speed, inaccessible, and relatively informationally encapsulated (while Clarke always refers to “input systems” for goal ascription, I’ll for the sake of simplicity refer to this as a single input system). But Clarke could be clearer about what he is claiming about the goal ascription systems of children vs. adults. Throughout the paper we find empirical evidence about the goal ascription abilities of infants as well as adults. So I’m fairly certain that Clarke thinks efficient goal ascription is early developing and persists into adulthood, akin to Apperly & Butterfill’s account of minimal mindreading. But Clarke isn’t explicit about this, so I invite him to clarify his position. Assuming I’m correctly characterizing Clarke’s position, I want to raise some concerns I have about Clarke’s claims about the informational encapsulation of this system.

    3. Informational Encapsulation of Adult Goal Ascription

    Clarke’s main argument in section §2 for a second type of goal ascription emphasizes three processing characteristics that are purportedly characteristic of input systems but not central systems engaging in rational abduction: (i) fast speed, (ii) inaccessibility, and (iii) informational encapsulation. In §2.1, Clarke provides evidence of fast goal ascription in both children and adults. Clarke’s evidence for inaccessibility in §2.2 concerns adults, but it would make sense that these considerations also apply to children (why would such awareness be present in children but disappear later in life?). Section §2.3’s discussion of informational encapsulation, however, focuses on what two-systems theorists would call “signature limits” in the goal ascriptions of infants (pp. 8-10). Clarke’s main example is Southgate et al.’s (2008) finding that 6-8-month-olds assume other agents will take the most efficient route to grasping a desired object, but don’t take into account all salient information when making this assessment: infants at this age incorporate information about the constraints of external reality, but systematically ignore salient information they possess about biomechanical constraints on limb movements (e.g., that human forearms cannot bend). Clarke proposes that the way in which infants’ goal ascriptions are insensitive to some of their beliefs is evidence of an informationally encapsulated goal ascription system.

    Identifying such “signature limits” in the goal ascriptions of infants does not, however, establish that the goal ascription processes of adult humans are similarly limited. Clarke does not provide any existing evidence about the belief-independence of adult goal ascriptions. This is a notable gap in the empirical support for Clarke’s second goal ascription system—assuming Clarke indeed is committed to saying this goal ascription system operates across the human lifespan.

    Clarke (p. 10) does describe a way to test more directly the belief-independence of goal ascriptions based on the work of Scholl and Gao (2013), which would appear to target adults as subjects. But he isn’t particularly clear about what kind of information adults might be unresponsive to in their goal ascriptions. He only says that these proposed studies would be designed such that “explicit knowledge about the action’s goal contradicts that suggested by kinematic cues, like wrist velocity and grip aperture” (p. 10). I tend to doubt Clarke is proposing that adults display exactly the same signature limits in their goal ascriptions that the infants do in the experiments he cites; surely adults and even older children are more sensitive to biomechanical constraints in their goal ascriptions than are the 6-8-month-old infants in Southgate et al. (2008). But if Clarke’s second system for goal ascription is really analogous to Apperly & Butterfill’s minimal mindreading system, this seems to be a commitment Clarke would need to make.

    On Apperly & Butterfill’s two-system account of mindreading, informational encapsulation is treated as a defining feature of the minimal mindreading system, enabling its efficient operation (crucially, its fast speed) and helping to distinguish the minimal mindreading system from the separate flexible mindreading system. A core part of their research project is to provide empirical evidence that children and adults display at least some of the same signature limits in their mindreading abilities, to specifically support the existence of an efficient but inflexible, informationally encapsulated mindreading system across the human lifespan. If Clarke is similarly proposing the existence of an efficient goal ascription system, it would seem crucial to empirically establish: (a) that adult humans do indeed display signature limits in some of their goal ascriptions—specifically, the kind of belief-independence essential to an informationally encapsulated system; and (b) that these signature limits are found in both children and adults. So far Clarke has provided no empirical support for these sorts of claims about adult goal ascription and its relation to children’s goal ascription. And it doesn’t seem very likely to me that the example of infant belief-independence Clarke has highlighted is likely to be found in adults. While future studies may confirm Clarke’s position, I do consider this a significant limitation in Clarke’s argument in §2 for the existence of a second goal ascription system.

    Clarke seems to treat his three processing characteristics as equally important in establishing the existence of a second type of goal ascription process driven by a roughly modular input system rather than a central process using rational abduction. But upon reflection, informational encapsulation seems especially important to Clarke’s argument. The processing efficiency provided by informational encapsulation seemingly is how such a system would achieve its fast speed compared to rational abduction. If informational encapsulation weren’t necessary for fast speed, then rational abduction could operate quickly and the motivation for a second goal ascription system based on speed would be undermined (Westra, in press, offers just such a challenge to the two-systems mindreading view). In addition, it is controversial whether, as Fodor thought, rational abduction and accessibility need go together. I doubt that advocates of the “popular stance” on goal ascription, such as Gergely and Csibra, think infants are typically aware of the processes of rational abduction driving their goal ascriptions. Researchers who claim that infants engage in complicated reasoning—e.g., “theory theorists” (Gopnik & Meltzoff, 1997), including its more recent formulation in terms of probabilistic, Bayesian reasoning (Gopnik & Wellman, 2012)—often treat these inferences as operating subpersonally, outside of conscious awareness. In light of these considerations about speed and accessibility, showing that a subset of goal ascriptions are belief-independent, and thus enabled by an informationally encapsulated system, seems especially important to defending the idea that these goal ascriptions are products of a second cognitive system separate from central cognition’s goal ascription via rational abduction. Without evidence of informational encapsulation in the goal ascriptions of adults, Clarke’s argument is weaker than it might at first appear.

    My comments here have assumed Clarke believes an efficient input system for goal ascription exists in both children and adults, akin to the two-system mindreading account. If that’s right, I have tried to identify a key point in Clarke’s argument needing further empirical support: the informational encapsulation of (some) adult goal ascription. If my assumption is wrong, I invite Clarke to clarify his commitments about the ontogeny of this second system for goal ascription, the kind of informational encapsulation displayed by adult goal ascription, and the argumentative importance of informational encapsulation relative to the other two processing characteristics he highlights.

    4. Internal vs. External Goal Ascription

    In this final section, I will return to my initial contextualization of Clarke’s account, which focused on accounts of mindreading, i.e., the ascription of mental states. Clarke frames his paper around goal ascription where a “goal” is conceived of non-mentalistically, as an outcome in the world to which an action is directed (p. 1). He is explicit that his focus is not on the ascription of “goals” conceived of as mental states of agents. Let’s call the former, non-mentalistic states “external goals” and the latter, mentalistic states “internal goals” (Hutto, Herschbach, & Southgate, 2011; Lurz & Krachun, 2011; Tomasello et al., 2005).

    Unfortunately, many researchers are ambiguous about whether they are discussing external or internal goals when they say that infants can understand other agents’ “goals” or “goal-directed actions.” Gergely & Csibra’s teleological account of goal ascription is indeed hugely influential, so it is appropriate for Clarke to focus on it. But my reading of the literature is that many references to their work overlook the fact that it uses a non-mentalistic goal concept, and lump it in with evidence that infants can understand people’s mental states. For example, Carey (2009) posits a type of innate “core cognition” of intentional agency consisting of concepts of agents’ goals, attentional and perceptual states, and referential states (e.g., pointing gestures) (see p. 159). Carey’s lumping of goals in with attentional and perceptual states suggests she’s conceiving of goals as intentional mental states (Carey is clear, however, that these are not full-blown mental state concepts of propositional attitudes). Further, some researchers even point to the early appreciation of others’ epistemic states (their perceptions, ignorance, and even false beliefs) as reason to reject Gergely & Csibra’s non-mentalistic teleological account (Baillargeon, Scott, & Bian, 2016). The basic idea seems to be: if mindreading develops much earlier than previously thought, perhaps a non-mentalistic form of (external) goal ascription is only needed very early in infancy, and is supplanted by a mentalistic form of (internal) goal understanding.

    In sum, some portion of the literature on goal ascription seems to be ambiguous between the concepts of external vs. internal goals, assumes goal understanding is a form of mindreading without engaging with the non-mentalistic alternative, or even outright rejects the importance of external goal ascription. What are the implications of this for Clarke’s paper?

    For one, I think the ambiguity between internal and external goal ascription complicates the way Clarke frames his paper as challenging a “popular stance” on goal ascription. The ambiguity means some researchers who might appear to fall within this “popular stance” might not share his target phenomenon of external goal ascription; they might instead be talking about internal goal ascription, a form of mindreading, as being driven by rational abduction. This just means more care must be taken to establish which researchers are proper targets of Clarke’s challenge in this paper. While Clarke has convincingly done this for Gergely and Csibra, the discussion of other authors is less detailed.

    A second implication for Clarke is that he could do more to motivate external goal ascription as a target phenomenon and distinguish it from internal goal ascription in both children and adults. I recognize that this is a large ask. Much ink has been spilled trying to theoretically and empirically distinguish mentalistic vs. non-mentalistic forms of social cognition. But it seems important for Clarke to do more to acknowledge the controversy and motivate a focus on external goal ascription, especially in adults who are capable of internal goal ascription.

    I’ll end with some questions on this front that I find personally interesting (see Herschbach, 2015). One distinguishing feature of Clarke’s account is his claim that some external goal ascription is driven by an input system, akin to sensory and speech perception. But DSP advocates make a similar claim about mental state ascription: that mental states such as goals and intentions are not always ascribed via a reasoning process, but sometimes via perception. DSP advocates offer different ways of distinguishing perception-based mindreading from reasoning-based mindreading, but an increasing number adopt the same sort of processing characteristics emphasized by Clarke, particularly fast speed and unawareness of the inputs and processing steps producing goal ascriptions as outputs (see papers in the recent special issue edited by Michael & De Bruin, 2015). If the DSP thesis is right about internal goals (perhaps Clarke will just balk right here), how could we distinguish this from Clarke’s account of external goal ascription? And why would we need external goal ascription if we are capable of this perceptual type of internal goal ascription? Does Clarke share Apperly & Butterfill’s assumption of a tradeoff between efficiency and flexibility as motivation for positing a non-mentalistic form of goal understanding? Or is there some other reason to reject the idea, attributed above to Baillargeon et al. (2016), that humans from a very early age go beyond external goal ascription to engage in internal goal ascription and other forms of mindreading?


    Apperly, I. A. (2011). Mindreaders: The cognitive basis of “theory of mind.” Hove, England: Psychology Press.
    Apperly, I. A., & Butterfill, S. A. (2009). Do humans have two systems to track beliefs and belief-like states? Psychological Review, 116(4), 953–970.
    Baillargeon, R., Scott, R. M., & Bian, L. (2016). Psychological reasoning in infancy. Annual Review of Psychology, 67(1), 159–186.
    Butterfill, S. A., & Apperly, I. A. (2013). How to construct a minimal theory of mind. Mind & Language, 28(5), 606–637.
    Butterfill, S. A., & Apperly, I. A. (2016). Is goal ascription possible in minimal mindreading? Psychological Review, 123(2), 228–233.
    Carey, S. (2009). The origin of concepts. New York: Oxford University Press.
    Gopnik, A. M., & Meltzoff, A. N. (1997). Words, thoughts, and theories. Cambridge, MA: MIT Press.
    Gopnik, A., & Wellman, H. M. (2012). Reconstructing constructivism: causal models, Bayesian learning mechanisms, and the theory theory. Psychological Bulletin, 138(6), 1085–1108.
    Herschbach, M. (2015). Direct social perception and dual process theories of mindreading. Consciousness and Cognition, 36, 483–497.
    Hutto, D., Herschbach, M., & Southgate, V. (2011). Editorial: Social Cognition: Mindreading and Alternatives. Review of Philosophy and Psychology, 2(3), 375–395.
    Lurz, R. W., & Krachun, C. (2011). How could we know whether nonhuman primates understand others’ internal goals and intentions? Solving Povinelli’s problem. Review of Philosophy and Psychology, 2(3), 449–481.
    Michael, J., & Christensen, W. (2016). Flexible goal attribution in early mindreading. Psychological Review, 123(2), 219–227.
    Michael, J., & De Bruin, L. (2015). How direct is social perception? Consciousness and Cognition, 36, 373–375.
    Scholl, B. J., & Gao, T. (2013). Perceiving animacy and intentionality: Visual processing or higher-level judgment? In M. D. Rutherford & V. A. Kuhlmeier (Eds.), Social perception: Detection and interpretation of animacy, agency, and intention (pp. 197–229). Cambridge, MA: MIT Press.
    Southgate, V., Johnson, M. H., & Csibra, G. (2008). Infants attribute goals even to biomechanically impossible actions. Cognition, 107(3), 1059–1069.
    Spelke, E. S. (2000). Core knowledge. American Psychologist, 55(11), 1233–1243.
    Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(5), 675–690.
    Westra, E. (in press). Spontaneous mindreading: a problem for the two-systems account. Synthese.

  3. On many points, Clarke’s arguments are quite convincing. Our leading theories of goal ascription do appear to assume that humans engage in some kind of inference process when predicting behavior. And there is reason to reject this assumption. However, it isn’t clear to me that the premise of modularity is needed to make this argument—or that the evidence provided supports the modularity assumption.

    One thing to recall at the outset is that the intellectualist sounding “goal ascription” is identified as an act of ascribing an outcome, observable in the macro world, not a mental state that is only identifiable via interpretation or via some technology we have not yet invented. Thus, goal ascription doesn’t require mindreading. Is goal ascription just predicting behavior—does a goal-ascriber know what someone is going to do? If so, the term “ascription” may be misleading. What is ascription? We’re not told, but as I understand it, ascribing something like a goal or a belief to another requires explicitly believing that someone has a goal, or the ability to report that another has a goal. But what Clarke might be more interested in is the ability to anticipate others’ actions so as to coordinate behavior with them. In that case, we have goal ascription abilities widely distributed among species, and if that is the case, he opens himself up to a world of additional evidence that could be used to bolster the case that such processes are not always underwritten by abductive inferential reasoning. Furthermore, in some cases Clarke describes his target as social understanding. Goal ascription and behavior anticipation are each only a small part of social understanding. Understanding also involves explanation, which (as I’ve argued) is no more symmetrical in the social domain than it is in the scientific one.

    The arguments that we predict behavior quickly and use inaccessible cues when doing so were compelling, but less so was the argument that we do so via encapsulated processes (as Clarke himself admits). This already leads us away from the need to see these capacities through the lens of modularity, and instead, perhaps, it would be enough to draw the analogy between behavior anticipation and perceptual processes or language processing. That argument is made in Clarke’s attempt to respond to the critics.

    The critic’s argument goes something like this: Because we know some reasoning about future action is done abductively, without evidence to the contrary we should conclude that all reasoning about future action is done abductively. This is a weak argument, but nonetheless it appears to have been made. Clarke has already undermined it by providing some reason for thinking that there may be non-abductive reasoning about action, but in this section he offers a more robust argument by drawing analogies between goal ascription/prediction and speech perception.

    Social understanding/goal ascription/behavior prediction and speech comprehension rest on the same sort of processes, Clarke suggests, namely chunking behavior into meaningful parts that can be analyzed. In the case of speech comprehension, the chunks are phonemes, and in the case of social understanding, the chunks are body movements. To someone hearing this suggestion for the first time, it may be difficult to recognize that in both situations humans are picking out saliences and putting them together in meaningful ways. This may be made clearer in some language processing cases. For example, consider entering a foreign human environment surrounded by people speaking a completely unfamiliar language. You might easily know that there is meaning being conveyed by these verbal behaviors without knowing where one word starts and another word ends. You can’t identify the chunks.

    We rarely experience the parallel case with behavioral chunks, since behavioral chunks are more species general than are languages. Reaching to be picked up, looking toward a speaker, walking toward someone one wants to engage with, etc. are universal behaviors that allow the few idiomatic gestures to be noticed and often to be made sense of from within that larger shared context (e.g. even if you point with your finger, someone who points with their eyes or chin may be easy to understand when you know your communicative partner and you are in a context related to a distal object).

    So, seeing that we parse bodily movement into meaningful chunks may be harder than seeing that we parse vocal movement into meaningful chunks. The parallel is better made when looking at other species, whose behavioral repertoires are often so different from our own. Ethologists who study other species first have to learn to see—to learn the meaningful chunks, and to begin to grasp their meaning. What is salient to a new observer is very different from what becomes salient to an experienced scientist who knows the species well.

    Clarke is convincing when he draws the parallels between hearing speaker meaning and seeing actor meaning. However, I think it plausible that in both cases our skills in understanding derive from an experience with patterns that we use to raise and lower probabilities of the plausible next chunks in the sequence. And the parallel might suggest that the importance of top-down effects is greater than what Clarke has in mind. Indeed, the bottom-up processing that rests on statistical regularities and the top-down associations are not clearly differentiated—it may be that the associations are what create the expectation of statistical regularities, and thus the distinction may not be a useful one. That story also appears to be consistent with the data Clarke reports.

    Thus, it requires another argument to establish some form of modularity of speech comprehension and social understanding from the claim that both skills rely on the same kind of process. Clarke claims: “… if goal ascriptions and speech perception are underpinned by common neural mechanisms it would be unsurprising to discover that they were alike with regards to their cognitive underpinnings (e.g. their modularity)” (14). But since he assumed modularity from the get-go, the analogy doesn’t help bolster that assumption. If anything, knowing that two different skills utilize the same kind of mechanism might just as well be used in an argument against modularity.

    The arguments against the inferential nature of goal prediction (“ascription”, if you must) stand on their own, and are only weakened by the appeal to modularity.

  4. I greatly enjoyed reading Sam Clarke’s paper. I share the view that some goal ascriptions might be underpinned by a modular mechanism. My comments will be focused on trying to clarify Clarke’s route to that conclusion. First, I will try to defend a possibility, which he quickly dismisses, that goal ascriptions might be underpinned by a non-Fodorian module, specially dedicated to the domain of (some) goal directed actions. Then I will raise some questions of clarification about the central proposal.

    Clarke’s paper suggests that there are two alternatives for an automatic goal ascription mechanism: either input modules or abductive reasoning in central cognition. Yet a third option exists – the presence of a non-Fodorian module, for example as postulated by the proponents of massive modularity. One way to distinguish between Fodorian modules and ‘massive’ modules is the following: an (important) necessary condition for being a Fodorian module is informational encapsulation, while an (important) necessary condition for being a massive module is domain specificity, along with a certain evolutionary path (cf. Barkow, Cosmides, & Tooby, 1992). An example of the latter sort of module is the cheater detection module, which is specially dedicated to reasoning about cheaters in social, and not in non-social, contexts. Infants’ core systems (Carey, 2009) might be another example of modular mechanisms that are neither Fodorian nor part of central processing.

    The first part of Clarke’s paper can be more accurately understood as arguing against two sub-theses:
    1. Goal ascriptions are the result of some kind of abductive reasoning (within or outside a module).
    2. Goal ascriptions are the result of abductive reasoning performed by central processing.

    I think that the paper is more successful in arguing against sub-thesis 2 than against sub-thesis 1. We can hypothesize that some goal ascriptions are the outputs of a massive module, whose domain is constituted by visible goal directed actions and whose computational abductive rule is the ‘principle of efficiency’ (without the need for full-blown, ‘Sherlock Holmes’-style, abduction). This latter computational rule need not be represented by the subject, but is discovered by the cognitive scientist. When Gergely and Csibra write about ‘natural pedagogy’, for instance, they sometimes hint at the view that natural pedagogy is such a module (Csibra & Gergely 2009, though this may not be their view regarding the teleological stance).

    It is not obvious that speed and accessibility, two properties of the goal ascription mechanism which Clarke emphasizes, are properties of Fodorian modules only. Let us focus on accessibility. It seems to me that we can resist the claim that the implicit bias case is different in access from the kinematic cues case. Indeed, there is an ambiguity concerning what is inaccessible: in the case of implicit bias the information about using gender to make decisions is not accessed by introspection, but via third person delivery of information and indirect measures (the IAT) that reveal the bias by way of reaction times. Thus, some clarification about the scope of inaccessibility is needed: is it just the informational content, the way the content is detected by the subject, or the rules (such as computational rules) on the basis of which the information is treated?

    Informational encapsulation would provide the strongest argument for goal ascriptions being a Fodorian module, but as Clarke himself remarks, experimental evidence strictly similar to the Müller-Lyer illusion is still lacking. A step in that direction could be constituted by the experiments by Gao and Scholl, but I think there might be an issue in appealing to them here. These studies are primarily concerned with visual perception, and not with goal attribution stricto sensu.

    This brings me to a first clarification question. What does Clarke mean by ‘goal ascription’? Scholl and Gao’s experiments might be taken to signal the presence of goal-directed action perception (similar to perception of colors, shapes, contact causation, etc.), but not goal ascription. In the case of shape perception, for example, one could visually experience the shape of an object without thereby categorizing the shape as a shape or as a certain shape (triangle). Similarly one could visually experience the goal and the action towards it, without categorizing the visual percept as a goal or as a goal-directed action.

    A second, related, clarification question is about the notion of goal at stake. It seems to me that on page 17 we find an ambiguity between: goal as an internal mental state and goal as an external and visible end-point of an action. It is possible that people thinking that goal ascriptions need central processing are concerned with the former, while Clarke only focuses on the latter.

    Finally, a couple of remarks about the idea that the goal ascription module works as categorical perception, parsing the visual stimulus at goal boundaries and underpinned by mirror systems. Clarke suggests that to do this, kinematic idiosyncrasies are abstracted from, and this explains the non-biological grasping experiment and the attribution of goals to geometrical figures. Yet this might be inconsistent with another property of mirroring systems: they attribute some goals only when the action is (in some respect) similar to the viewer’s motor repertoire (when the goal is communication, the mirroring system activates when seeing a human talking or a monkey lip smacking, but not a dog barking; Buccino et al., 2004).

    Once again, I think that there are three sub-theses that could be independently defended:
    1. Perception of visual goals is a case of categorical perception that parses categories at goal boundaries, similarly to speech perception (and maybe to color and emotion perception?).
    2. This is neurologically underpinned by mirror systems.
    3. This constitutes automatic and early goal attributions.

    Clarke’s proposal opens up exciting avenues both for philosophers and for experimental research (for example, if goal ascriptions are modular one should expect selective deficits and breakdowns, as in speech perception) and I am looking forward to the discussion.


    Barkow, J.; Cosmides, L. & Tooby, J. (eds.) (1992). The Adapted Mind: Evolutionary Psychology and the Generation of Culture. Oxford University Press.
    Buccino, G., Lui, F., Canessa, N., Patteri, I., Lagravinese, G., Benuzzi, F., … & Rizzolatti, G. (2004). Neural circuits involved in the recognition of actions performed by nonconspecifics: An fMRI study. Journal of Cognitive Neuroscience, 16(1), 114–126.
    Carey, S. (2009). The Origin of Concepts. Oxford University Press.
    Csibra, G. & Gergely, G. (2009). Natural pedagogy. Trends in Cognitive Sciences 13 (4):148-153.

  5. I want to thank Kristen Andrews, Mitchell Herschbach and Joulia Smortchkova for taking the time to read and engage with my paper—it’s been very useful for me! For the sake of brevity, I’m going to focus on responding to the concerns they raised with my paper. Unfortunately, this means leaving out a discussion of Herschbach’s helpful remarks regarding the situation of my paper in the wider literature and the brilliant parallels that Andrews drew between the parsing of speech in a foreign language and the chunking of observed animal behaviour.

    As I see things, my commentators raised three broad (and somewhat overlapping) worries. These were:
    1) A concern with my appeal to the modularity of the systems in question
    2) A concern with what was meant by a ‘goal ascription’
    3) A concern with the evidence cited for the informational encapsulation of the systems involved in making certain goal ascriptions
    Although my commentators developed these concerns in different (and sometimes incompatible) ways, there was, I think, a common thread to each, so I propose to consider these in turn.

    The Modularity of Mind? (KA, JS)
    My paper argues that we should take seriously the idea that some goal ascriptions are performed by an input system (or input systems) akin to that involved in the parsing and categorisation of speech. At the beginning of §2 I tried to say something that would prove relevant to the assessment of this claim. Namely, that the input systems of interest might be usefully thought of as modular (in essentially Fodor’s sense of the term). The nice thing about this suggestion is that it makes testable predictions, since modules of this sort are said to display a number of interesting properties to an interesting extent. This enabled me to begin the discussion by asking: ‘do the systems involved in making (certain) goal ascriptions manifest these characteristic properties to an interesting extent?’. If they were found to, then this would seem to provide some evidence for taking my suggestion seriously.

    Of course, as I noted in the paper, proposing that speech perception (or any other kind of cognitive process) is underpinned by a modular system remains a controversial suggestion. And this worry came out in both Andrews’ and Smortchkova’s commentaries, albeit in quite different ways. While Smortchkova suggested I might weaken the notion of modularity being appealed to, Andrews suggested that I should jettison the notion altogether.

    As I read her, Andrews’ suggestion was just that an appeal to modularity was not needed to make my point. That is, it could be true to say that humans possess an input system, akin to those involved in speech perception, which identifies the goals of certain actions, irrespective of claims about the modularity of these systems. On this point, I completely agree! In fact, I tried to acknowledge this point when I wrote:
    I happen to endorse an essentially Fodorian picture of the above sort. That being said, it is important to acknowledge that the details of Fodor’s purported distinction between modular and non-modular systems are controversial, if only to note that much of this controversy is irrelevant for our purposes.
    Why did I claim that this controversy was irrelevant for our purposes? Because, as I went on to discuss, whether or not the input systems in question deserve to be called modular in Fodor’s sense of the term, it must be true to say that they manifest the properties that Fodor takes to be distinctive of modular systems to some interesting extent when compared with the other kinds of cognitive systems that are typically seen to be involved in making goal ascriptions. For instance, even if Fodor was wrong to say that speech perception is completely cognitively impenetrable, it would still be true to say that it is relatively encapsulated, hence the apparent judgement independence of various linguistic illusions. So, in this way, my appeal to modularity was just meant to be shorthand for a system that manifests the properties that are distinctive of the input systems to some striking extent. That being said, Andrews’ point highlights the fact that this could be made clearer if I simply referred to these as ‘input systems’ throughout and avoided the term ‘module’ altogether. [Although, feel free to email me for a draft of a paper offering an extended defence of Fodorian modularity (cue the tumbleweed).]

    Smortchkova’s concern with my appeal to modularity was slightly different to Andrews’. Her point was that it might be more plausible to suppose that humans possess a modular system for goal ascription if we think of it as modular in the weaker sense that is appealed to by proponents of massive modularity. I agree that this would be more plausible—indeed, it couldn’t fail to be more plausible since it would amount to a related, but much weaker, claim. After all, proponents of massive modularity typically only want to suggest that the modules they are talking about display some of the properties distinctive of Fodorian modules to some interesting extent. However, for this reason, an initial worry with this suggestion would be that it risks triviality. After all, proponents of massive modularity typically introduce their weaker notion of modularity since it purports to make plausible the idea that (more or less) all cognitive systems are modular (albeit, in this weaker, and quite different, sense from Fodor). So, if a massive modularity of this sort were true, and if one were to agree that humans make goal ascriptions (something that seems undeniable), then it would just seem to follow that these (and any other cognitive achievements that we care to mention) get carried out by some modular system or another, in this weakened sense.

    Perhaps then Smortchkova has something different in mind. Perhaps she is suggesting that while massive modularity is ultimately false (i.e. that the entire mind is not made up of modular systems, even in this weakened sense) some cognitive systems are usefully thought of as modular in the way that massive modularists suppose all cognitive systems to be. If this were so, then it would remain a substantive suggestion to say that one of these modular systems was responsible for making some goal ascriptions, even if it were only modular in the weakened sense that massive modularists have in mind. As I see things, this might pan out in various ways. However, as should now be clear, the important point that I would want to maintain in the face of this suggestion, is just that some goal ascriptions are plausibly carried out by a cognitive system that is modular in a sense that is distinctive of speech and sensory perception and which marks these systems out from those involved in rational abduction.

    Reading between the lines, it seems to be this that Smortchkova is sceptical of. Bracketing evidence for the encapsulation of my hypothesised system (which I will return to below), she seems to deny that any of the other evidence I cite speaks in favour of my conclusion as opposed to her weaker suggestion. For instance, she denies that the apparent speed and inaccessibility of certain goal ascriptions rules out the possibility that the system responsible is simply modular in the massive modularist’s weakened sense and that it is thereby silent on the system’s being speech-perception-like, or input-system-like, more generally. I disagree. There are two ways in which speed might evince the modularity of a system in this sense. It might just be that input modules happen to be very speedy in their operations, such that if you give one an input it produces an output quicker than other kinds of system. Alternatively, it might be that the speed of a cognitive identification process can be indicative of its being performed by an input system because the operation of an input system ought to be quicker than the operations of an input system followed by that of a non-input system, all things being equal. Now, if I was simply appealing to speed in the first way, then it might be reasonable to suppose that a speedy system of this sort could be located anywhere in the human mind, not simply at its periphery. However, given that the identification of goals relies on our perception of agents, I took myself to be appealing to speed in the second way, and thereby evincing the idea that an input module/system is involved. This does not amount to a knock down argument in favour of the system’s being an input system, but it is one piece of suggestive evidence. 
Leaving aside the reasons I provide for drawing a parallel between speech perception and (certain) goal ascriptions discussed in §3, I’m inclined to think that similar points apply to the apparent inaccessibility of the information used to make certain goal ascriptions although I acknowledge that this is a trickier issue and one that I need to give a bit more thought to.

    The ‘What?’ and ‘Why?’ of Goal Ascription (KA, MH, JS)
    Andrews and Smortchkova also offered useful comments concerning the notion of “ascription” that was in play in my discussion of goal ascription. As Andrews rightly observed, the word ‘ascription’ is rather “intellectualist sounding”. As she puts it:
    ascribing something like a goal… to another requires explicitly believing that someone has a goal, or the ability to report that another has a goal. But what Clarke might be more interested in is the ability to anticipate others’ actions so as to coordinate behavior with them.
    The first point Andrews makes here is well taken. I certainly do not wish to suggest that ‘goal ascription’ in the sense under discussion actually necessitates belief or an ability to report in the subject. Indeed, it couldn’t if the system really is input-like. After all, the operations and outputs of input systems are (I take it) strikingly belief independent (e.g. the illusion persists even when I know the McGurk effect is illusory and I explicitly reflect on this fact). That being said, I don’t think that I am simply talking about the anticipation or prediction of goals here either. This implies something like a capacity to pre-empt these before they are realised. While this would be an important function of the hypothesised system, I take it to be equally important that it be able to identify goals if and when these are realised, regardless of whether these were actually predicted in advance (a point that comes out in my discussion of behaviour parsing in §3). With this point in mind, it might be more accurate to say that the system ‘tracks’ the goals of observed actions and that this enables certain forms of coordinated behaviours, irrespective of whether this enables the subject to think about these as such. In this way, it may be analogous to certain kinds of perceptual attributive, such as those that Smortchkova discusses.

    What about the notion of a ‘goal’? Smortchkova finds an ambiguity in the way I use the word between that of an internal mental state and that of an outcome, out there in the world. I’ve not managed to spot this ambiguity myself, but if it is there then it too is unintentional. As I am using the word ‘goal,’ I am strictly talking about outcomes, out there in the world, rather than internal mental states. This, I take it, is precisely the same way that opponents, like Gergely and Csibra, use the word when formulating their rival views.

    Of course, as Herschbach notes, not everyone is as clear as Gergely and Csibra in this respect. Indeed, many who have drawn on their work conflate the notion of a goal as a mental state with that of a goal as an outcome. So, why care about goal ascription as outcome tracking in the first place? One reason is that even if all you care about is mindreading, a lot of mindreading will probably depend upon a prior goal (outcome) tracking ability. For instance, in order to identify and subsequently think about someone’s intention to do X on the basis of their observed behaviour, it seems inevitable that you, or some of your cognitive sub-systems, will need to first identify the outcome to which the observed action was directed such that it can then be considered as the content of a mental state. If this is correct, then goal ascription (thus construed) is of central importance to theorists who invoke a richer notion of ‘goal’ in their work, irrespective of whether this or any other mental ascriptions are possible in early development.

    Investigating Informational Encapsulation (MH)
    All three commentators, but particularly Herschbach, pressed me on the evidence for my claim that the systems involved in making certain goal ascriptions might be encapsulated. While this suggestion was made tentatively in my paper, it is an important concern, since relative encapsulation is sometimes taken to be the most important or distinctive characteristic of the input systems. (In saying this, I need not assume a Fodorian picture, since I take it that characteristics like ‘judgement independence’ are widely appealed to when assessing the plausibility that something is attributed by input, as opposed to central, systems, regardless of other background commitments.)

    For Herschbach, a principal concern was with the dearth of evidence for encapsulated goal ascriptions in adults. This was a useful observation. As Herschbach rightly notes, if some (but only some) goal ascriptions are made by a fast and efficient system akin to Apperly and Butterfill’s system 1, then we ought to find evidence for its continued functioning (and limitations) throughout the lifespan of normally developing humans. Consequently, the small amount of evidence cited for the encapsulation of certain goal ascriptions was not only tentative but incomplete—it simply highlighted young infants’ seeming inability to take on board certain salient information when ascribing goals to an agent. Worse still, Herschbach worried that it seems unlikely that any goal ascriptions made by normal human adults will be found to display the signature limits of the infant goal ascriptions I discuss (ascriptions that were insensitive to certain biomechanical constraints on action). What might I say in response?

    Firstly, I would like to backtrack a little on my suggestion that automaticity is irrelevant in this context. I say this because the automaticity of a system does seem to suggest (if not demonstrate) some level of encapsulation—it suggests that the system in question operates somewhat independently of the subject’s wants and intentions. If this is the case (and I realise it won’t be uncontroversial) then there does seem to be some suggestive evidence for encapsulated goal tracking in adults since there is evidence that humans cannot help but ascribe goals to the actions of observed agents in certain settings (for instance, see Gao and Scholl, 2013, cited in the target article).

    Second, I do not wish to propose that all goal ascriptions are carried out by input systems. This would, I take it, be implausible since we surely can and do perform rational inferences about the goals of actions, at least some of the time. I doubt that Herschbach was suggesting otherwise, but this point bears emphasis. With this point clarified, his worry might be understood as ‘surely no goal ascriptions in adults are going to be limited in the way that infants’ are’. If this is the case, then there appear to be two concerns one might have with my paper, one theoretical and one methodological. The theoretical concern would be: if the system responsible for early goal ascriptions does become more sophisticated well into adulthood, then it cannot be a module/input system in the suggested sense. The methodological concern would be: if the system does become more sophisticated in this way, then what signature limits should we even be looking for in certain (fast and subconscious) goal ascriptions made by adults?

    In response to both concerns, my hypothesis would be that the system(s) responsible for early goal ascriptions will become increasingly sophisticated during the lifespan but will still draw on the same type of information throughout, due to systematic limitations in informational accessibility. For instance, suppose (if just for the sake of argument) that the infants who were able to anticipate the target of a reaching action in Ansuini et al.’s (2013) study were doing so on the basis of an encapsulated system that took into account relatively crude information about the agent’s grip aperture and wrist velocity/direction. My prediction would be that while adults are certainly able to use this type of information in far more sophisticated ways, and to track far more sophisticated goals, than young infants (as seen in the work of Cristina Becchio and her colleagues), these abilities will be similarly insensitive to things known or believed by the subject (e.g. beliefs about the agent’s beliefs and interests). This strikes me as an open possibility that is worth taking seriously and one that could be explored in future work.

    No doubt there is more to be said on all of these points and I am very much looking forward to the discussion.

    1. I really enjoyed your reply Sam! Sorry for being so late to reply, but I wanted to at least add some thoughts before the end of the session.
      I found it really interesting how you responded to the comments about your criteria for the existence of an input system—speed, accessibility and encapsulation—especially your new appeal to automaticity as an indicator of encapsulation. These issues of cognitive architecture are obviously controversial, but here are a few thoughts, mostly springing from my reading of Evan Westra’s recent work on spontaneous mindreading. I tend to agree with you that automaticity is a solid indicator of encapsulation. But as the mindreading literature reviewed by Westra shows, it’s tricky to establish the automaticity of a psychological process, and we need to be careful to distinguish genuinely automatic processes (i.e., mandatory, stimulus-driven, and goal-independent) from merely spontaneous processes, which are similarly fast and cognitively efficient but more context-sensitive than automatic processes. He treats this category of spontaneous processes as occupying a middle ground between fast-yet-inflexible and flexible-yet-slow processes. If that way of demarcating cognitive processes makes sense, inflexibility/mandatoriness would seem to be the main indicator of encapsulation for your view, and speed would drop out as an unreliable indicator of an input system.