
Information and Metacognition

Miguel Ángel Sebastián (Universidad Nacional Autónoma de México)

Marc Artiga (Universitat de Barcelona)



Representations are routinely postulated in mainstream neuroscience. Nonetheless, few have actually tried to offer a general theory of representation based on those practices. Recently, some approaches have attempted to develop this idea and to argue that the relation of representation can be explained in purely informational terms. In this paper we argue that such informational theories cannot provide a satisfactory account of the relation of representation. In particular, we will show that they cannot accommodate the existence of metarepresentations, which play a central role in the explanation of certain cognitive abilities.

1  Introduction: Representation

Cognitive science attempts to explain how our cognitive system works. Although over the years there have been different ways of approaching this question, the mainstream view still maintains that our mind is a representational system. Adherents of this view claim that the best way to explain cognition is to posit the construction of internal representations. Thus, to understand current practice in cognitive science, we need to get a better grasp of the nature of these entities: we need a theory of representational content.

In philosophy, this has traditionally been known as the problem of intentionality. And, although cognitive scientists in general, and neuroscientists in particular, do not usually address this problem directly, they seem to implicitly assume a set of intuitive conditions that are sufficient (or even necessary) for a state to qualify as a representation and to possess a determinate representational content. In this paper we would like to examine recent attempts to turn this intuitive methodology into a full-blown naturalistic theory of representation. As we will see, these approaches rely heavily on the idea that information, understood as some form of statistical dependence, is the key to understanding representations. Of course, the idea that we can explain representations by appealing to some sort of information is not new, and can be traced back at least to Dretske (1981). Nonetheless, recent approaches are appealing for at least two reasons. First, they seem to solve the main difficulties faced by Dretske’s informational theory. Secondly, and even more interestingly, they seem to capture the intuitive criteria employed by neuroscientists when they claim, for instance, that certain neuronal activation in a determinate cortical area represents a particular stimulus. Since they achieve these goals by modifying Dretske’s original proposal in different ways, we will call this family of approaches ‘Scientifically Guided Informational Theories’ (SGITs). Some form or other of SGIT has been defended by Usher (2001), Eliasmith (2000, 2003), Rupert (1999), Skyrms (2010) and Scarantino (2015).

Interesting as SGITs are, in this paper we argue that theories of this kind lack the resources to make a fundamental distinction that is at the core of many cognitive theories: the difference between those representations that have other representations as their object (i.e. metarepresentations) and those representations that are merely caused by other representations but have external stimuli as their object. Since representations of external stimuli and metarepresentations involve the same kind of relation (namely, that of representation) but play different and indispensable roles in our cognitive architecture, a satisfactory theory of representation needs to make room for such a distinction. If we are right, though, SGITs are unable to make it.

The paper is organized as follows: section 2 presents SGITs and section 3 clarifies the relevance of metarepresentations in our cognitive architecture. In section 4, we develop the idea that SGITs are unable to account for the difference between metarepresentations and representations of external stimuli, and we consider some objections. Our argument is supposed to show that content cannot be fully determined solely in terms of statistical dependence relations. In section 5, we briefly discuss whether the notion of teleological function could help these approaches to solve this problem.

2  Informational Theories

Scientifically Guided Informational Theories (SGITs) are naturalistic theories of content. The main goal of these theories is to show how representations fit our scientific worldview. More precisely, they try to explain what it is for a state to be a representation and how its content is determined by appealing to non-representational states and processes. If that project could be carried out successfully, it would provide a solution to the classical problem of intentionality. The nature of representational phenomena would finally be understood.

The particular answer SGITs give to this challenge connects with the long-standing informational tradition. The key feature of informational theories of content is that they seek to account for representations by resorting to some sort of informational relation.[1] One of the first and best-known informational theories of content was Dretske’s (1981), who tried to analyze semantic content by appealing to informational content and defined informational content in terms of probability relations. More precisely, according to his approach a state R carries information about another state S iff, given certain background conditions, P(S | R) = 1. While the idea of explaining semantic properties in terms of information was revolutionary and very influential, there were two deep problems with Dretske’s proposal. First of all, in the natural world it is extremely difficult to find two different states such that the existence of one of them makes the other state certain (even if certain background conditions are assumed). This consequence made the theory unrealistic. Secondly, this approach was incompatible with one of the defining characteristics of representational states, namely that they can sometimes misrepresent. On Dretske’s approach, a state represents another one only if both obtain, so a typical case of misrepresentation (which usually involves an existing state representing a non-existing one) is rendered impossible.[2] These and other problems led most people to think that a satisfactory informational theory of content was unworkable.

However, this situation has recently changed and some new informational theories are being put forward by philosophers and psychologists. Of course, the proponents of these approaches are aware of the problems faced by previous theories in the same tradition, and for this reason they define and use the notion of information in slightly different ways. The main modification is the rejection of the requirement that P(S | R) = 1, which was the key assumption that caused the theory to be too unrealistic and made misrepresentation impossible. However, dropping this assumption is not without costs. In particular, which probability should then be required for a state to represent another state? Any lower standard would seem arbitrary. Furthermore, given the wide range of different representations one can find in the natural world, any arbitrary criterion will probably leave some real representations out and let some non-representational states in. To address these concerns, the strategy pursued by new informational approaches is to appeal to relative probabilities. Accordingly, what is relevant is not how much the representation raises the probability of another state, but whether it raises the probability of a certain state more than it raises the probability of others. This is the central idea that has been developed in various ways by different authors.

Since a joint consideration of all SGITs would be extremely complex, for the sake of simplicity we will focus on a particular approach. Nonetheless, after presenting our objections, we will show how these problems probably extend to other SGITs (see section 4.4). More precisely, here we will concentrate on Usher (2001), because he defends an informational theory based on statistical dependence relations, provides a particularly clear approach and is explicitly motivated by research in cognitive science. Furthermore, his view seems to capture the intuitions expressed by Eliasmith (2000, 2005a, 2005b) and Rupert (1999), among others.

Usher (2001) claims that his account is based on Shannon’s (1948) notion of mutual information. The core idea behind this concept is that a signal X provides information about some random variable Y just in case the presence of X reduces the uncertainty of Y. In other words, just in case P(Y | X) > P(Y). Shannon provided a precise mathematical definition of mutual information between two sets, which can be easily extended to calculate the mutual information between two states. In particular, the mutual information between X and Y (expressed as ‘MI(X;Y)’) is defined by the following formula:

$$MI(X;Y)=\log_{2}\left(\frac{P(X\cap Y)}{P(X)\,P(Y)}\right)$$

Therefore, the mutual information depends on the ratio P(X∩Y)/P(X)P(Y), which is identical to P(X∣Y)/P(X) and to P(Y∣X)/P(Y) (by Bayes’ rule). Recall, however, that one of the central motivations of SGITs is that representational content cannot just be determined by the fact that the mutual information between two variables reaches a certain threshold (this is the key point of departure from classical informational theories). Following this line of reasoning, Usher’s proposal is that R represents S iff (1) the mutual information R carries about S is greater than the information it carries about any other entity and (2) the mutual information between S and R is greater than the information S carries about any other representation. More precisely:

  1. $MI(R_{i};S_{i})=\log_{2}\frac{P(R_{i}\mid S_{i})}{P(R_{i})}>\log_{2}\frac{P(R_{i}\mid S_{j})}{P(R_{i})}=MI(R_{i};S_{j})$, for all j ≠ i
  2. $MI(R_{i};S_{i})=\log_{2}\frac{P(S_{i}\mid R_{i})}{P(S_{i})}>\log_{2}\frac{P(S_{i}\mid R_{j})}{P(S_{i})}=MI(R_{j};S_{i})$, for all j ≠ i

Because the logarithm is strictly increasing and the denominators on both sides of each inequality are identical, these expressions can be simplified in order to provide a more concise definition of Usher’s informational theory:

INFO: Ri represents Si iff, for all j ≠ i:

  1. P(Ri | Si) > P(Ri | Sj)
  2. P(Si | Ri) > P(Si | Rj)

These two conditions are supposed to capture the two dimensions that are relevant for content determination: the forward and backward probabilities. In particular, the first condition claims that, among all the entities that increase the probability of Ri occurring, Si is the one that increases this probability the most. That is, the claim is that, among all the stimuli eliciting Ri, Si is the one that is most likely to produce Ri. This first condition is supposed to single out the stimulus that best correlates with the mental state. In contrast, the second condition compares different representational states. The idea is that Ri represents Si only if Ri is the representational state that most increases the probability of Si being the case. Here the probability that matters is the backward probability, conditionalized on representational states.
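As an illustration, the two conditions of INFO can be checked mechanically once a joint distribution over representational states and stimuli is given. The following Python sketch is not part of Usher’s proposal: the function names (`conditional_tables`, `info_represents`), the state labels and the toy probabilities are all assumptions introduced here for the sake of the example.

```python
def conditional_tables(joint):
    """Build P(r | s) and P(s | r) from a joint table joint[(r, s)] = P(r, s)."""
    p_r, p_s = {}, {}
    for (r, s), p in joint.items():
        p_r[r] = p_r.get(r, 0.0) + p
        p_s[s] = p_s.get(s, 0.0) + p
    r_given_s = {(r, s): p / p_s[s] for (r, s), p in joint.items()}
    s_given_r = {(r, s): p / p_r[r] for (r, s), p in joint.items()}
    return r_given_s, s_given_r


def info_represents(joint, r_i, s_i):
    """Return True iff r_i represents s_i according to both conditions of INFO."""
    r_given_s, s_given_r = conditional_tables(joint)
    states = {r for r, _ in joint}
    stimuli = {s for _, s in joint}
    # Condition 1: P(Ri | Si) > P(Ri | Sj) for every other stimulus Sj.
    cond1 = all(r_given_s[(r_i, s_i)] > r_given_s[(r_i, s_j)]
                for s_j in stimuli if s_j != s_i)
    # Condition 2: P(Si | Ri) > P(Si | Rj) for every other state Rj.
    cond2 = all(s_given_r[(r_i, s_i)] > s_given_r[(r_j, s_i)]
                for r_j in states if r_j != r_i)
    return cond1 and cond2


# Hypothetical joint distribution over two states and two stimuli.
joint = {
    ("R_red", "red"): 0.30, ("R_red", "green"): 0.05,
    ("R_green", "red"): 0.10, ("R_green", "green"): 0.55,
}
print(info_represents(joint, "R_red", "red"))
print(info_represents(joint, "R_red", "green"))
```

With these assumed numbers the state labelled `R_red` satisfies both conditions with respect to red stimuli and fails condition 1 with respect to green ones. Note that nothing in the check restricts the second argument to external stimuli, which is the feature the argument of section 4 exploits.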

New informational approaches such as INFO have certain features that make them worth considering in detail. For one thing, they seem to solve the two most pressing problems of Dretske’s approach, namely the problem of misrepresentation and its empirical implausibility. First of all, since these theories reject Dretske’s suggestion that the likelihood of the referent given the representation has to be one, they make it possible for a state to represent S when S is not the case. Representational relations are grounded on statistical dependencies between entities, so on a given occasion a representational state might be caused by an entity that is not in its extension. Secondly, new informational theories are also much more realistic than the previous proposals in this tradition. As their proponents argue, this approach might indeed capture the way neuroscientists reason (Usher, 2001, p. 320). For instance, following Hubel and Wiesel’s (1959) methodology, many neuroscientists identify the referent of a neuronal structure in early vision with the stimulus that is most likely to elicit a stronger response. Along the same lines, an additional virtue of these approaches is that they provide a precise method for discovering the content of neural events. They make very determinate predictions about the content of representational states, which is extremely valuable in scientific projects (Eliasmith 2000, p. 71).

For these and other reasons, in recent years scientifically guided informational theories have been gaining prominence (e.g. Pezza and Terenzi, 2007; Rusanen and Lappi, 2012; Scarantino, 2015). In what follows, however, we would like to argue that this optimism is probably unfounded.

3   Representation and Metarepresentation

As we previously mentioned, SGITs are naturalistic theories of mental content, since they attempt to clarify the nature of the relation that holds between a representation and its object. This seems to require, at least, an answer to two questions: i) what is a representation and ii) what is the content of that representation. In this paper we will focus on the second question.[3] Accordingly, we will argue that SGITs fail to provide sufficient conditions for determining representational content. More precisely, we will show that SGITs lack the resources to distinguish between metarepresentations (which have another representation as their object) and representations that reliably correlate with another representation (but which do not have a representation as their object).

To develop our argument, in this paper we will focus on representations we have of our own mental states. These representations are interesting for several reasons. In the first place, we seem to, at least sometimes, know what we think, what we regret, what we perceive, what we fear, etc. These are particular instances of our general ability to represent our own mental representations. A satisfactory naturalistic theory of representation should be able to account for these metarepresentational states.

Secondly, understanding this metarepresentational capacity is not only interesting for its own sake. It is well established that we usually attribute mental states to others in order to explain their behavior (that is what philosophers call ‘folk psychology’). Furthermore, it is commonly held that a unique mechanism underlies mindreading (attributing representations to others) and metacognition (attributing representations to oneself) and that both abilities are directly connected (cf. Nichols and Stich 2003). There is, however, a huge controversy on whether metacognition is prior to mindreading (that is, on whether the ability of mindreading depends on the mechanisms that evolved for metacognition) or the other way around. Defenders of the so-called ‘theory-theory’ (Gazzaniga 1995, 2000; Gopnik 1993; Wilson 2002) argue that when we mindread, we make use of a theory of human behavior known as ‘folk psychology’. This theory, just like other folk theories such as folk physics, helps us to master our daily lives successfully. On this view, mindreading is essentially an exercise in theoretical reasoning. When we predict the behavior of others, for example, we make use of folk psychology and reason from representations of the target’s past and present behavior and circumstances to representations of the target’s future behavior. For theory-theorists, if there is just one mechanism, then metacognition depends on mindreading: metacognition is merely the result of turning our mindreading capacities upon ourselves (for an excellent review of the evidence in favor of the claim that mindreading is prior to metacognition see Carruthers 2009, 2011). On the other hand, defenders of simulation theories of mind like Goldman (2006) suggest that metacognition is prior to mindreading. The attribution of mental states to others, on this view, depends upon our introspective access to our own mental states together with processes of inference and simulation of various sorts, where a simulation is the process of re-enacting, or attempting to re-enact, other mental episodes. If metacognition is prior to mindreading, then the latter would also depend on the kind of metarepresentations we are considering. Recently, alternative approaches have also been developed, such as hybrid (Nichols and Stich, 2003) or minimalist theories (Bermudez, 2013).

Finally, the ability to represent our own mental states might also play an important role in consciousness. For example, David Rosenthal (1997, 2005) has argued that conscious states are those one is aware of oneself as being in. This transitivity principle motivates one of the most popular families of theories of consciousness: higher-order representational (HOR) theories.[4] HOR theories explain what it takes for a state to be conscious by means of an awareness of that state. If such an awareness is to be unpacked as a form of representation (Kriegel 2009), then consciousness depends on metarepresentation. Although there is plenty of controversy on the nature of the higher-order representation, for instance on whether higher-order states are belief-like (Gennaro 1996, 2012; Rosenthal 1997, 2005) or perception-like (Armstrong 1968, Carruthers 2003, Lycan 1996), HOR theories commonly claim that a conscious mental state is the object of a higher-order representation of some kind; i.e. they rely on metarepresentation.[5]

So there are good reasons to postulate and investigate metarepresentations. As a result, a satisfactory theory of mental content should be able to explain what makes a representation the object of another representation. In the next section we want to argue that SGITs lack the resources to do so. In section 5, we will discuss whether this problem can be solved by endorsing a functional account that supplements (or substitutes) these interesting theories.

4  Scientifically-Guided Informational Theories and Metarepresentation

Can SGITs accommodate metarepresentations? The answer we will develop in this section is: probably not. Although for the sake of the argument we will grant that, in many cases, SGITs can account for the difference between being caused by S and representing S (and, in this way, solve a classical problem of previous informational theories such as Dretske’s 1981), we will argue that they are unable to make this distinction in the context of metarepresentations. In other words, the central problem that SGITs face is that of distinguishing a case in which a state R1 represents another representational state R2 from a case in which a representation R1 represents some stimulus but is regularly caused by another representational state R2. Since R1 and R2 can correlate as well (or as badly) as R1 and the stimulus, and correlations (conditional probabilities) are all the resources SGITs have to explain the differences, these cases pose a serious problem for SGITs. This is the main objection we will develop in this section.

As we said, in our articulation of the objection we will focus on a particular formulation of SGIT, namely Usher’s proposal (although it is important to keep in mind that our argument is supposed to apply much more broadly; see section 4.4). We will argue that INFO cannot distinguish metarepresentations from stimulus representations by considering two conditional claims. First, we will show that if INFO is used to establish that R1 is a representation of a determinate stimulus, INFO could also be employed in order to show that R1 is a metarepresentation of another mental state.[6] Secondly, we will argue that if R1 is a metarepresentation, then the same theory entails that, under certain circumstances, it is rather a representation of an external stimulus.

4.1  From representation to metarepresentation

Consider a red object moving toward a subject S, who is looking at it. S’s brain will generate a visual representation Rrm in higher visual cortical areas. Given the widely accepted principle of functional specialization on which the visual system operates, we know that Rrm requires the existence of other representations. For instance, visual attributes like color and motion are processed by different systems (Livingstone and Hubel 1988, Zeki 1978, Zeki et al. 1991). Whereas color is processed mainly by the blobs of V1, the thin stripes of V2 and the V4-complex, motion is processed by a different pathway that goes from cells of layer 4B in V1 to the thick stripes in V2 and to V5 (Livingstone and Hubel 1988, Shipp 1985, Sincich 2005, Zeki and Shipp 1988). As a result, whenever we possess Rrm we also have two different representations: one of the color of the stimulus, call it “Rr”, and one of its motion, “Rm”. Further processing in the visual system results somehow (Bartels and Zeki 2005, Milner 1974, Shadlen and Movshon 1999, Treisman and Gelade 1980) in a representation that binds both features into a representation of a moving red object, Rrm.

According to INFO, Rrm represents a moving red object because:

  1. Red moving objects are the stimulus most likely to produce Rrm [P(Rrm | red moving object) > P(Rrm | S), for all S ≠ red moving object].
  2. Rrm is the representational state that most increases the probability of there being a red moving object [P(red moving object | Rrm) > P(red moving object | Rx), for any Rx of the subject such that Rx ≠ Rrm].

For the sake of the argument, let us grant that INFO can satisfactorily exclude other stimuli from the content of the representation. The problem we would like to highlight is that in this scenario INFO will entail that Rrm is a metarepresentation: Rrm represents Rm.

First of all, condition 1 claims that a representational state represents whatever most increases its probability. Yet, at this point, problems begin. As Rm is part of the causal chain that leads to Rrm, we can hardly assume that the presence of a red moving object increases the probability of Rrm more than the state that represents moving things (Rm) does; that is, it is far from obvious that P(Rrm | red moving object) > P(Rrm | Rm). Given the structure of the visual system, the normal causal path leads from red moving things to Rm, which in turn leads to Rrm. And since moving objects cause Rrm by means of causing Rm, P(Rrm | Rm) is going to be at least as high as P(Rrm | red moving object); in other words, we cannot expect Rrm to carry more information about red moving objects than it carries about Rm. Therefore, although cases in which there is a red moving object, Rrm is tokened and Rm does not occur are undoubtedly possible, we should expect them to be rare, especially in comparison with cases in which both Rrm and Rm are tokened but there is no red moving object (something that happens, for instance, every time there is a red object that the system misrepresents as moving). The inequality P(Rrm | red moving object) > P(Rrm | Rm) is satisfied just in case the former situation is more frequent than the latter, something that does not happen in ordinary conditions (although, as we will discuss in the next subsection, such odd conditions are possible, thereby preventing the possibility of metarepresentation). Thus, the correlation between the final representational state and red moving things should not be expected to be higher than the correlation between the former and the intermediate representation (actually, we would expect quite the opposite!). Condition 1 therefore gives us no reason for thinking that Rrm represents a red moving object rather than the mental state Rm.
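The comparison between the distal and the proximal cause can be made concrete with a toy causal chain. The Python sketch below computes P(Rrm | red moving object) and P(Rrm | Rm) exactly for a hypothetical world in which Rrm is tokened only when both Rm and Rr are tokened. All priors, detection rates and the binding probability are illustrative assumptions, chosen so that most of what triggers Rm is red; the result is sensitive to these choices and is not a general proof.

```python
# Hypothetical toy world: every number below is an illustrative assumption.
PRIOR = {"red_moving": 0.60, "red_static": 0.35, "green_moving": 0.05}
P_RM = {"red_moving": 0.80, "red_static": 0.10, "green_moving": 0.80}  # P(Rm | stimulus)
P_RR = {"red_moving": 0.95, "red_static": 0.95, "green_moving": 0.02}  # P(Rr | stimulus)
P_BIND = 0.98  # P(Rrm | Rm and Rr); Rrm is assumed never to occur otherwise


def p_rrm_given_stimulus(stimulus):
    """P(Rrm | stimulus): both feature detectors fire and binding succeeds."""
    return P_RM[stimulus] * P_RR[stimulus] * P_BIND


def p_rrm_given_rm():
    """P(Rrm | Rm), obtained by summing over the possible stimuli."""
    p_rm = sum(PRIOR[s] * P_RM[s] for s in PRIOR)
    # Rrm implies Rm in this model, so P(Rrm and Rm) equals P(Rrm).
    p_rrm_and_rm = sum(PRIOR[s] * p_rrm_given_stimulus(s) for s in PRIOR)
    return p_rrm_and_rm / p_rm


distal = p_rrm_given_stimulus("red_moving")
proximal = p_rrm_given_rm()
print(f"P(Rrm | red moving object) = {distal:.4f}")
print(f"P(Rrm | Rm)                = {proximal:.4f}")
print("condition 1 favours Rm:", proximal > distal)
```

Under these assumptions P(Rrm | Rm) is roughly 0.87, whereas P(Rrm | red moving object) is roughly 0.74, so condition 1 of INFO would single out Rm rather than the red moving object.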

One might suggest that condition 2 can help to avoid this conclusion, but we think this is unlikely. As we saw, the second condition compares different representational states. It claims that Rrm represents a red moving object because there is no other representational state Rx such that it is more probable that there is a red moving object when Rx is activated than when Rrm occurs. Now, since our strategy is to argue that it follows from INFO that Rrm represents Rm, we have to argue that there is no other representational state Rx that increases the probability of Rm more than Rrm does. There might be many situations in which this is actually the case. For instance, if most moving things are red, there will probably be no other representation Rx such that Rx ≠ Rrm and P(Rm | Rx) > P(Rm | Rrm). For example, Rr, which represents red things, will not do, insofar as there are sufficiently many red things which do not move, so that P(Rm | Rr) < P(Rm | Rrm). Thus, one can easily find counterexamples in which this second condition is also satisfied for Rm. Therefore, there are cases in which INFO will confuse a representation of a certain stimulus with a metarepresentation of an intermediate state. This suggests that INFO is an inadequate definition.

Before moving forward, let us briefly consider an objection to our argument. One might grant our point but insist that, in general, condition 2 guarantees that the intuitive result is delivered. For instance, in every environment in which most moving things are not red, it is not true that P(Rm | Rrm) is higher than P(Rm | Rx) for all Rx ≠ Rrm. In particular, if among the moving things there are more green than red items, then P(Rm | Rgm) is higher than P(Rm | Rrm), where Rgm stands for a representation of green moving objects. Moreover, in this case P(red moving object | Rrm) > P(red moving object | Rgm), so apparently nothing would prevent Rrm from representing red moving things. Thus, can condition 2 at least help to avoid the conclusion that representations of stimuli are confused with metarepresentations in this restricted set of cases? Unfortunately, we think INFO is unlikely to be satisfactory even in this restricted set of cases. We agree that if, as we just considered, most moving things are not red, then condition 2 blocks the possibility that Rrm represents Rm. However, in this scenario the problem simply reappears for the representation of moving objects of the most common color. Suppose that this color is in fact green. Since ex hypothesi most moving things are green, condition 2 does not prevent Rgm from representing Rm. Avoiding this conclusion would require that there is another representational state, Rx, such that it is more probable that Rm is tokened when Rx is activated than when Rgm is activated. But none of the states involved in the cognitive process we are describing will do: neither the state that represents green, as we have previously seen, nor the representations of moving objects of other colors. Sure, it is an open possibility that there is still some other mental state, not involved in the cognitive process, that correlates better with the intermediate representation, thereby preventing this result. Nonetheless, whereas this might happen in some particular case, it is unreasonable to believe that it is going to happen for every single complex representation, as SGITs would require. Likewise, since we cannot assume that P(Rgm | green moving object) > P(Rgm | Rm), there is no reason for thinking that Rgm represents a green moving object rather than Rm.

Consequently, even in the restricted set of cases in which INFO can distinguish stimuli representations from metarepresentations due to a particular environmental structure, the same problem will simply reappear at a different location. The rejoinder, then, is probably unsuccessful.

4.2  From metarepresentation to representation

So far, the argument was intended to show that if INFO entails that R1 is a representation of a certain stimulus, the same account could be used in order to show that R1 is a metarepresentation of another mental state. Let us now try to argue for the converse claim, namely that, at least in some cases, if R1 is a metarepresentation according to INFO, then INFO also implies that it is a representation of an external stimulus.

Consider now a mental state that represents red things, Rr, and a metarepresentational state, MRr, that has the former state as its object. Let us start by discussing condition 2. It claims that MRr is a metarepresentation of Rr only if MRr is the representational state that most increases the probability of Rr. Here we have to show that this condition can also be satisfied with respect to an external object, i.e. that there is also a stimulus S such that MRr is the representational state that most increases its probability.

Consider, for instance, cases in which metarepresentations demand a higher degree of reliability than first-order representations. For example, at least in some circumstances, one might expect that the formation of a metarepresentation (like the belief that I am seeing something red) is more demanding in terms of reliability than what is required to actually have the first-order representation (i.e. to actually see red). In circumstances like that, MRr might be the representational state that most increases the probability of a red object being there, because the tokening of the metacognitive state (MRr) requires a higher threshold of reliability than the first-order representation (Rr). For illustration, consider a model according to which metacognition works as a Bayesian filter (Lau and Passingham 2006, Lau 2008). In this case, MRr is tokened only if the probability that the first-order representation is tokened because it was caused by a red thing is higher than a certain threshold: if P(Rr | red thing) > θ, θ being the threshold value. This might depend, for example, on the firing intensity of the neural network which serves as vehicle of representation, thereby avoiding noisy cases. Imagine that such a threshold is set under certain circumstances to 0.8. This would mean that the activation of the metarepresentation requires a level of activation of the first-order representation (Rr,required) that happens with a conditional probability on the stimulus of 0.8 (P(Rr,required | red object) > 0.8): it is not enough that Rr is tokened; it has to be tokened with a certain intensity. On the other hand, all that is required in this respect for Rr to represent red things is that the conditional probability of the state relative to the stimulus is higher for red things than for any other stimuli. Imagine that the stimulus other than a red thing that is most likely to activate Rr is a pink thing, something that happens 15% of the time: P(Rr | pink object) = 0.15. If red objects cause the activation of the neural structure more often than pink things (and ex hypothesi more often than any other stimulus), then Rr represents red things, at least as far as condition 1 is concerned. Imagine that this happens 60% of the time: P(Rr | red object) = 0.6. In this case, P(Rr | red object) > P(Rr | S) for all S ≠ red object, which guarantees that condition 1 of INFO is satisfied. However, crucially, P(Rr | red thing) = 0.6 < θ = 0.8, so the metarepresentation is more reliable than the first-order representation concerning the presence of a red object. Accordingly, in these circumstances P(red object | MRr) > P(red object | Rr), so MRr would be the representational state that most increases the probability of red things.
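The claim that a more demanding metarepresentation can be a better indicator of the stimulus than the first-order state can be checked with Bayes’ rule. The Python sketch below keeps the figures from the example, P(Rr | red object) = 0.6 and P(Rr | pink object) = 0.15, and adds several assumptions that are not in the text: the priors over stimuli, a small activation rate for other stimuli, and a simple gating rule on which MRr is tokened only when Rr fires strongly. It is a simplified reading of the thresholding idea, not an implementation of the Bayesian filter model cited above.

```python
# Figures from the example in the text.
P_RR = {"red": 0.60, "pink": 0.15, "other": 0.02}  # 0.02 for "other" is an assumption

# Everything below is a hypothetical assumption introduced for this sketch.
PRIOR = {"red": 0.30, "pink": 0.30, "other": 0.40}               # P(stimulus)
P_STRONG_GIVEN_RR = {"red": 0.70, "pink": 0.10, "other": 0.05}   # P(strong firing | Rr, stimulus)


def posterior_red(likelihood):
    """P(red | state) via Bayes' rule, given likelihood[s] = P(state | s)."""
    evidence = sum(PRIOR[s] * likelihood[s] for s in PRIOR)
    return PRIOR["red"] * likelihood["red"] / evidence


# MRr is assumed to be tokened exactly when Rr is tokened with strong intensity.
p_mrr = {s: P_RR[s] * P_STRONG_GIVEN_RR[s] for s in PRIOR}

p_red_given_rr = posterior_red(P_RR)
p_red_given_mrr = posterior_red(p_mrr)
print(f"P(red object | Rr)  = {p_red_given_rr:.4f}")
print(f"P(red object | MRr) = {p_red_given_mrr:.4f}")
```

With these assumed values P(red object | Rr) is roughly 0.77 and P(red object | MRr) roughly 0.96, so condition 2 of INFO, applied to the red object, would favour MRr over Rr.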

Let’s turn now to condition 1. MRr is a metarepresentation of Rr only if Rr is the state that is most likely to produce MRr, i.e. P(MRr | Rr) > P(MRr | Rx), for all Rx ≠ Rr. To put this inequality into question we need to argue that, if Rr is regularly caused by red stimuli, P(MRr | red thing) is at least as high as P(MRr | Rr). That would show that, if the first condition of INFO, when applied to assess the content of MRr, is satisfied by Rr, there will probably be a particular stimulus (a red thing, in our case) that also fulfills it.

However, as we argued in the previous subsection, this is hardly plausible. At least in ordinary circumstances, states tend to carry more information about their proximal causes than about their distal causes. The reason is quite simple indeed: the visual system sometimes makes mistakes. In some cases, Rr is tokened when there is no red thing around and in those cases the covariation between MRr and red things also fails. However, in other cases Rr is tokened in the presence of a red thing and MRr fails to be activated. Thus, we cannot expect MRr to generally carry more information about red objects (the distal cause) than it carries about Rr (the proximal cause) and, as a result, the default assumption should be that P(MRr | Rr) > P(MRr | red thing). Ironically, the main problem of Dretske’s account (the possibility of misrepresentation) seems to come to the rescue of informational theories.

Unfortunately, however, the mere appeal to errors is unable to provide a satisfactory solution. In a nutshell, the problem with this suggestion is that mistakes can also go in other directions. More precisely, the following three conditions might obtain: (1) MRr is tokened, (2) there is a red moving thing and (3) there is no Rr. As a consequence, misrepresentation can decrease the correlation between MRr and red moving things, but it can also decrease the correlation between MRr and Rr. This shows that in some circumstances it might be the case that P(MRr | Rr) < P(MRr | red thing). Thus, in these situations condition 1 cannot establish that MRr is a metarepresentation of Rr and not a representation of red things. The following example might help illustrate the idea. Consider two different causal paths leading to the activation of MRr. In the first one, a red thing causes the activation of Rr, which in turn activates MRr under certain circumstances. Imagine that there is another stimulus, S, which can also cause the activation of MRr. Call this second path ‘the deviant path’. Clearly, MRr does not represent S, because P(MRr | Rr) > P(MRr | S); this is why we call it the ‘deviant path’. Nonetheless, under certain plausible environmental conditions, this deviant path might cause trouble. In particular, imagine that there is a strong correlation between Ss and red things in the environment. In these circumstances, cases in which Rr misses its target (and hence is not tokened despite there being a red object) might be cases in which MRr is nonetheless tokened due to the deviant path. As a consequence, we would expect P(MRr | red thing) > P(MRr | Rr). This is a simple example in which, according to INFO, MRr would represent red things.
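The deviant-path scenario can be checked numerically by enumerating a small probabilistic model. The sketch below is a toy construction, not part of the original argument: it assumes four binary variables (a red thing, Rr, S and MRr), assumes that Rr and S are independent given the presence or absence of a red thing, and assumes that the two paths combine as a noisy-OR. Every parameter is hypothetical except P(Rr | red) = 0.6, which reuses the figure from the earlier example.

```python
from itertools import product

# All numbers are assumptions chosen for illustration, except P(Rr | red) = 0.6,
# which reuses the figure from the earlier example.
P_RED = 0.1                          # base rate of red things
P_RR = {True: 0.6, False: 0.2}       # P(Rr | red), P(Rr | no red)
P_S = {True: 1.0, False: 0.05}       # P(S | red), P(S | no red): S tracks red things
C_MAIN = 0.5                         # chance that a tokened Rr activates MRr
C_DEVIANT = 0.4                      # chance that a present S activates MRr


def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint():
    """Yield (red, rr, s, mrr, probability) for every combination of the four variables."""
    for red, rr, s, mrr in product([True, False], repeat=4):
        # MRr fires if at least one of the two paths succeeds (noisy-OR).
        p_mrr = 1.0 - (1.0 - C_MAIN * rr) * (1.0 - C_DEVIANT * s)
        prob = (bernoulli(P_RED, red) * bernoulli(P_RR[red], rr)
                * bernoulli(P_S[red], s) * bernoulli(p_mrr, mrr))
        yield red, rr, s, mrr, prob


def p_mrr_given(index):
    """P(MRr | variable at `index` is True); index 0 = red, 1 = Rr, 2 = S."""
    rows = [row for row in joint() if row[index]]
    return sum(row[4] for row in rows if row[3]) / sum(row[4] for row in rows)


p_given_red, p_given_rr, p_given_s = (p_mrr_given(i) for i in range(3))
print(f"P(MRr | red) = {p_given_red:.3f}")
print(f"P(MRr | Rr)  = {p_given_rr:.3f}")
print(f"P(MRr | S)   = {p_given_s:.3f}")
```

With these assumed parameters the model yields P(MRr | red thing) ≈ 0.58, P(MRr | Rr) ≈ 0.56 and P(MRr | S) ≈ 0.54, which is the ordering described above. The margins are small and, in this particular toy model, the ordering also relies on Rr being tokened fairly often when nothing red is present; that is a feature of the chosen numbers rather than a claim made in the text.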

In reply, one might bite the bullet and claim, as the theory predicts, that in these cases MRr is not a metarepresentational state, but a first-order representation of red things. The problem with this suggestion is that the high correlation between S and red things is a contingent fact of a particular environment and, accordingly, it would be unreasonable to maintain that under such circumstances the organism fails to have the required metacognitive states. To make the point more pressing, suppose that the metacognitive state is the belief that I am seeing something red; if the previous argument is on the right track, INFO would entail that there are environments in which I cannot form such a belief, because of a certain correlation of stimuli. Moreover, consider a HOR theory of consciousness, like the one described in section 3. According to it, undergoing a conscious experience depends upon a higher-order representation (that one is seeing red, in the case of an experience as of red). INFO, when combined with a HOR theory of consciousness, has the undesired consequence that in certain environments (those in which there is a sufficiently high correlation between Ss and red things) the organism fails to have experiences as of red. This is extremely implausible.

At this point, a remark is required. Certainly, our arguments do not show that INFO entails that all metarepresentational states actually represent distal stimuli. This should be obvious, since the arguments in this subsection assume a particular set of additional circumstances (the existence of a deviant path, etc.). Nonetheless, this fact does not diminish the force of our arguments. INFO (and, in general, SGIT) seeks to provide general conditions for a mental state to possess a determined representational content. To support the view that these theories are unsuccessful, one need not show that they deliver the wrong results in all cases. The fact that they have unintuitive consequences in some clear cases and that they make representational content depend on certain features that seem irrelevant (such as the contingent correlation between S and red things in the case of deviant paths) should be enough to cast doubt on these approaches.

To sum up, it seems that in an important set of cases, if MRr is a metarepresentation of Rr, then it will follow from INFO that MRr is a representation of a red object. Furthermore, since in the previous section we have shown that the reverse conditional also holds, we conclude that INFO cannot adequately distinguish representations of external objects from metarepresentations.

4.3  A Rejoinder

Anticipating a similar objection, Eliasmith (2005a) remarks that “In general, statistical dependencies are too weak to properly underwrite a theory of content on their own […] because the highest dependency of any given vehicle is probably with another vehicle that transfers energy to it, not with something in the external world” (p. 1046). In an attempt to address this issue, he includes an additional condition that should allow INFO to exclude other neuronal states as referents. In particular, he adds that the referent cannot fall under the computational description, that is, there must not be any internal computational description relating the referent with the mental state such that it could account for the statistical dependence. Thus, according to him:

The referent of a vehicle is the set of causes that has the highest statistical dependence with the neural responses under all stimulus conditions and does not fall under the computational description. (Eliasmith 2005a, p. 1047; Eliasmith 2000 p. 59-60; emphasis added)

Here, the computational description refers to the account of neural functioning provided by the theory of neural representation (p. 1047). For instance, activity in V1 has a high statistical dependence with activity in the thalamus, but the reason is that they are computationally related. With this additional clause, the latter can be ruled out as possible content.

Now, it is unclear to us what independent consideration can justify what seems to be a clearly ad hoc move. But let us grant for the sake of the argument that there is some independent way of motivating this new condition. At first glance, one might think that it can solve the problem we were dealing with: despite the fact that a red moving object does not raise the probability of Rrm more than Rm does, Rrm represents the former because Rm falls under the computational description, since it is a component of the system. However, there are at least two compelling reasons why his proposal is unlikely to succeed.

First of all, note that computations are defined over representations. To know whether two causally related brain states are computationally related, one should know whether they are representations and how their content is related. Yet this is precisely what this condition is supposed to establish. The requirement that only entities that do not fall under the computational description can qualify as representational objects is of no use in a theory of representational content, because we need such a theory in order to determine which entities should be excluded. Put in a different way: a theory that presupposes the representational content of certain states cannot in turn be used to deliver these contents.

The second problem with this suggestion is that it seems to exclude too much, because we do indeed have some representations of our own neural states (which, arguably, also fall under a computational description). For instance, suppose that Higher-Order Representational (HOR) theories of consciousness are right and we need metarepresentations in order to have an experience as of red. In that case, if S is having an experience as of seeing red, she needs to have a metarepresentation of Rr, most probably in the dorsolateral prefrontal cortex (Lau and Passingham (2006), Lau and Rosenthal (2011)).[7] Call this metarepresentation “MRr”. According to INFO, MRr represents Rr because:

  1. Rr is the most likely stimulus that produces MRr [P(MRr | Rr) > P(MRr | S); for all S distinct from Rr and MRr]
  2. MRr is the representational state that increases more the probability of there being Rr [P(Rr | MRr) > P(Rr | Rx); for all Rx of the subject distinct from MRr and Rr].[8]

But note that, if Eliasmith’s modification of INFO is accepted, this theory would be known to be false a priori, because it would be impossible for a state to represent another neuronal state in that way if both are computationally related. And although we think that the truth of HOR theories is far from established, it would be highly inadequate to exclude such a theory by the mere definition of what representing is. Consequently, we think that Eliasmith’s rejoinder is far from being fully satisfying.

4.4  Generalizing the argument

If the arguments so far have been on the right track, Usher’s and Eliasmith’s SGITs lack the resources to allow us to say that Rrm represents a red moving object rather than Rm and, at the same time, that MRrm represents Rrm. Moreover, the reasoning developed in the preceding sections suggests that this failure is rooted in the fact that they try to explain content by exclusively appealing to statistical dependence. Thus, mutatis mutandis, one should expect the same problem to affect other SGITs that rely on correlations. For instance, consider Skyrms’ theory (which, with slight modifications, is also embraced by Birch, 2014). According to this approach, the informational content of a given representation R is a vector. More precisely, the informational content is a vector which tells us how a signal changes the probabilities of all states. If there are only four possible states of the world (S1, S2, S3, S4), the informational content of a signal should be calculated with the following formula:

$$\left\langle \log_{2}\frac{P(S_{1}\mid R)}{P(S_{1})},\ \log_{2}\frac{P(S_{2}\mid R)}{P(S_{2})},\ \log_{2}\frac{P(S_{3}\mid R)}{P(S_{3})},\ \log_{2}\frac{P(S_{4}\mid R)}{P(S_{4})} \right\rangle$$

For example, on a given occasion the informational content of a certain signal could be ⟨1.25, −∞, −∞, 0.68⟩ (the −∞ components correspond to states that end up with probability 0; this is just a side effect of using logarithms). In normal parlance, this signal tells you that the probability of S1 and S4 has been increased and that S2 and S3 are impossible. Thus, this signal represents S1 ∨ S4, where the probability of S1 being the case is higher than the probability of S4.
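To see how such a vector arises, the formula can be computed directly. The sketch below is illustrative only: the uniform prior and the posterior probabilities are assumed values, chosen so that the result has the same shape as the example (two raised components and two components at −∞); they are not taken from Skyrms, and the function name is hypothetical.

```python
import math


def informational_content(posterior, prior):
    """Skyrms-style vector: log2 of P(S_i | R) / P(S_i) for each state S_i."""
    vector = []
    for post, pri in zip(posterior, prior):
        # A state that the signal rules out gets -infinity (the log of zero).
        vector.append(math.log2(post / pri) if post > 0 else float("-inf"))
    return vector


prior = [0.25, 0.25, 0.25, 0.25]     # assumed: four equiprobable states
posterior = [0.6, 0.0, 0.0, 0.4]     # assumed: P(S_i | R) after the signal
content = informational_content(posterior, prior)
print([f"{x:.2f}" for x in content])
```

With these assumed probabilities, the first and fourth components come out at roughly 1.26 and 0.68, and the second and third at −∞, which mirrors the reading given above: S1 and S4 become more probable, S2 and S3 are excluded.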

Now, Skyrms does not provide a criterion for choosing the set of states whose probabilities should be considered in the vector. For instance, do the probabilities of other mental states figure in the relevant vector? Depending on the answer he gives to this problem, Skyrms’ approach seems to face a dilemma. If other mental states are excluded from the vector by definition, then the theory will face the same problem as Eliasmith’s rejoinder, namely that of a priori excluding metarepresentations. If, on the other hand, the probabilities of other mental states are included in the vector, then representations of external stimuli and metarepresentations should be distinguished by their statistical dependencies, and we previously argued at length that this strategy will probably fail. In particular, we would expect a representation of the external world to have non-zero values for some external states and a metarepresentation to have non-zero values for some neuronal states. But, as we have seen, we have no reason to expect a difference (or, at the very least, a sufficiently significant difference) in the probabilistic vectors that correspond to, say, MRrm and Rm. Consequently, if content is determined by conditional probabilities, we will have no way to distinguish them.

Likewise, other approaches like Rupert’s (1999) or Scarantino’s (2015) do not diverge from Usher’s and Skyrms’ theories in ways that would affect the main point of the paper. For instance, Rupert’s account also analyzes representational relations in terms of probability relations between entities, although he only considers forward probabilities (i.e. conditionalized on entities) and restricts his account to representations of natural kinds. On this account, R represents a natural kind S iff members of S are more efficient in their causing R than are members of any other natural kind. However, the arguments we have presented concern entities that can plausibly qualify as natural kinds, so there is no reason for thinking his proposal can overcome the difficulties of other informational approaches.

Summing up, we think that the objections raised here probably generalize to many other Scientifically Guided Informational Theories. Although in previous sections we focused on Usher’s informational theory, we think the problem is likely to affect any approach that seeks to define representational content in correlational terms.

5  Teleological Functions to the Rescue?

If our reasoning is correct, SGITs fail to provide a satisfactory account of representation, because they lack the resources for accommodating cases of metarepresentation. Even though we think that informational relations are likely to be an important element in our understanding of how neural structures come to represent, an appeal to statistical dependencies between events is insufficient for providing a fully satisfactory naturalistic theory of content (see also Shea forthcoming). In this final section, we would like to explore some consequences.

Suppose the arguments developed in this essay are right. The first and most obvious solution is to complement SGIT with some other notion. But what else might be required? A popular suggestion is that metarepresentations and representations can be distinguished by appealing to the notion of function. The key idea, of course, is that metarepresentations are states whose function is to indicate other representational states, while other representations have the function to indicate external stimuli. Although there are different ways of spelling out the notion of function (Abrams 2005, Cummins 1975, Griffiths 1993, Millikan 1989, Mossio et al. 2009, Nanay 2010), the standard (etiological) view has it that functions should be understood as selected effects, that is, as effects that were important for the selection of the trait. Thus, a particular brain structure (e.g. in the striate cortex) might have been selected for indicating external stimuli, while other structures (e.g. certain areas in the dorsolateral prefrontal cortex) might have been selected for indicating internal states of the organism. Indeed, there are already some proposals which try to combine informational and functional notions (Dretske 1995, Lean 2014, Martinez 2013, Neander 2013, Shea 2007). So this is an interesting option that needs to be seriously taken into account.

Nonetheless, we would like to conclude by considering a risk. It might happen that adding the notion of function to an informational account has unexpected consequences for SGIT. More precisely, once functions are brought in, the notion of information might be shown to play no important role in the resulting naturalistic theory of content. Although a full discussion of whether information and functional notions can be coherently combined in that way lies beyond the scope of this essay, we would like to briefly argue why we think some tension might exist.

Suppose one is convinced by the arguments laid down in previous sections and accepts that carrying information is insufficient for delivering a satisfactory theory of content. As we just suggested, one could try to simply amend INFO by adding the notion of function. Accordingly, one could claim that the content of a given representational state is determined by the function to carry information about a certain state. That is, one could argue that the function of certain states is to correlate with certain states of affairs. Now, a difficulty with this idea is that the same problem we just saw with informational theories (i.e. that they lack the resources to establish whether a state is a representation of another representational state or the representation of an external stimulus) reappears at the level of function. After all, why should we think that the function of a representation is to carry information about an external stimulus rather than to carry information about another representational state? Just adding the notion of function might not be sufficient for a full answer to this worry.

Of course, this question could be addressed by specifying in more detail what is required for a state or a system to acquire a function. Perhaps an appeal to a specific aspect of the selection process or to the mechanism sending or receiving the signal could help with this problem. However (and this is the central point), if the notion of function can be made specific enough to solve the problem outlined here, it might turn out that the fact that a state has a high statistical dependence becomes largely irrelevant. While carrying information might still be an interesting property of certain states that might help explain why certain features of representational mechanisms evolved, carrying information would not constitute a necessary or a sufficient condition for a state to represent another state. Accordingly, on this approach the utility of the notion of information might be seriously called into question. Indeed, this result could jeopardize scientific practice if, as we granted at the beginning, the implicit assumption that neuroscientists make when establishing claims about the content of neuronal states is to be captured in informational terms.

Obviously, much more should be said in order to make this line of reasoning convincing. Nonetheless, we wanted to briefly call into question the assumption that information will ultimately play a role in a satisfactory naturalistic theory of content, something that has not yet been established. At the very least, we have tried to show that information is unlikely to provide such a theory on its own.



Abrams, M.: 2005, Teleosemantics without natural selection, Biology and Philosophy 20, 97–116.

Armstrong, D.: 1968, A Materialist Theory of the Mind, London: Routledge.

Bartels, A. and Zeki, S.: 2005, The temporal order of binding visual attributes, Vision Research 46(14), 2280–2286.

Bermudez, J. L.: 2013, The domain of folk psychology, in A. O’Hear (ed.), Minds and Persons, Cambridge University Press.

Birch, J.: 2014, Propositional content in signalling systems, Philosophical Studies 171(3), 493–512.

Carruthers, P.: 2003, Phenomenal Consciousness: A Naturalistic Theory, Cambridge University Press.

Carruthers, P.: 2009, How we know our own minds: The relationship between mindreading and metacognition, Behavioral and Brain Sciences 32(2), 121–138.

Carruthers, P.: 2011, The Opacity of Mind: An Integrative Theory of Self-Knowledge, Oxford University Press.

Cummins, R.: 1975, Functional analysis, Journal of Philosophy 72, 741–765.

Dretske, F.: 1981, Knowledge and the Flow of Information, Cambridge: MIT Press.

Dretske, F.: 1995, Naturalizing the Mind, The MIT Press.

Eliasmith, C.: 2000, How neurons mean: A neurocomputational theory of representational content, Unpublished Dissertation, Washington University in St. Louis.

Eliasmith, C.: 2003, Moving beyond metaphors: Understanding the mind for what it is, Journal of Philosophy 10, 131–159.

Eliasmith, C.: 2005a, Neurosemantics and categories, in H. Cohen and C. Lefebvre (eds), Handbook of Categorization in Cognitive Science, Elsevier.

Eliasmith, C.: 2005b, A new perspective on representational problems, Journal of Cognitive Science 6, 97–123.

Lean, O.: 2014, Getting the most out of Shannon information, Biology and Philosophy 29(3), 395–413.

Floridi, L.: 2010, Information: A Very Short Introduction, Oxford University Press.

Gazzaniga, M.: 1995, Consciousness and the cerebral hemispheres, in M. Gazzaniga (ed.), The Cognitive Neurosciences, MIT Press.

Gazzaniga, M.: 2000, Cerebral specialization and inter-hemispheric communication: does the corpus callosum enable the human condition?, Brain 123, 1293–1326.

Gennaro, R.: 2012, The Consciousness Paradox: Consciousness, Concepts, and Higher-Order Thoughts, MIT Press.

Gennaro, R. J.: 1996, Consciousness and Self-Consciousness: A Defense of the Higher-Order Thought Theory of Consciousness, John Benjamins.

Goldman, A. I.: 2006, Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading, Oxford University Press, USA.

Gopnik, A.: 1993, The illusion of first-person knowledge of intentionality, Behavioral and Brain Sciences 16, 1–14.

Griffiths, P.: 1993, Functional analysis and proper functions, British Journal for the Philosophy of Science 44(3), 409–422.

Hubel, D. H. and Wiesel, T. N.: 1959, Receptive fields of single neurones in the cat striate cortex, Journal of Physiology 148, 574–591.

Kriegel, U.: 2009, Subjective Consciousness: A Self-Representational Theory, Oxford University Press, USA.

Lau, H.: 2008, A higher-order Bayesian decision theory of perceptual consciousness, Progress in Brain Research 168.

Lau, H. and Passingham, R.: 2006, Relative blindsight in normal observers and the neural correlate of visual consciousness, Proceedings of the National Academy of Sciences.

Lau, H. and Rosenthal, D.: 2011, Empirical support for higher-order theories of conscious awareness, Trends in Cognitive Sciences 15(8), 365–373.

Livingstone, M. S. and Hubel, D. H.: 1988, Segregation of form, color, movement, and depth: Anatomy, physiology, and perception, Science 240, 740–749.

Lycan, W. G.: 1996, Consciousness and Experience, The MIT Press.

Martinez, M.: 2013, Teleosemantics and indeterminacy, Dialectica 67(4), 427–453.

Millikan, R. G.: 1989, In Defense of Proper Functions, Philosophy of Science 56(2), 288–302.

Milner, P.: 1974, A model for visual shape recognition, Psychological Review 81(6), 521–535.

Mossio, M., Saborido, C. and Moreno, A.: 2009, An organizational account of biological functions, British Journal for the Philosophy of Science 60(4), 813–841.

Nanay, B.: 2010, A modal theory of function, Journal of Philosophy 107(8), 412–431.

Neander, K.: 2013, Toward an informational teleosemantics, in D. Ryder, J. Kingsbury and K. Williford (eds), Millikan and Her Critics, Wiley-Blackwell.

Nichols, S. and Stich, S. P.: 2003, Mindreading: An Integrated Account of Pretence, Self-Awareness, and Understanding Other Minds, Oxford University Press, USA.

Pessa, E. and Terenzi, G.: 2007, Semiosis in cognitive systems: a neural approach to the problem of meaning, Mind and Society 6, 189–209.

Rosenthal, D.: 2012, Higher-order awareness, misrepresentation and function, Philosophical Transactions of the Royal Society of London 367, 1424–1438.

Rosenthal, D. M.: 1997, A theory of consciousness, in N. Block, O. J. Flanagan and G. Guzeldere (eds), The Nature of Consciousness, MIT Press.

Rosenthal, D. M.: 2005, Consciousness and mind, Oxford University Press.

Rupert, R.: 1999, The best test theory of extension: First principle(s), Mind and Language 14(3), 321–355.

Rusanen, A. and Lappi, O.: 2012, An information semantic account of scientific models, in H. W. de Regt (ed.), EPSA Philosophy of Science, Springer, pp. 315–327.

Scarantino, A.: 2015, Information as a probabilistic difference maker, Australasian Journal of Philosophy.

Shadlen, M. and Movshon, J.: 1999, Synchrony unbound: a critical evaluation of the temporal binding hypothesis, Neuron 24(1), 67–77.

Shannon, C.: 1948, A mathematical theory of communication, The Bell System Technical Journal 27, 379–423.

Shea, N.: 2007, Consumers Need Information: Supplementing Teleosemantics with an Input Condition, Philosophy and Phenomenological Research 75(2), 404–435.

Shea, N.: forthcoming, Neural signalling of probabilistic vectors, Philosophy of Science.

Shipp, S. and Zeki, S.: 1985, Segregation of pathways leading from area V2 to areas V4 and V5 of macaque monkey visual cortex, Nature 315, 322–325.

Sincich, L. C. and Horton, J. C.: 2005, Input to V2 thin stripes arises from V1 cytochrome oxidase patches, Journal of Neuroscience 25(44), 10087–10093.

Skyrms, B.: 2010, Signals: Evolution, Learning, and Information, Oxford University Press, Oxford.

Treisman, A. and Gelade, G.: 1980, A feature-integration theory of attention, Cognitive Psychology 12, 97–136.

Usher, M.: 2001, A statistical referential theory of content: Using information theory to account for misrepresentation, Mind and Language 16(3), 331–334.

Wilson, T.: 2002, Strangers to Ourselves, Harvard University Press.

Zeki, S. M.: 1978, Functional specialization in the visual cortex of the monkey, Nature 274, 423–428.

Zeki, S., Watson, J. D. G., Lueck, C. J., Friston, K. J., Kennard, C. and Frackowiak, R. S. J.: 1991, A direct demonstration of functional specialization in human visual cortex, Journal of Neuroscience 11, 641–649.

Zeki, S. and Shipp, S.: 1988, The functional logic of cortical connections, Nature 335, 311–317.


[0] This work is fully collaborative. Authors appear in random order.

[1] Although some of these theories have not explicitly been formulated in terms of ’information’, we classify them under the label ’informational theories’ because all of them try to accommodate representational relations by appealing to probability relations. At least in a certain way of understanding this notion, a certain amount of correlation is sufficient for an entity to carry information about another entity (Floridi 2010).

[2] Dretske tried to solve these problems by distinguishing a learning period (in which misrepresentation is still impossible) from a post-learning period, but it is generally agreed that this proposal probably cannot solve any of these difficulties.

[3] For a detailed discussion of whether Scientifically Guided Informational Theories can solve the first problem, see [authors].

[4] Defenders of same-order theories (Kriegel 2009, [author1]) agree with this idea. It is unclear whether defenders of such a transitivity principle are committed to a representation of a representational state (cf. [author1]).

[5] Rosenthal (2012) has recently argued that metacognition and the postulated higher-order representation have little in common beyond the fact that they both postulate higher-order psychological states. It should be noted that even if Rosenthal is right and consciousness does not require metacognition, as it seems at least prima facie, defenders of HOR theories still accept that consciousness depends on representation of our own mental states.

[6] Of course, INFO is incompatible with R representing the two states at the same time (since, ex hypothesi, the conditions pick out a single state). The point is that the external stimulus is as good a candidate as the other mental state.

[7] cf. Bartels and Zeki (2005). According to them the binding of motion and color is a post-conscious process.

[8] Once metarepresentation enters into play, conditions 1 and 2 have to be slightly modified, for no state increases the probability of a state M more than M itself. Quantification is restricted accordingly in 1 and 2.

10 thoughts on “Information and Metacognition”

  1. Sebastian and Artiga worry that naturalistic theories of representation in the Dretskean tradition cannot distinguish between representation and meta-representation, and thus that they fail to provide an adequate account of representation at all.

    According to S&A, SGIT (for “scientifically guided informational theories”) will misassign content to vehicles such as brain states or patterns of neural activity. Sometimes it will get things wrong by assigning a proximal (e.g. another brain state) rather than distal (an object out in the world) target of the representation. Other times, it will assign a distal content when it ought to assign a metarepresentational one.  To illustrate these problems, they introduce a toy theory – INFO – which seems to exhibit these defects, and then argue in the second half of the paper that the problems generalize to all other naturalistic theories of content.

    I agree with S&A that a successful theory of mental content – if there is one to be found – should be able to get the content “right” for clear-cut cases, and I even agree that purely informational theories won’t always be able to do the job without sneaking in some other resources.  But I don’t think the particular theories they target are subject to the worries, at least not as stated, and, more importantly, I don’t think the worries generalize to the broader class of naturalistic theories.

    Here are the conditions for representation in the toy theory.

    INFO:   Ri represents Si iff for all j ≠ i, the following conditions hold.

    C1 :   P(Ri | Si) > P(Ri | Sj)

    C2:    P(Si | Ri) > P(Si | Rj)

    Eliding some details, S&A’s central worry is that C1 is likely to fix on earlier members of the causal chain leading up to the representation, rather than any object or state of affairs out in the world, and thus any representation ends up being a metarepresentation (of an earlier processing state).

    S&A ask us to consider a representation of a red moving object Rrm.  They suppose that there is a causal chain leading to the tokening of that representation, with the moving ball at one end and less specific representations merely of a moving object and merely of a red object in between.  The crucial assumption here is that these intermediate lower-level representations are likely to be better predictors of Rrm being tokened than anything “out in the world”.

    Then P(Rrm|Rm) > P(Rrm|Sx) for any Sx that is more distal than Rm – because, S&A maintain, “As Rm is part of the causal chain that leads to Rrm, we can hardly assume that the presence of a red moving objects increases more the probability of Rrm than the state that represents moving things (Rm) does.”

    But this in fact is exactly what we should assume, given the well-characterized phenomenon of perceptual invariance  – a convergence on a higher-level representation of a distal object on the basis of quite diverse proximal stimuli (and thus quite diverse lower-level) representations.1

    Vision science posits representations of distal objects that are invariant under changes in proximal stimuli, including a range of lighting conditions and viewing angles and distances.  (Insofar as we accept characteristic patterns of neural activity as representations, correspondingly invariant electrophysiological activity has also been recorded in higher visual areas). Those representations are thus examples of one-to-one mappings between internal states (putative representations) and distal objects, despite one-to-many mappings from distal objects to intermediate internal states, and more importantly, many-to-one mappings from intermediate internal states to higher level visual representations.

    Burge (2010) places great weight on perceptual invariance as a mark of representation (although of course Burge would not like to find himself supporting deflationist theories of representation in the Dretskean tradition). Dretske himself in Chapter 6 of KFI (1981) gestures at a “convergence” criterion to resolve a version of this problem.

    Thus, contrary to S&A’s supposition, the functional architecture of the visual system allows for proximal states to be less well correlated with the high-level representations (to which they make some causal contribution) than those high-level representations are with their distal objects, and so there is no pressing danger that high-level perceptual states will tend to fix on earlier perceptual states as their representational targets, rather than distal objects in the world.

    S&A then make the argument in the other direction. They posit a metarepresentation M, of Rm, where the content of M is supposed to be “Rm is being tokened”, or, more colloquially, “I believe I am seeing a moving thing”. They then argue M might be more tightly correlated with the external occurrence of a distal moving object than with Rm being tokened – for example, if Rm is not very reliably tokened.

    I think this argument is begging the question.  If it really were the case that M were best predicted by some external state rather than an internal one, what independent reason would we have for thinking that it should be interpreted as a metarepresentation, rather than as a representation?  I’m not denying that we might have some, but S&A do not present any; they merely stipulate. Furthermore, they conclude from this that if M turns out not to be a metarepresentation on INFO (given some set of environmental conditions that fix the probabilities in a particular way), then that implies that no metarepresentations are possible under those conditions.  But that just doesn’t follow.  Perhaps under these circumstances M is not a metarepresentation, but we have not yet ruled out the existence of another vehicle that has some facts about M as its content.  Given the absence of other criteria to assign contents, what makes this such a clear-cut case against INFO?

    Let me pause here to note that the theory “INFO” that they introduce and attribute to Usher (2001) is not in fact one that Usher proposes2; furthermore, as a kind of simplified mish-mash, INFO exhibits serious defects that Usher’s original did not3, and, more importantly, that other naturalistic theories of content do not.

    Perhaps a better way of making their case against INFO-like theories would be to point out what happens when content is assigned on the basis of the relative strength of correlations between brain states and target states: the content of a given representation (say Rrm) can be changed – or even eliminated – by manipulating what seem like irrelevant factors (how likely moving objects are to be red, whether other objects are highly correlated with moving objects). And this is unintuitive to S&A – as it should be to anyone who thinks that the content of a representation should depend only on the relationship between the vehicle and its target (and the frequentist probabilities of their co-occurrence).

    But even the other informational theories briefly discussed in the paper do not share these features.  Theories such as Skyrms’ (2010) and Scarantino’s (2015) embrace the multiplicity of content, thus sidestepping worries about multiple potential targets. In their vector schemes, a signal may carry information about all the states whose probabilities are changed given the signal, with relative differences quantitatively incorporated into the informational content vector. So any concerns about a signal having a distal vs an internal target are misplaced – we can and perhaps often do have both.
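    To make the vector idea concrete, here is a minimal Python sketch. It assumes one common reading of such schemes, on which each component is the base-2 logarithm of the ratio between a state's probability given the signal and its prior probability. The helper name `content_vector`, the state labels, and all the numbers are made up for illustration. As a further simplification, each candidate state is treated as a separate event, so the listed states need not form a partition.

```python
import math

def content_vector(prior, posterior):
    """Return log2(posterior / prior) for each candidate state.

    Assumption: this log-ratio form is one common reading of vector-style
    informational content; the helper name and inputs are hypothetical.
    """
    vector = {}
    for state, p_before in prior.items():
        p_after = posterior[state]
        # A state ruled out by the signal gets minus infinity.
        vector[state] = math.log2(p_after / p_before) if p_after > 0 else -math.inf
    return vector

# Made-up probabilities before and after a signal is tokened.
prior = {"distal moving object": 0.1, "Rm tokened": 0.12, "distal green object": 0.3}
posterior = {"distal moving object": 0.8, "Rm tokened": 0.95, "distal green object": 0.3}

vector = content_vector(prior, posterior)
for state, bits in vector.items():
    print(f"{state}: {bits:+.2f} bits")
```

    With these illustrative numbers the signal carries positive information about both the distal state and the internal state, and none about the state whose probability it leaves unchanged, so a single vector can have both a distal and an internal target.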

    In their criticism of Skyrms’ informational theory, S&A write “If … the probabilities of other mental states are included in the vector, then … we would expect a representation of the external world to have non-zero values for some external states and a metarepresentation to have non-zero values for some neuronal states.” Just so.  But why should this be a problem?  Because, S&A contend, “we have no reason to expect a difference in probabilistic vectors that correspond to, say, MRrm and Rm. Consequently, if content is determined by conditional probabilities, we will have no way to distinguish them.”

    I think this second criticism is also misplaced.  The theory provides us with a formula to assign content to a given signal.  If that content turns out to itself be a representational state (by the same theory’s standards), then we have found ourselves a meta-representation. That the same criteria are used in both cases seems to be a virtue rather than an inadequacy – a result of the theory being general and systematic. S&A talk as if to solve the problem would require the theory to have different criteria for representations vs. meta-representations, and I just don’t see why this should be required.

    We might also ask whether these theories can do a good job in contexts where we might expect to find meta-representations (e.g. in the brain).  And here I’m more inclined to agree with the authors that such extension can be awkward. For Skyrms’ theory, it is because his goal was to give a story of signals that are distinct from the agents that are using them, rather than of internal representations.4 It’s not clear that the patterns of activity in the brain such as successive stages of visual processing are well-modeled that way.5

    It might turn out that we should be pluralists even about naturalistic theories of representation, with different accounts for different kinds of representation.

    So what, if anything, do informational theories have to worry about? A more general worry might be that some naturalistic theories of content simply help themselves to an intuitive class of potential representational targets. One such class might be distal objects individuated in a way consistent with our commonsense ontology. Another class might be events that seem ethologically significant, such as the appearance of a predator or a resource (as in Scarantino).  Yet another might be a set of laboratory stimuli of the kind that neuroscientists employ to interrogate neural responses in different tasks.  Then we might ask, why should we think that the system represents members of that stipulated class, whichever it might be, rather than other targets with similar – perhaps stronger – informational relationships to the putative representational vehicles? As S&A ask, which states make it into the vector?

    This sounds like a version of Millikan’s “reference class” problem, with meta-representations as the foil.

    It is here that I think functional theories can swoop to the rescue. We posit internal representations in order to explain how representers manage to coordinate their actions to their situation. Unlike purely informational theories, functional theories explicitly incorporate that fact as a further constraint on how to pin down the targets of representational states. See Millikan 1990, Neander 2006, and especially Harms 2010 for detailed responses to this concern. Shea (2012) examines the exact case of metarepresentations in some detail, proposing that the way to decide between representational vs. metarepresentational targets in cases where it seems that either interpretation is available is to evaluate some counterfactuals to help us to determine which correlation actually contributes to the system performing its task.

    This brings us to the last section of the paper, where S&A make the interesting contention that it is not functional theories themselves that are problematic, but rather their uneasy marriage to informational theories: “Once functions are brought in, the notion of information might be shown to play no important role in the resulting naturalistic theory of content.”  Why?  Because in order for a functional theory to have the resources to assign quite specific contents, it must employ a far more specific notion of function – and a sufficiently specific notion of function might render information superfluous in assigning content.

    Besides being very interested in hearing more about what specific notions of function would be required, I want to give two brief considerations for why I think more promising versions of functional theories will prevail (at least against this worry).

    First, as Shea (2008) proposes, perhaps the relevant functions are those that have the function to carry information. And to the extent that it is external states of affairs that we need to coordinate with (act on and respond to) in order to do well for the purposes of biological function, it is those that will end up as the targets of our representations.  In other situations, when what we need to deal with are aspects of our own states (perhaps so that we may revise them or otherwise modulate their influence), then the functions in question will be targeted on other internal states, and internal states will be the targets of representations.

    Second, if a notion of function specific enough to solve the problem might make information irrelevant, then so much the better. I would suggest that rather than illustrating a tension, what they have noticed is that the two notions play overlapping explanatory roles.  That doesn’t seem terrible to me.  If representation can be naturalistically explicated in terms of a notion of function that is itself entirely naturalistic, that doesn’t seem to be any worse for the project of explaining content than appealing to an informational notion instead.

    Nor would it prevent scientists (or the rest of us) from using information as a heuristic for functions (since biological functions, at least, might be less epistemically accessible than synchronic correlations between observable variables).  As S&A say, “carrying information would not constitute a necessary or sufficient condition for a state to represent another state.”  Given the ubiquity of information, I would argue that that is a conclusion that should elicit relief rather than dismay in proponents of naturalistic theories of content.



    Burge, T. (2010). Origins of Objectivity. Oxford University Press.

    Dretske, F. (1981). Knowledge and the Flow of Information, Cambridge: MIT Press.

    Usher, M. (2001). A statistical referential theory of content: Using information theory to account for misrepresentation, Mind and Language 16(3), 311-334.

    Godfrey-Smith, P. (1991). Signal, Decision, Action, Journal of Philosophy 88, pp. 709–722.

    Skyrms, B. (2010). Signals: Evolution, Learning, and Information. Oxford University Press.

    Scarantino, A. (2015). Information as a probabilistic difference maker, Australasian Journal of Philosophy

    Shea, N. (2014). Reward Prediction Error Signals Are Meta-Representational. Noûs 48 (2):314-341.

    Shea, N. (2007). Consumers Need Information: Supplementing Teleosemantics with an Input Condition, Philosophy and Phenomenological Research 75(2), 404-435.

    Millikan, R. G. (1990). Truth, rules, hoverflies, and the Kripke-Wittgenstein paradox, Philosophical Review 99(3), 323-353.

    Neander, K. (2006). Content for Cognitive Science. In G. Macdonald and D. Papineau (eds.) New Philosophical Essays. Oxford University Press.

    Harms, W. F. (2010). Determining truth conditions in signaling games. Philosophical Studies, 147(1), 23-35.



    1 This is of course assuming that the perceptual experience of invariance is the consequence of a representation at some level of visual processing that is itself invariant to some systematic changes in proximal stimulation. I am leaving aside worries about whether any of these low-level subpersonal states should legitimately be termed representations, or whether we may assume determinate content for any of them.

    2 Usher offers up two distinct ways of fixing two distinct types of content, one “external scheme” (approximately characterized by C1 above) and one “internal scheme” (C2 above). These are supposed to be independently applicable, and they are accounts of two kinds of content. Each alone is considered a necessary and sufficient condition for its respective kind of content to be assigned. S&A combine these into a single theory where both conditions have to be met in order to specify the representational content of (say) a brain state. While I’m sure their intention was merely to produce a simple toy theory, this novel version of the theory yields very different results from the original. Furthermore, the original theory was meant to be applied in a very specific context where, given a previously stipulated well-defined set of distal targets, the question is either “Which of those pre-specified objects is a particular brain state best predicted by?” (external), or “Given a pre-specified object, which of several competing brain states best predicts its presence?” (internal). In both cases, metarepresentation has been simply left off the table.

    The external-based scheme did not deal with meta-representations because those were simply not in the running as potential representational targets. It is only the internal scheme, which attempts to characterize representations “from the point of view of the animal” that is really susceptible to the charge that it may sometimes identify internal states as the representational targets rather than external objects – but again, that does not seem to be a devastating outcome for the naturalistic project, or even the more narrow information-based project of giving an account of content. Predictive coding theories, which seem to be all the rage these days, say just that, and proclaim it as a virtue.

    3 To take just one example, INFO appears unable to account for instances of representation that have high rates of false positives, but that are maintained because of the high stakes involved (cf Godfrey-Smith 1991). Usher is at pains to note that his own internal scheme, at least, can handle the problem.

    4 As Skyrms points out, the agents in his models can be very minimal indeed – they need only respond differentially to different signals, and do better or worse thereby. There is no requirement that they have cognitive capacities that include internal representations.

    5 Skyrms also clearly believes that his theory can be applied to the inner workings of the brain. The question is how – and whether, when we apply that theory, the contents we get out might be quite different from the ones postulated by vision science, neuroscience, or folk psychology.

  2. I share Sebastián and Artiga’s skepticism about the feasibility of Scientifically Guided Informational Theories of representational content (SGITs), but I think their specific objection misses the mark. That said, a modified version of their objection does indeed skewer the theory, although it turns out that the objection is not restricted in its scope to meta-representation, but affects SGITs’ account of representational content quite broadly.

    Sebastián and Artiga’s representative target for all SGITs is Usher’s (INFO), which in non-technical terms can be understood as follows:

    R represents S iff:

    1) R is more informative about S than any other entity, and

    2) S is more informative about R than any other representation (in the organism’s repertoire).

    which is equivalent to:

    1’) The probability of R given S is greater than the probability of R given any other entity, and

    2’) The probability of S given R is greater than the probability of S given any other representation.
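    As a rough illustration of how conditions (1’) and (2’) could be checked against tables of conditional probabilities, here is a minimal Python sketch. The function name `info_represents`, the dictionary layout, and every number are assumptions invented for the example; none of them come from Usher or from Sebastián and Artiga.

```python
def info_represents(r, s, p_r_given, p_s_given):
    """Check the two INFO conditions as paraphrased above (hypothetical helper).

    p_r_given: {entity: P(r | entity)} for every candidate entity, s included.
    p_s_given: {rep: P(s | rep)} for every representation in the repertoire,
               r included.
    """
    # (1') P(r | s) exceeds P(r | any other entity).
    cond1 = all(p_r_given[s] > p for entity, p in p_r_given.items() if entity != s)
    # (2') P(s | r) exceeds P(s | any other representation).
    cond2 = all(p_s_given[r] > p for rep, p in p_s_given.items() if rep != r)
    return cond1 and cond2

# Made-up conditional probabilities for a small repertoire.
p_rrm_given = {"red moving object": 0.90, "green moving object": 0.05, "red still object": 0.10}
p_red_moving_given = {"Rrm": 0.95, "Rgm": 0.02, "Rr": 0.40}

print(info_represents("Rrm", "red moving object", p_rrm_given, p_red_moving_given))
```

    Note that the verdict depends entirely on which entities and which representations are admitted into the two tables, which is the point on which the discussion below turns.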

    In criticizing INFO, their main example involves a hierarchy of cortical representations, where a representation of redness (Rr) together with a representation of movement (Rm) combine to produce an object representation that binds the characteristics of redness and movement (Rrm):

    With respect to this example, Sebastián and Artiga suggest that Rm has just as much claim to be the representational content of Rrm as do red, moving objects. This is because Rrm is highly informative about the presence of Rm; in fact, we can very reliably predict that Rm will be present if we know that Rrm is present, surely as reliably (if not more reliably) than the presence of a red, moving object. This would then entail, they claim, that according to INFO, Rrm represents an earlier sensory representation in the cortical hierarchy just as much as it represents what it really represents, namely red, moving objects. INFO cannot tell us what it really represents, since it cannot deliver the correct result that it is not a metarepresentation.

    But there is a problem with their suggestion understood as a critique of INFO. As they note, for INFO to dictate that Rrm represents red, moving objects as it should, the following inequality must be satisfied, according to INFO (see 1’ above):

    P(Rrm | red moving object) > P(Rrm | Rm)

    But in general, this inequality will indeed be satisfied as long as there are a decent number of non-red moving objects in the organism’s environment, i.e. as long as there are a sufficient number of Rm tokens that don’t go on to cause Rrm. (Perhaps they go on to cause Rgm, a representation of a green, moving object; or Rbm or….) So it seems that, in the normal case at least, INFO will escape the Sebastián and Artiga objection. Rrm would indeed be more informative about red, moving objects than it would be about earlier sensory representations in the cortical hierarchy. Or, in other terms that they and some other SGITs use, the correlation between red, moving objects and Rrm would indeed be better than that between Rm and Rrm.
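    The dependence of this inequality on how many moving objects are red can be made explicit in a toy model. The sketch below assumes, purely for illustration, that Rm and Rr are tokened only by a single moving (respectively red) object with no false positives, that the two feature detectors are independent given the object, and that Rrm is tokened only when both are. All parameter names and values are hypothetical.

```python
def p_rrm_given_red_moving(hit_m, hit_r, bind):
    """P(Rrm | red moving object): both detectors fire, then binding succeeds."""
    return hit_m * hit_r * bind

def p_rrm_given_rm(frac_red, hit_r, bind):
    """P(Rrm | Rm): the moving object must also be red, Rr must fire, binding succeeds."""
    return frac_red * hit_r * bind

hit_m, hit_r, bind = 0.95, 0.95, 0.99  # made-up reliabilities

for frac_red in (0.2, 0.99):  # fraction of moving objects that are red
    distal = p_rrm_given_red_moving(hit_m, hit_r, bind)
    proximal = p_rrm_given_rm(frac_red, hit_r, bind)
    print(f"frac_red={frac_red}: distal={distal:.3f}, Rm={proximal:.3f}, "
          f"inequality holds: {distal > proximal}")
```

    Under these assumptions the two quantities differ only in the factor `hit_m` versus `frac_red`, so the inequality holds exactly when the reliability of Rm exceeds the fraction of moving objects that are red.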

    What about the abnormal case, where virtually all moving objects are red, often enough to ensure that the above inequality fails? In this case, there are very few Rm that fail to go on to cause Rrm. In that kind of strange situation, an unusual world in which moving objects are red with extremely high reliability, I think we must wonder whether there really is a distinction in content between Rm and Rrm. At least, the advocate of INFO as a theory of representation could maintain that this was simply a case of redundant representation, not a hierarchy of representations to be distinguished by their contents. “What is there to distinguish them?”, they could reasonably ask.

    Let’s step back and look at the general form of the objection. As I understand it, the objection needs a regular causal sequence like this:

    object —> intermediate representation —> final representation

    where the final representation is not in fact a metarepresentation, but where INFO implies, incorrectly, that it is. Including both the conditions on INFO, that means the intermediate representation needs to be at least as reliable a cause of the final representation as the object is, plus the final representation must be at least as reliably retrodictive of the intermediate representation as it is of the object. (In addition the final representation has to be the best option in this respect among all the representations on offer in the organism.)

    One might think that a simpler example would work (suggested to me in correspondence by Artiga):

    red object —> Rr —> RR

    where Rr is a sensory representation of redness (not bound to an object), and RR is a higher-level representation of a red object. In this case, the intermediate representation is much more closely tied to the final representation than in the original example. In general, whenever you have a sensory representation of redness, you will also get a representation of a red object, and vice versa. In addition, there seems to be no alternative representation competing in its predictive reliability in these respects.

    But the devil is in the details. What is a “red object” representation, as opposed to a sensory representation of redness? Well, there seem to be representations in the visual system that track visual objects independently of their sensory characteristics; they are “indexes” or “object files” that, in some sense, point to a visual object and keep track of it (Pylyshyn 1989; Erlikhman et al. 2013). (These are low level “sensory” representations not involving anything like the concept of an object.) The higher level representation of a red object would then be the product of at least two sensory level representations, one of sensory redness, and the other one of these visual object trackers (Ro):

    This makes the new, simpler example exactly parallel to the original example involving two sensory-level representations combining to produce a representation of a red, moving object, and it will face the same problems. (There are perceptions of red without perceptions of red objects; when closing one’s eyes and looking at a bright light, for example.) Of course, this parallel relies on potentially idiosyncratic features of human (or probably mammalian) perceptual systems, and there may be other types of creatures whose perceptual systems do not operate in this fashion. Basing the counterexample on purely hypothetical creatures does not look promising, however, since our intuitions about what the “correct” content is in a hypothetical situation cannot be accepted with much confidence.

    That said, I believe there is a successful objection in the neighbourhood. Returning to the original example, there is a way to generate the inequalities that cause problems for INFO. While Rrm is not equally or better correlated with Rm than with red, moving objects, as Sebastián and Artiga originally suggested, it is equally or in fact better correlated with something else: namely the combination or conjunction of Rr and Rm.

    Remember that what we need for a successful counterexample along the lines we have been exploring are two things. First, the intermediate representation must be at least as reliable a cause of the final representation as the object is. Clearly the conjunction of Rr and Rm will be a highly reliable cause of Rrm; this will be by far the most common way that a tokening of Rrm is occasioned (assuming it is some kind of perceptual representation, and perhaps even if it is not). In fact, red, moving objects will cause Rrm less reliably than the combination of Rr and Rm simply because they are earlier in the causal chain. (As Sebastián and Artiga note, states tend to carry more information about their proximal causes than their distal causes, simply because less can go amiss.) In other words, (1’) is satisfied if we suppose that Rrm represents the combination of Rr and Rm: The probability of Rrm given Rr and Rm is greater than the probability of Rrm given any other entity… including even red, moving objects.

    The second thing we need for a successful counterexample is for the final representation to be at least as reliably retrodictive of the intermediate representation as it is of the object (and that there are no other representations that are better retrodictive of the intermediate representation). Again, because we are dealing with a causal chain here, Rrm will indeed carry better information about the combination of Rr and Rm than it does about red, moving objects. (And I can’t think of any other representation that carries better information about that combination of intermediate representations. Perhaps there is some mediating representation in the causal chain leading from Rr and Rm to Rrm, but if so, that representation can play the relevant role in the counterexample.) Therefore (2’) is satisfied as well: The probability of Rr and Rm given Rrm is greater than the probability of Rr and Rm given any other representation.

    As a result, it seems that INFO entails that Rrm doesn’t represent red, moving objects; instead it represents the combination of intermediate representations, Rr and Rm. Exactly as Sebastián and Artiga’s example was supposed to show (but didn’t), INFO incorrectly dictates that Rrm is a metarepresentation.
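    The two inequalities behind this modified objection can be checked by exact enumeration in a small toy model of the causal chain. The sketch below is only an illustration under stated assumptions: the world is collapsed to "red moving object present or absent", Rr and Rm share the same made-up hit and false-alarm rates and are independent given the object, and Rrm depends only on whether both are tokened. The function names and all numbers are hypothetical, and with other numbers the retrodictive inequality need not come out the same way.

```python
from itertools import product

def joint(p_obj, hit, false_alarm, bind, spontaneous):
    """List (obj, rr, rm, rrm, prob) for the toy chain obj -> {Rr, Rm} -> Rrm."""
    rows = []
    for obj, rr, rm, rrm in product([True, False], repeat=4):
        p = p_obj if obj else 1 - p_obj
        p_feature = hit if obj else false_alarm
        p *= p_feature if rr else 1 - p_feature
        p *= p_feature if rm else 1 - p_feature
        p_out = bind if (rr and rm) else spontaneous
        p *= p_out if rrm else 1 - p_out
        rows.append((obj, rr, rm, rrm, p))
    return rows

def cond(event, given, rows):
    """P(event | given), computed by summing over the joint table."""
    num = sum(p for *w, p in rows if event(*w) and given(*w))
    den = sum(p for *w, p in rows if given(*w))
    return num / den

rows = joint(p_obj=0.1, hit=0.9, false_alarm=0.05, bind=0.98, spontaneous=0.001)
is_obj = lambda obj, rr, rm, rrm: obj
is_both = lambda obj, rr, rm, rrm: rr and rm
is_rrm = lambda obj, rr, rm, rrm: rrm

print("P(Rrm | Rr & Rm)        =", round(cond(is_rrm, is_both, rows), 4))
print("P(Rrm | red moving obj) =", round(cond(is_rrm, is_obj, rows), 4))
print("P(Rr & Rm | Rrm)        =", round(cond(is_both, is_rrm, rows), 4))
print("P(red moving obj | Rrm) =", round(cond(is_obj, is_rrm, rows), 4))
```

    With these particular numbers the conjunction of Rr and Rm beats the distal object on both the predictive and the retrodictive comparison, which is the pattern the modified objection requires.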

    What might a defender of INFO say in response to this modified metarepresentation objection? One line of response is to suggest that the proposed metarepresentational content is somehow illegitimate. Sebastián and Artiga dispose rather neatly of Eliasmith’s move along these lines (i.e. his move to restrict representational contents to those that do not fall under the computational description of the cognitive system). The modified objection opens up a new line of response, however, which is to complain that the proposed metarepresentational content is gerrymandered in some way.

    It is typical for information-based theories to restrict possible contents; otherwise it is too easy to come up with contents that “do better” on information-based measures than what our intentional states actually represent. For example, a visual representation of squareness will be more reliable in the information it carries about squares-in-good-light than about squares in general, so the content of “square-in-good-light” must be ruled out by fiat. Fodor tries to do this with “nomic properties,” and advocates of SGITs adopt a similar tack. For example, Usher (2001) excludes “events such as ‘a glimpse of a dog in the dark’. Such events are situation dependent and do not satisfy the requirement of being an object (or substance)” (p. 318); “what is represented are objects only and not objects-under-a-situation” (p. 325). As Sebastián and Artiga note, Rupert restricts contents to natural kinds. Any of these restrictions could be wielded against the modified counterexample: the “combination” or “conjunction” of Rr and Rm could be dismissed as an unnatural category.

    One problem with these kinds of restrictions is that many of our mental representations represent unnatural categories, from an objective perspective. (Think of Akins’ (1996) narcissistic sensory representations, or the concept of furniture.) Another problem is that restricting the class of candidate contents to those that we in fact find intuitive ways to carve up the world is fundamentally arbitrary. Because we find them intuitive, this can easily go unnoticed. But this is to ignore the fact that we need an explanation for why those categories are the intuitive ones. “Because they are the most predictive,” you might say. Possibly; that is why we ought to represent them. But what is it, within an information-based theory, that predicts we do represent them?

    Nothing – in fact, these theories predict that we represent squares-in-good-light. Thus some sort of restriction on the range of candidate representata must be tacked on, ad hoc, in order to save the theory. (By contrast, a teleological theory connects what we do in fact represent to what we ought to represent, what we are biologically supposed to represent.)

    Therefore it seems to me that the interesting metarepresentation objection that Sebastián and Artiga describe is perhaps best seen as part of a deeper problem. This is the problem that purely information-based principles, when applied without an illegitimate special treatment of intuitive categories, always allow for a better content candidate than what is actually represented. Those better candidates might be intermediate representations (as in their examples), proximal stimuli, combinations or disjunctions, objects in epistemically ideal situations, or what have you.

    Therefore I heartily endorse the authors’ conclusion that information is unlikely to provide a theory of content on its own. Three cheers for teleosemantics!



    Akins, K. (1996). Of sensory systems and the “aboutness” of mental states. Journal of Philosophy, 93(7), 337-372.

    Erlikhman, G., Keane, B. P., Mettler, E., Horowitz, T. S., & Kellman, P. J. (2013). Automatic feature-based grouping during multiple object tracking. J Exp Psychol Hum Percept Perform, 39(6), 1625-1637.

    Pylyshyn, Z. (1989). The role of location indexes in spatial perception: a sketch of the FINST spatial-index model. Cognition, 32(1), 65-97.

    Usher, M. (2001). A statistical referential theory of content: using information theory to account for misrepresentation. Mind & Language, 16(3), 311-334.

  3. First of all, we would like to thank John Schwenkler and Brett Castellanos for making this extraordinary event possible. We would also like to warmly thank Rosa Cao and Dan Ryder for their time and their very interesting comments. They are thought-provoking and extremely useful, and we are sure that they will help to improve the paper very much. Indeed, they raise so many questions that we will not be able to address all of their insights (given the current space constraints). Nonetheless, we will do our best to answer their main worries. We’d love to further address these and any other questions in the discussion.

    Before addressing the reviewers’ main points, we would like to use some of the issues raised by Cao to clarify the goal of the paper, which might not have been sufficiently clear. The main purpose of the essay is not to argue against naturalistic theories of content in general, nor are we attempting to suggest a problem for all naturalistic theories that merely use the notion of information. Our central goal is to raise some objections against a purely informational theory, that is, a naturalistic theory that exclusively appeals to informational relations in order to account for representational content. Similarly, we are not primarily interested in what might be called the metasemantic question, that is, the question of what representations are (it is very unlikely that a purely informational theory can satisfactorily answer that question – and it is also unlikely that they are intended to do so). What we address is what we call a ‘semantic’ question: given that R is a representational state, what determines whether its content is S1 rather than S2 or S3? Finally, although our arguments turn around misclassifications of representation of stimuli and metarepresentation, we are definitely not suggesting that we should have two different naturalistic theories, one for each of these categories. We agree with Cao that to have a single semantic theory for representations and metarepresentations is clearly preferable (and nothing of what we argue suggests that this goal is unattainable), for the relation that holds in both cases seems to be of the same kind.

    Having settled these preliminary issues, let us move to some of the questions raised by the commentators. We will consider five of them: a possible misinterpretation of Usher’s proposal, worries about our counterexamples, the generalization to other informational theories like Skyrms’, our suggestion that bringing in functional notions might render the whole notion of information useless, and the question of whether our arguments should be reduced to the ‘reference class’ problem.


    1) First of all, Cao argues that INFO does not capture Usher’s view. According to INFO a state R represents S iff conditions 1 and 2 are satisfied. However, Cao suggests that ‘Usher offers up two distinct ways of fixing two distinct types of content, one “external scheme” (approximately characterized by [condition 1 of INFO]) and one “internal scheme” ([condition 2 of INFO]).’ Cao might be right about that (although we still think that INFO probably captures other proposals, such as Eliasmith’s). Nonetheless, note that our arguments also work against Cao’s interpretation of Usher. For instance, we argue that in an intuitive case in which Rrm represents red moving objects, the relation between Rrm and Rr satisfies conditions 1 and 2. Since satisfying 1 and 2 suffices for satisfying 1 (and for satisfying 2), the same consequences will follow from this reading.

    2) Concerning our counterexamples, Ryder points out that the way of framing our first argument against INFO probably fails. In our example, a single representational state Rrm is activated by two intermediate representations Rm and Rr, which strongly correlate with moving and red things, respectively. We argued that in this case, one should expect P(Rrm | Rm) > P(Rrm | red moving object), but he rightly points out that if there are enough moving things that are not red this inequality will not hold: our mistake. Nonetheless, he suggests a slight amendment that could solve the problem: our point could be made by taking Rrm to better correlate with the conjunction of Rm and Rr. In other words, while the previous inequality fails to hold, it is still true that P(Rrm | Rm & Rr) > P(Rrm | red moving object). We think he is completely right and gladly accept this suggestion.

    Ryder also discusses an alternative reply to his own objection. In the paper, we preferred a complex stimulus (red moving object) because of the large amount of evidence in favor of intermediate representational states: each feature is processed independently. But there is no need for that, as Artiga noted in the exchange prior to the conference that Ryder mentions. Ryder considers an alternative configuration, in which there is a sequential structure involving a red object, a single intermediate representation Rr and a final representation RR. That seems to be a clear case where our objection applies: there are surely cases in which RR plausibly represents a red object, and nonetheless INFO entails that RR is indeed a metarepresentation of Rr, because it correlates much better with this intermediate state. Interestingly, Ryder complains that there is something fishy in the example. If RR represents red object, then it cannot have a single input Rr (which represents redness); it should have a second input that provides RR with the capacity to represent object. Thus, it is not possible to have a sequential structure red object-Rr-RR in which the intermediate state Rr represents redness and the last element represents red object. This is certainly compelling reasoning, but we doubt one of its premises. In general, it is not true that in order to represent an object instantiating a property we need to have a low-level representation of each of these elements. An example illustrating this fact is provided by the literature on teleosemantics: the frog’s mental state might represent something like edible bug (or fly, or frog food, or what have you) even if it lacks low-level representations for edibility, bugness or food. Similarly, while it is plausible that neural activation in the fusiform face area represents faces, it is usually not assumed that there must be some low-level stage at which faces are somehow represented. A cognitive structure can use representations of proximal features in order to generate a representation of more distal affairs. Thus, the idea that the intermediate state represents redness while the next sequential state represents red objects seems to be entirely plausible. As a result, we think this is a valid way of providing a counterexample to INFO.

    Cao presents what seems to be an unrelated problem. She argues that we paid insufficient attention to the phenomenon of perceptual invariance, in which a single representational state Rm can be generated by a set of different stimuli (S1, S2, S3). For instance, in the well-known case of color perceptual constancy, the same surface reflectance can produce different stimuli depending on its illumination, but (crucially) despite this variability in proximal stimuli, we always perceive it as being the same color. Certainly, we did not consider this sort of example; our goal was merely to provide a set of counterexamples to INFO, rather than discussing all the different mechanisms and assessing whether INFO gives the right result. Nonetheless, we do think that INFO also has problems with perceptual invariance. After all, from a purely informational point of view, nothing in the theory can tell whether Rm represents the distal stimulus (say, such and such surface reflectance in our example) or the disjunction of proximal stimuli (S1 or S2 or S3, and hence the connection with Ryder’s point). The correlation with the distal feature will be at least as good as the correlation with the most proximal stimuli. Indeed, given that the set of proximal stimuli are detected by a set of intermediate representations (R1, R2, R3), one should expect Rm to correlate better with R1 or R2 or R3 than with the set of stimuli and, of course, much better than with the distal stimulus. Thus, we think that perceptual invariances also raise a difficult problem for INFO: Rm seems to better correlate with the disjunction of intermediate representations than with any distal (or proximal) stimuli.
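
    The ordering claimed for the invariance case can also be illustrated with a toy model. This is a minimal sketch under assumptions that are ours and purely hypothetical: correlation is read as the conditional probability P(Rm | ·), the chain is distal surface, then exactly one proximal stimulus (S1, S2 or S3, depending on illumination), then the matching detector (R1, R2 or R3), then Rm, and all the numbers are invented for illustration.

```python
# All parameters are hypothetical and chosen only for illustration.
P_DISTAL = 0.3           # base rate of the distal surface reflectance
P_STIM_IF_DISTAL = 0.9   # P(some proximal stimulus S1..S3 | distal feature present)
P_STIM_IF_ABSENT = 0.05  # P(some proximal stimulus S1..S3 | distal feature absent)
P_DETECT = 0.9           # P(Ri fires | Si present)
P_SPURIOUS = 0.01        # P(a given Ri fires | no proximal stimulus present)
RM_IF_ANY = 0.95         # P(Rm fires | some Ri fires)
RM_OTHERWISE = 0.01      # P(Rm fires | no Ri fires)


def joint_distribution():
    """Exact joint over (d, s, r, rm); s and r are 0 for 'none', else 1..3."""
    table = {}
    for d in (0, 1):
        p_d = P_DISTAL if d else 1.0 - P_DISTAL
        p_any = P_STIM_IF_DISTAL if d else P_STIM_IF_ABSENT
        for s in (0, 1, 2, 3):
            # The three illuminations are assumed equally likely.
            p_s = (1.0 - p_any) if s == 0 else p_any / 3
            for r in (0, 1, 2, 3):
                if s == 0:
                    p_r = 1.0 - 3 * P_SPURIOUS if r == 0 else P_SPURIOUS
                elif r == s:
                    p_r = P_DETECT
                elif r == 0:
                    p_r = 1.0 - P_DETECT
                else:
                    p_r = 0.0  # a detector never fires for a different stimulus
                for rm in (0, 1):
                    p_fire = RM_IF_ANY if r else RM_OTHERWISE
                    p_rm = p_fire if rm else 1.0 - p_fire
                    table[(d, s, r, rm)] = p_d * p_s * p_r * p_rm
    return table


def conditional(table, event, given):
    """P(event | given), computed by summing over the joint table."""
    numerator = sum(p for x, p in table.items() if given(x) and event(x))
    denominator = sum(p for x, p in table.items() if given(x))
    return numerator / denominator


TABLE = joint_distribution()
rm_fires = lambda x: x[3] == 1

p_rm_given_distal = conditional(TABLE, rm_fires, lambda x: x[0] == 1)
p_rm_given_any_stimulus = conditional(TABLE, rm_fires, lambda x: x[1] != 0)
p_rm_given_any_detector = conditional(TABLE, rm_fires, lambda x: x[2] != 0)

print(p_rm_given_distal, p_rm_given_any_stimulus, p_rm_given_any_detector)
```

    In this model each extra causal step loses a little reliability, so P(Rm | R1 or R2 or R3) exceeds P(Rm | S1 or S2 or S3), which in turn exceeds P(Rm | distal feature). That is the pattern that creates trouble for a purely informational reading.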

    3) Our objections are directed not only against INFO, but also against any purely informational theory. However, Cao argues that our arguments do not generalize to Skyrms’ approach for two main reasons. First of all, she wonders whether Skyrms’ theory could even in principle be applied to cognitive systems, such as the ones we are interested in; after all, he accepts multiple contents and he works within a sender-receiver framework that does not fit easily with cognitive systems. We agree that these are difficult problems for the use of Skyrms’ approach in the context of cognitive systems (and one can find interesting arguments in Cao, 2012), but if they succeed, so much the worse for informational theories. At this point, we were merely trying to be as charitable as possible by supposing that there could be some way of using Skyrms’ approach to naturalize cognitive representations. Our point is that, if one attempted to use Skyrms’ theory in order to provide a purely informational theory of content, it would also probably fail for similar reasons.

    Nonetheless, Cao also tries to defend Skyrms’ approach from our objection: ‘The theory provides us with a formula to assign content to a given signal. If that content turns out to itself be a representational state (by the same theory’s standards), then we have found ourselves a meta-representation.’ We strongly disagree with this contention. One of the key points of our argument is that the informational theorist cannot just bite the bullet. The examples were designed to show that intuitive cases of representations of stimuli turn out to be metarepresentations according to INFO (and vice versa). Given the central role that metarepresentations play in some cognitive theories, the failure to draw the relevant distinctions in the right places seems to be a serious drawback of informational accounts, including Skyrms’.

    4) Cao also raises some interesting questions concerning a suggestion we make at the end of the paper. We claim that, if one wants to use the notion of function in order to solve the problems raised in the paper, it might happen that the notion of information becomes irrelevant. Notice that this is very different from saying the ‘two notions [would] play overlapping explanatory roles’. The idea is that to provide a satisfactory naturalistic theory of content, the notion of function might suffice, while the notion of information might not. Thus, only the notion of information would be superfluous. Shea (2007), of course, would disagree, since he argues that teleological theories need to be supplemented with an informational condition (but see Artiga, 2013 for a response). In any case, we admit that this is a difficult question, and a full discussion of this point would require a paper of its own (and, indeed, one of the authors is less convinced than the other concerning its prospects). Nonetheless, given the arguments of the paper, this is an option that should be seriously taken into account. If this suggestion is on the right track, perhaps informational notions could still play some heuristic role, as Cao suggests, but at least this role could not simply be granted without further argument.

    5) Finally, both Ryder and Cao seem to suggest that our arguments might be considered (or should actually be presented as) a particular version of the ‘reference class’ problem. A satisfactory naturalistic theory of content should not only provide a plausible relation between representations and representata, but it should also specify the set of entities that can be represented and (more importantly) justify why they are the relevant candidates. We agree that purely informational theories generally fail to address this question, and it is unclear whether they possess the right tools for solving this problem. Nonetheless, in the same way that we assumed that they have some independent way of specifying what representations are (i.e. we took them to be semantic rather than metasemantic theories), we also did not want to get into the difficult question of how to specify the relevant set of representata. We wanted to highlight that, even granting all that, purely informational theories have problems selecting the right content. We wanted to focus on the key questions they are better at, and show that even there, they fail. Although we agree that there are many problems in the vicinity that put still more pressure on these approaches (as well as on many other naturalistic approaches), we wanted to stress that the problem we are presenting is more pressing and independent of ‘classical’ discussions on reference fixing, like those on adequacy or on indeterminacy: cognitive theories make use of a sufficiently clear distinction between representations of external stimuli and representations of representations, and a satisfactory theory of mental content must allow this distinction.

    To conclude, we just want to thank Ryder and Cao again for their efforts. We are sorry if we have failed to address their points satisfactorily and in full detail, or if we have misunderstood some of their comments. In any case, we are happy to develop any of these issues (and open any new debate!) in the discussion.

  4. Hi Miguel and Marc,

    Thanks for an opportunity to think anew about information-based theories of content!

    I find your argument (taking into account your clarifications in your response to the commentaries) to be quite convincing.

    Inspired by Dan’s appeal to object files, let me try to sketch a strategy for showing that the problem you pose to INFO (i.e., a first order representation erroneously classified as a meta-representation by INFO) does not apply to object files. I am aware that you didn’t claim that the problem applies to object files. Still, I think this might be relevant to your argument, because if object files are immune to the worries you present, then perhaps it is possible to appeal to object files in order to block your argument in other cases (e.g., that of Rrm being wrongly classified as a meta-representation).

    On the standard understanding of object files, we distinguish what an object file is of (namely an object), and what an object file contains, namely information about features. I will mainly focus on determining the first sort of representation (namely of objects, not of features).

    Suppose there is an object file containing the information ‘red’ and ‘square’ (FILE[r,s]), built on the basis of representations of red (Rr) and square (Rs). FILE[r,s] predicts that there is a red and square object in the world better than other representations do. But, crucially, FILE[r,s] does not predict the existence of a lower-level representation of an object better than other representations do, for the simple reason that (let us assume for the sake of discussion) there is no such lower-level representation. That is, there is no representation of objects beneath the level of object files in the representational hierarchy. Beneath the level of object files there are only representations of features, unbound to any object. Thus, the INFO theory correctly implies that FILE[r,s] is of an external object, and is not of an internal representation. In this sense the “meta-representation” problem does not apply to the issue of what object files are of. Does this sound right?

    To clarify, the point is that I distinguish between an object file representing an object (what the file tracks) and the information contained in the file (about features). There is no internal representation that can be a candidate for the first sort of representation (or so I have claimed). So trivially, FILE[r,s] does not predict this internal representation at all (let alone better than other representations do), hence according to INFO, FILE[r,s] does not represent (the first kind of representation) an internal representation, hence it is not a meta-representation.

    What about the information about features contained in object files? This part is less clear to me. But here is a first pass: it seems unproblematic in the present context to hold that if an object file is of an object X, then the features stored in the file are attributed to X. In other words, the information relations governing the first sort of representation (which object is the file of?) determine whether the representations contained in the file are of features of an external object or of an internal representation. Given that object files are of external objects (see previous paragraphs), it follows that the representations contained in object files are of features of external objects, hence they are not meta-representations. The information relations determining the second sort of representation only determine which external feature is represented, but not whether an external feature (rather than an internal representation) is represented.

    If this is on the right track then both sorts of representations associated with object files are immune to your “meta-representation” worries regarding INFO.

    Does this make sense?

  5. Dear Assaf,

    Thank you very much for your comment! It is a very interesting question. When thinking about these issues, we did not consider the literature on object files, but it might certainly be a good place to look. Thanks!

    Before considering your point, let me stress that the strategy of our paper is to provide some counterexamples. Thus, we are happy to accept that in certain cases INFO gives the right results. Nonetheless, let me try to argue why the example you presented might not be one of these cases.

    Here is what I would say: In your description of the example, you assume that FILE[r,s] contains information about redness and squareness and that it represents (it is of) objects. Accordingly, let us assume that this should be the intuitive result (or, even better, let us suppose that this is the way our best cognitive theory would describe this mental state). Now, the key question is whether INFO can deliver this result. In other words, the issue is whether INFO entails that FILE[r,s] represents objects.

    As you suggest (and as we replied to Dan), we agree that FILE[r,s] correlates better with red and square objects than with either an intermediate representation of redness or an intermediate representation of squareness. However, take the conjunction of both representations (i.e. low-level representation of red and low-level representation of square). It seems that FILE[r,s] correlates better with the conjunction of the two low-level representations than with the distal affair. Consequently, INFO entails that FILE[r,s] represents the two intermediate representations, which contradicts our original assumption.

    At some point, you claim that the intermediate representations are not good candidates because they do not represent objects (and FILE[r,s] does). However (and this is the key point), when arguing that INFO entails that FILE[r,s] represents distal objects, you cannot assume that the only good candidates are representations of objects, because you are not entitled to assume that FILE[r,s] represents objects. This is what the theory should deliver, not something that INFO can take for granted. Thus, on pain of circularity, you cannot assume that the intuitive content of FILE[r,s] is object, and as a result I do not see why the conjunction of intermediate representations is not a good candidate.

    Finally, if INFO does not entail that FILE[r,s] represents a distal object, then your interesting suggestion concerning the content of the information contained in the file would not help the theory.

    Would you agree?

  6. Hi Assaf,

    Thank you very much for your time and for your interesting question.
    I think I agree with Marc’s comment and I would love to read what you think about it.
    Some comments just to complement Marc’s remarks.

    Here is, I think, another way to present Marc’s worry (or something along those lines): how can we, in representational terms, distinguish representations of objects and representations of features if a feature is always instantiated by an object?

    My guess is that object files are useful computational metaphors, but I am not sure how to play along with them once one gets into the details of a naturalistic theory of mental content.

    In the particular case we are dealing with, our assumption is that the same relation that holds between a representation of a red object and a red object holds between a metarepresentation and the representation of a red object, which the former targets. If we use object files, I think that a metarepresentation would be an object file of a representation, but what kind of features would enter into such a file?

    Maybe one could try to reject our assumption (that the relation is the same) making use of object files. What do you think?

  7. Hi Miguel and Marc,

    Your responses sound reasonable to me. What you are basically saying is that INFO will classify object files wrongly: it will not classify them as representing objects at all, hence not as object files as they are usually conceived. Instead, it will classify them as representing a conjunction of representations of features.

    Let me try to develop what you are saying in some more detail, in order to see more clearly whether this is right.

    On the standard story about object files (Kahneman, Treisman and Gibbs 1992, henceforth KTG), they track objects, storing information about them across time. Moreover, the tracking relies on information about location (spatio-temporal continuity), but information about location does not enter the files. So object files function partly like memories: they store information about features their respective target objects had in the past, excluding location. Moreover, when a subject sees an object, the object files system retrieves stored information from the file that tracks this particular object, and if the stored information matches the presently seen feature, recognition of the seen feature occurs faster (an Object Specific Preview Benefit, OSPB).

    So object files are supposed to involve two kinds of representations: the first is of objects, the second is of features. And KTG posit these two kinds precisely in order to make sense of OSPBs. We need memory of features that explains priming (or preview) effects, and we need a representation of objects, or a mechanism for tracking objects, in order to explain why and how the preview effect is object specific (rather than an ordinary, display-wide priming effect). This is the motivation for distinguishing between representations of objects and of features.

    According to your responses (at least on one interpretation of them), INFO entails that object files represent a conjunction of representations (not yet sure which ones). So the first kind of representation is a representation of a conjunction of representations (of something yet to be figured out), and the second kind (presumably) is a representation of representations of features (e.g., a representation of a representation of redness).

    Let us focus on the question, “which representations are represented by the first kind of representation?” The first kind of representation is connected (in the usual story) with keeping track of objects via spatio-temporal information. So the natural candidates for being the targets of the first kind of representation, given INFO, and in light of your argument, are representations of locations or representations of locations at times, or more probably a conjunction of representations of locations at times. However, we know that object files do not store information about locations. So perhaps a better idea is that the first kind of representation is abstract, i.e., it represents that there was (in the brain) some conjunction of representations of locations at times, with spatio-temporal continuity between their contents. In other words, the existence of an object file correlates with the past existence of a conjunction of representations of locations with spatio-temporal continuity, but it (the object file) does not predict any specific representations of locations.

    So on this story, an object file is associated with two kinds of representations. The first existentially (abstractly) represents a conjunction of representations of locations, hence is a meta-representation, and the second represents representations of ordinary features (color, shape), hence is a meta-representation as well.

    On the standard (KTG) view about object files, the targets of the second kind of representations (what the file stores) are attributed to the target of the first kind of representation (what the file is of). It seems that on the INFO story (as I have developed it above), this is not the case. The targets of the second kind of representations are not in any sense attributed to the target of the first kind of representation. For it makes no sense to attribute the feature “being a representation of redness” to a conjunction of representations of locations. Clearly, a conjunction of representations of locations does not represent redness. So the upshot is that the resulting object file cannot be thought to store information about what the file is of. Instead, the object file simply holds separate sorts of information, unrelated to each other, the first of a conjunction of representations of locations, the second of representations of features.

    How does this sound?

    If this is right, then it appears that INFO entails a very odd picture of what object files represent. And it furthermore appears to undermine the usual understanding that object files involve a certain sort of attribution or predication, as in the sentence “o is red” (the ‘o’ comes from the first kind of representation and the ‘red’ comes from the second). Instead, it appears that object files only represent representations of features, as in “there is such and such representation of location” and “there is a representation of redness”.

    So not only are object files meta-representations, given INFO; they are actually not object files at all, if we assume that object files are supposed to involve something like attribution or predication. It therefore appears to me, at the moment at least (I feel this is complicated!), that you are correct. Object files are not immune to your argument against INFO. If anything, your argument is even more devastating in the case of object files!

  8. Hi Assaf,

    Thanks again for your comment! Your reasoning sounds right to me. Indeed, I think it might actually be a good idea to discuss the example of object files in the paper. Thanks!

    Nonetheless, let me make a brief comment on the following passage:

    “So the natural candidates for being the targets of the first kind of representation, given INFO, and in light of your argument, are representations of locations or representations of locations at times, or more probably a conjunction of representations of locations at times. However, we know that object files do not store information about locations. So perhaps a better idea is that the first kind of representation is abstract”.

    Your argument presupposes that we know that object files do not store information about locations and that this knowledge can be used in order to determine that the contents of FILE[r,s] are not representations of locations. However, (again) this is something that the supporter of INFO cannot use. According to this purely informational theory, FILE[r,s] represents the entity that satisfies the conditions specified in INFO. Thus, if FILE[r,s] correlates best with a set of intermediate representations of location, this is what it is supposed to mean according to INFO.

    In any case, this is a minor point and I think you are completely right that our arguments entail that in these circumstances according to INFO object files cannot represent distal objects.


    1. Hi Marc,

      Thanks for this interesting response. I have a hunch that it somewhat complicates matters.

      I agree that in principle we can’t simply assume (on the basis of vision science) that object files don’t store information about (specific) locations. Instead, we should look for correlations, and demand that INFO deliver the required result. Still, if vision scientists are correct to hold that information about locations (the past locations of an object) doesn’t enter object files, then it will be surprising to learn that object files correlate with specific locations (or with representations of specific locations). In other words, if object files were to correlate with a conjunction of (representations of) specific locations (describing the past spatio-temporal trajectory of an object), then I think this would put pressure on the vision-scientific claim that object files do not store information about location.

      Put differently, on the vision-scientific picture, object files abstract away from locations. This means that objects with different spatio-temporal trajectories but with the same features (color, shape) would give rise to the same object files. This implies that object files do not predict specific spatio-temporal trajectories, hence according to INFO, object files don’t store information about specific locations (or about representations of specific locations).

      Does this seem right?

      The point, I guess, is that in this specific case (of whether or not object files store information about {representations of} locations), vision science should conform to what INFO implies, and not vice versa.

      At the moment I think this requires more thought, for it is a bit odd that on some occasions, INFO should conform to what vision science says (e.g., that Rrm, in your original argument, is not a meta-representation), and yet in the present case, vision science should conform to what INFO says.

      In any case, thanks Marc and Miguel for the excellent paper and for the detailed responses to my comments.


  9. Hi Assaf,

    Thanks a lot, your reflection is very interesting and useful.

    I would like to add some remarks on the relation between naturalistic theories of mental content (in general and not just INFO) and scientific practices.

    It would be nice if the mental state attributions made by a naturalistic theory of mental content vindicated, at least to some extent, scientific practices. Otherwise it would be unclear (to say the least) that they are talking about the same kind of entities.

    Informational theories seem to capture at least some of those practices. This is, I think, an advantage of the theory. However, we try to argue that they suffer from a defeating problem. Moreover, as you observe, if I follow you properly, INFO has particular problems with object files.

    Leaving aside the objection we have presented against naturalistic theories that rely solely on information, it is interesting to reflect more generally on cases in which a naturalistic theory of mental content fails to accommodate a scientific attribution of mental content (which is explanatorily successful). Is this a good reason to reject the theory? Or, if the theory were otherwise satisfactory, should we reject the scientific attribution?

    My guess is that these questions cannot be answered in the abstract and that a particular analysis of cases will be required.
