The “Twin Earth” philosophical thought experiment has importantly influenced the study of psychological essentialism. The standard philosophical intuition about that thought experiment suggests that it is an entity’s deeper causal properties, and not its superficial features, that are criterial for categorization. Four studies suggest that people do not share this intuition. Instead, people have two distinct criteria for category membership, one based on superficial features and one based on deeper causal properties. Studies 1a and 1b show that people reject the standard Twin Earth intuition, instead endorsing two (opposing) criteria for category membership. Study 2 shows that contextual cues affect categorization of entities in Twin Earth cases. Studies 3a and 3b extend these findings by looking both to a real-world case involving genetically modified organisms and to a population of graduate students from elite universities. Together, these studies provide an enriched understanding of essentialized concepts. People do not endorse the standard Twin Earth intuition, on which entities are categorized solely on the basis of their deep causal properties; instead, people employ two sets of criteria in natural kind categorization.
Target Presentation from Kevin Tobia, George Newman, and Joshua Knobe
Imagine a liquid that is identical to H2O in terms of its superficial features. It has the same color, texture, and taste as H2O and it quenches thirst in just the same way. However, this liquid is completely different in terms of its deeper causal properties. When chemists examine it, they find that it is not composed of H2O but of some very different chemical compound. Is this liquid water?
This “Twin Earth” thought experiment was first introduced in the philosophical literature. Within that literature, the standard answer is no, the entity is not water (e.g. Putnam, 1975; Kripke, 1980). In other words, the standard philosophical view is that when it comes to natural kind concepts like this one, it is not the superficial features but the deeper causal properties that are criterial for category membership.
Cognitive scientists have since drawn on this style of thought experiment to explore certain aspects of psychological essentialism (Carey, 1985; Gelman, 2003; Keil, 1989; Wellman & Gelman, 1988). To the extent that theories of psychological essentialism are designed to capture the standard philosophical intuition, these theories should say that when people essentialize a category, they regard the deeper causal properties as criterial for category membership.
In the present studies, we show that when reasoning about such cases, people do not endorse the standard philosophical intuition. Instead, people assent to two distinct claims:
- There is a sense in which the liquid is water.
- Ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid is not truly water at all.
In light of this finding, we suggest that theories of psychological essentialism should not be designed to capture the standard philosophical intuition. Instead, the data provide evidence for a more complex account according to which people associate natural kinds with two different sets of criteria. One set of criteria is based on superficial features, the other on deeper causal properties. The complex and ambivalent reaction people have to Twin Earth cases arises from a conflict between these two sets of criteria.
1.1 Psychological Essentialism
Psychological essentialism is perhaps best conceived of as a tendency to associate concepts with a particular type of representation. When people essentialize a concept, they associate it with two distinct sets of properties: (1) a variety of observable features and (2) a deeper, unobserved essence.
Consider the concept water. This concept is associated with certain observable features (colorless, tasteless, drinkable, etc.), and people could, in principle, represent the concept water solely in terms of these features. However, a vast body of evidence suggests that adults and even young children represent concepts like this one not only in terms of superficial features but also in terms of certain deeper causal properties (Gelman, 2003; Keil, 1989; Medin & Ortony, 1989). Thus, adults and even young children seem to represent the concept water not only in terms of its superficial features (e.g. color, taste) but also in terms of a deeper property that causes or otherwise explains these superficial features (e.g., the chemical structure H2O). This deeper causal property can be understood as the essence of the kind.
Research in psychological essentialism has explored people’s representations of superficial features, deeper causal properties, and the relationship between the two (e.g., Ahn, 2001; Kalish, 1995; Keil, 1989; Murphy & Medin, 1985; Rehder & Hastie, 2001; Rehder & Burnett, 2005; Rips, 1989; Sloman, Love, & Ahn, 1998; but see Strevens, 2000). These representations have been explored in development (Hall et al., 2003; Hirschfeld, 1995; Hirschfeld, 1996; Diesendruck, 2001; Gelman & Markman, 1987; Gelman & Wellman, 1991; Kalish & Gelman, 1992; Newman & Keil, 2008), across a number of different cultures (Inagaki & Hatano, 2002; Waxman, Medin & Ross, 2007), and for a wide range of concepts including biological and naturally occurring categories, social categories (Haslam et al., 2000; Haslam, Bastian & Bissett, 2004; Keller, 2005; Bastian & Haslam, 2006; Haslam & Levy, 2006; Rhodes, Leslie & Tworek, 2012), representations of individuals (Schlegel & Hicks, 2011; Schlegel et al., 2013), and even mental entities (Haslam 2000; Haslam & Ernst 2002).
1.2 The One-Criterion View of Essentialism
Although essentialism is wide-ranging in its effects, we focus in particular on its impact on categorization judgments. Given that people associate natural kind concepts with both superficial features and deeper causal properties, what representation do they use in determining whether a given entity actually counts as a member of a kind?
One view would be that people’s criteria for membership in natural kinds are based solely on deeper causal properties. On this view, when people are trying to determine whether a liquid counts as water, their judgments are based entirely on its underlying causal structure (e.g., H2O). We refer to this hypothesis as the one-criterion view.
Existing work consistently finds that deeper causal properties do play a role in categorization judgments (e.g., Keil, 1989; Gelman, 2003). This is a striking phenomenon in itself, but the one-criterion view involves a further claim. It suggests not only that the deeper causal properties play a role in people’s criteria for category membership but also that the superficial features (e.g., for water, being colorless, tasteless, potable, etc.) do not play a role.
The standard philosophical intuition in Twin Earth cases seems to provide strong evidence for the one-criterion view. The entity in the thought experiment does not have the deeper causal properties associated with water, but it does have all of the superficial features. Thus, if people do in fact have the intuition that this entity is not water, this intuition would provide strong reason to adopt the one-criterion view.
Strikingly, however, existing experimental work suggests that people do not actually have the standard philosophical intuition about Twin Earth cases. Malt (1994) demonstrates that the presence of H2O in a liquid is not the only feature that plays a role in categorization as water. For example, compare Malt’s results for tea (judged 91% H2O, but presumably not “water”) and salt water (83% H2O, but presumably “water”) (but see Abbott 1997; Ahn et al. 2000). Further, in unpublished data Corcoran (2016) presented participants with a series of Twin Earth cases. In each case, an entity was described as having all of the superficial features associated with a natural kind, but being fundamentally different in its deeper causal properties. Participants were asked to agree or disagree with a statement about category membership (e.g. “The liquid is water”). Mean responses fell towards the middle of the rating scale, indicating that people did not have a strong opinion either way as to whether the entity was a category member. This result appears to provide further evidence against the one-criterion view.
1.3 The Dual-Character View of Essentialism
An alternative to the one-criterion view is what we will call the dual character view. On this view, each natural kind concept is associated with two different sets of criteria. One set of criteria is based on superficial features; the other is based on deeper causal properties. People can then make categorization judgments using either set of criteria. Thus, if a single entity fulfills one set of criteria but not the other, people should have the intuition that the entity is a category member in one sense but is not a category member in another sense.
This hypothesis may be clarified by analogy to certain sorts of value-based concepts (Knobe, Prasada & Newman 2013). For example, consider a person who creates paintings for a living but who has no real interest in creating work of deep aesthetic value and is simply trying to make money. When evaluating such a person, participants agree that both:
- There is a sense in which this person is an artist.
- Ultimately when you think about what it really means to be an artist, you would have to say that this person is not truly an artist.
This result suggests that the concept artist is actually associated with two different criteria. It is associated with certain relatively superficial criteria (e.g., having a particular sort of job), but it is also associated with certain deeper criteria (e.g., realizing a specific sort of aesthetic value). When a person fulfills one of these sets of criteria but not the other, participants tend to have a characteristically ambivalent reaction.
The dual character hypothesis predicts that a similar phenomenon should arise for natural kind concepts. On this hypothesis, each natural kind concept is associated with two different sets of criteria. If a single entity fulfills one set of criteria but not the other, participants should display a characteristically ambivalent reaction. They should say that the entity is a member of that kind in one sense but not in another sense.
Note that dual character is very different from the familiar notion of graded membership. In a typical case of graded membership, people have a single set of criteria that integrates or brings together a wide variety of different features. Then, if an entity has some of the features but not others, people conclude that this entity is a category member to a particular degree. For example, if an entity has some features associated with the concept chair but lacks other such features, we might conclude ‘This is a borderline case of a chair’ or, more colloquially, ‘This is sort of a chair.’ By contrast, for dual character concepts, there are two distinct criteria of membership. Thus, a single entity can completely satisfy one set of criteria but also completely fail to satisfy the other set of criteria. In such a case, we might conclude ‘There is a sense in which this is clearly a category member,’ but at the same time ‘There is another sense in which this is clearly not a category member.’
This framework makes it possible to formulate a specific hypothesis about natural kind concepts. The hypothesis is not that we have one criterion informed by both deep causal properties and superficial features. Instead, it is that we have two distinct sets of criteria, one based on deep causal properties, the other based on superficial features.
1.4 Dual Character and Situational Context
In most ordinary situations, it is not feasible to express a complex state of ambivalence about whether a given entity falls under a category. People therefore need to have some way of making a single overall determination as to whether the entity falls under the category or not. A question now arises as to how people ordinarily do this.
On the dual character view, people’s basic capacity for psychological essentialism does not give them a single criterion for category membership. Rather, psychological essentialism simply makes available two different criteria, without privileging either one over the other. To the extent that people are able to pick out just one of these criteria, they will have to make use of some other psychological process.
We hypothesize that people select among these different criteria by looking to cues from the situational context. In some contexts, it seems clear that the most relevant thing to consider is the deeper causal properties (e.g., when having a discussion in a chemistry class). In others, it seems clear that the most relevant thing is the superficial features (e.g., when talking about how to resolve a straightforward practical problem). Perhaps, then, people are able to respond flexibly in a way that takes into account these contextual cues. Since they have two different criteria for category membership, they rely in any given context on the criteria that seem most appropriate to that context.
Numerous different contextual cues could play a role here, but the present studies focus on one cue in particular. A large body of research has shown that formal education in science can have a substantial impact on people’s judgments (Casler & Kelemen 2008; Chi 1981; Shtulman 2006; Shtulman & Valcarcel 2012). A similar phenomenon might be at work here. People might acquire an understanding of a specific type of social context – namely, the context of scientific conversation. They learn that in that specific type of context, one should privilege the deeper causal properties, treating those properties as the sole criteria for category membership.
If this hypothesis is correct, then in the specific case where people see themselves as embedded in a scientific context, they should make category membership judgments that fit the predictions of the one-criterion view. However, it should be possible using other methods to see clearly that their concepts show dual character. In particular, (a) when given an opportunity to clarify, people should say that a single entity can be a category member in one sense but not in another and (b) when they are outside of a specifically scientific context, they should be more drawn to rely on other criteria.
2. The Present Studies
In two of the studies reported here, participants received the case of Twin Earth water (Putnam, 1975) and closely parallel cases involving tigers and gold (Kripke, 1972/1980). To eliminate researcher degrees of freedom, we did not write the vignettes describing these cases ourselves but instead used the exact wording of the materials from Corcoran (2016). These materials were originally designed for a different experimental purpose, without knowledge of our present hypothesis. Thus, the present studies use the exact cases first introduced to support a view that is opposed to ours and also use a way of writing out those cases introduced by another researcher who was not aware of our hypothesis.
Experiments 1a and 1b show that most participants reject the standard philosophical intuition about Twin Earth cases, preferring instead to assent to the claim that there is one sense in which the entity is a member of the natural kind category, but also another sense in which it is not. Experiment 2 shows that categorization is affected by situational context, with participants being more influenced by deeper causal properties when they are in a specifically scientific context. Experiments 3a and 3b extend these findings by replacing philosophical Twin Earth cases with a real-world example (about genetically modified organisms) and also by looking at a more highly educated population (graduate students at elite universities).
2.1 Experiment 1
Participants were presented with different versions of ‘Twin Earth’ scenarios. Specifically, all participants read about entities that had a different underlying essence from the relevant natural kind (e.g., a liquid with a different chemical structure than water). In one condition the entity had all of the same superficial characteristics as the natural kind, while in the other condition the entity had different superficial properties.
If the one-criterion view is correct, there should be no difference in ratings between conditions; in both cases, participants should say that the entity is not a member of the natural kind. By contrast if the dual-character view is correct, participants should show a more complex pattern of judgments. When the entity has different superficial properties, participants should be more inclined to say that the entity is not a member of the natural kind in any sense. When the entity has the same superficial properties, participants should be more inclined to say that the entity is a member of the kind in one sense, but is not a member in another sense.
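The contrast between the two sets of predictions can be written out as a small sketch. This is only an illustration of the logic stated above, not part of the study materials; the `Entity` class, the function names, and the response labels are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Hypothetical representation of a vignette entity.
    has_superficial_features: bool  # e.g., clear, drinkable
    has_deep_properties: bool       # e.g., composed of H2O


def one_criterion_prediction(entity: Entity) -> str:
    """One-criterion view: only the deeper causal properties matter."""
    return "is a member" if entity.has_deep_properties else "is NOT a member"


def dual_character_prediction(entity: Entity) -> str:
    """Dual-character view: two separate criteria; a split yields 'two senses'."""
    surface = entity.has_superficial_features
    deep = entity.has_deep_properties
    if surface and deep:
        return "is a member"
    if not surface and not deep:
        return "is NOT a member"
    return "member in one sense, not in another"


# The two conditions of Experiment 1: the deep properties always differ.
same_appearance = Entity(has_superficial_features=True, has_deep_properties=False)
different_appearance = Entity(has_superficial_features=False, has_deep_properties=False)

for label, entity in [("same appearance", same_appearance),
                      ("different appearance", different_appearance)]:
    print(label, "|", one_criterion_prediction(entity),
          "|", dual_character_prediction(entity))
```

On this sketch the two views agree about the different appearance condition and come apart only in the same appearance condition, which is where the experiment looks for a difference.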
Experiment 1a tests this prediction by presenting participants with a forced choice between statements. In Experiment 1b participants could endorse two separate statements: one affirmative (is a member of the category) and one negative (is NOT a member of the category).
2.1.1 Experiment 1a
Participants. Six hundred participants were recruited from Amazon’s Mechanical Turk (56% male, 43% female, 0% non-binary, mean age = 34).
Materials and Procedure. Each participant was presented with one vignette in a 3 (Kind: gold, tigers, or water) x 2 (Vignette Structure: same appearance, different appearance) between-subjects design. In the same appearance conditions, participants read vignettes from Corcoran (2016). Those vignettes described an entity (e.g. a liquid) that had all of the same superficial properties as an entity on Earth (e.g., drinkable, clear), but a different causal property (e.g. not H2O). In the different appearance conditions, participants read vignettes in which the entity did not have any of the same superficial properties as an entity on Earth (e.g. is not potable and does not look like water), and also had a different causal property (see Appendix for vignettes).
Then, participants received three statements and were asked to indicate which one they agreed with most. For example, the question for water was: With which of the following do you most agree?
- The liquid from Twin Earth is water.
- The liquid from Twin Earth is not water.
- There’s a sense in which the liquid from Twin Earth is water, but ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all.
These three options were presented in a random order. After responding to the forced choice question, participants responded to two comprehension check questions. All vignettes and questions are listed in full in the appendix.
Three hundred and ninety-one participants correctly responded to the two comprehension check questions. Analyses were conducted on these participants. The percentage of participants choosing each option is shown in Figure 1. (Data from all experiments are available on the Open Science Framework: https://osf.io/f44hq/.)
Figure 1. Percentages of participants choosing each statement (Same Appearance vs. Different Appearance), collapsing across Kind (gold, tiger, water). Error bars indicate 95% confidence intervals.
We analyzed the data using two hierarchical binary logistic regression models. In both models, the dependent variable dichotomized participants’ responses as either “Is NOT a member” or another response (either “Is a member” or “Two Sense”). In the first model we entered as predictors Vignette Structure (different appearance vs. same appearance), and two dummy codes for the Kind (gold, water). In the second model we also included two interaction terms (gold x vignette structure, water x vignette structure). The comparison of these models indicated that Vignette Structure did not significantly interact with Kind, χ²(2, N = 391) = 2.63, p = .269.
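As a rough check on the model comparison, the p value for a chi-square statistic with 2 degrees of freedom has a closed form, exp(-x/2). The sketch below applies it to the reported statistic; the function name is an illustrative choice, and the statistic is taken as reported (rounded to two decimals), so the result should only be expected to agree with the reported p value to within rounding.

```python
import math


def chi2_upper_tail_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 df.

    For 2 degrees of freedom the survival function reduces to exp(-x / 2).
    """
    return math.exp(-x / 2.0)


# Reported comparison of the two logistic regression models: chi-square(2) = 2.63
p_value = chi2_upper_tail_df2(2.63)
print(f"p = {p_value:.3f}")
```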
The results showed a main effect of Vignette Structure (B = -1.97, SE = .24, p < .001, odds ratio (OR) = 7.18), such that participants were less likely to choose the “Is NOT a member” statement in the Same Appearance condition than in the Different Appearance condition. Moreover, this pattern was significant for each of the three Kinds (gold, χ²(1, N = 113) = 45.04, p < .001; water, χ²(1, N = 134) = 19.42, p < .001; tiger, χ²(1, N = 144) = 4.30, p = .038). Finally, as seen in Table 1, the Two Sense statement was the most popular response in the Same Appearance condition, while the non-member statement was the most popular in the Different Appearance condition.
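The relation between the logistic coefficient and the odds ratio can be made explicit with a short calculation. This is a sketch using only the reported (rounded) coefficient; the variable names are illustrative. Note that exponentiating B = -1.97 directly gives an odds ratio below 1, while the reported value of 7.18 corresponds to the same effect expressed in the opposite direction, exp(1.97).

```python
import math

# Reported coefficient for Vignette Structure (rounded to two decimals).
b = -1.97

# Odds ratio in the direction of the coefficient as coded.
odds_ratio_as_coded = math.exp(b)

# The same effect expressed in the opposite direction (reciprocal).
odds_ratio_inverted = math.exp(-b)

print(f"exp(B)  = {odds_ratio_as_coded:.3f}")
print(f"exp(-B) = {odds_ratio_inverted:.3f}")
```

Read this way, the odds of choosing the “Is NOT a member” statement were roughly seven times higher in the Different Appearance condition than in the Same Appearance condition.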
Same Appearance – Different causal structure
| Kind | Two Sense | Is a member | Is NOT a member |
| Gold | .68 *** | .06 *** | .26 |
| Tiger | .42 * | .21 * | .37 |
| Water | .54 ** | .21 * | .25 |
Different Appearance – Different causal structure
| Kind | Two Sense | Is a member | Is NOT a member |
| Gold | .19 * | .11 *** | .69 *** |
| Tiger | .25 | 0 *** | .75 *** |
| Water | .15 ** | .04 *** | .81 *** |
Table 1. Proportions of participants (correctly responding to both comprehension questions) choosing each statement (Same Appearance, Different Appearance). Asterisks indicate significance via a binomial comparison to chance (.33), * p < .05, ** p < .01, *** p < .001.
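The binomial comparison to chance used in Table 1 can be sketched as follows. The per-cell counts are not reported here, so the counts in the example are hypothetical and chosen only for illustration. The two-sided p value uses one common convention (summing the probabilities of all outcomes no more likely than the observed one); the procedure actually used for the table may differ.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count; a small relative tolerance guards against
    floating-point ties.
    """
    observed = binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= observed * (1 + 1e-7):
            total += prob
    return min(1.0, total)


# Hypothetical cell: 39 of 57 participants choose the "Two Sense" option.
# Chance level of .33 follows the table caption (three response options).
p_value = binom_test_two_sided(k=39, n=57, p=0.33)
print(f"two-sided p = {p_value:.2g}")
```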
2.1.2 Experiment 1b
Participants. One hundred and eighty-two participants were recruited from Amazon’s Mechanical Turk (62% male, 36% female, 2% non-binary, mean age = 34).
Materials and procedure. The design of Experiment 1b was identical to that of Experiment 1a, except that participants responded to two scaled rating questions rather than one forced choice question. For example, for water, participants were asked to rate their level of agreement with two statements:
- “There’s a sense in which the liquid from Twin Earth is water.”
- “Ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all.”
Participants rated both statements on a scale from 1 (disagree) to 7 (agree).
The mean response for each question across vignettes is displayed in Figure 2.
Figure 2. Mean ratings for each statement, collapsing across Kind. Error bars indicate standard error.
Results for the member statement and non-member statement were analyzed separately. For each statement we conducted a 2 (Vignette Structure: different appearance, same appearance) x 3 (Kind: gold, tiger, water) ANOVA.
For the member statement, there was a main effect of vignette structure, such that participants in the same appearance conditions agreed more strongly (M = 4.73, SD = 1.53) than participants in the different appearance conditions (M = 2.92, SD = 1.82), F(1, 180) = 51.90, p < .001, ηp² = .23. There was no effect of Kind and no interaction.
For the non-member statement, there was a main effect of vignette structure, such that participants in the same appearance conditions agreed less strongly (M = 4.39, SD = 1.72) than participants in the different appearance conditions (M = 5.48, SD = 1.6), F(1, 180) = 19.04, p < .001, ηp² = .10. There was no effect of Kind and no interaction.
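The effect sizes reported with each F test can be related to the F statistic and its degrees of freedom. The sketch below assumes that the reported effect sizes are partial eta squared (an assumption, since the measure is not named in the text) and uses the standard conversion from F; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects of vignette structure as reported for Experiment 1b.
eta_member = partial_eta_squared(51.90, 1, 180)
eta_non_member = partial_eta_squared(19.04, 1, 180)

print(f"member statement:     {eta_member:.3f}")
print(f"non-member statement: {eta_non_member:.3f}")
```

Both values land close to the reported effect sizes (.23 and .10); small gaps of this kind can arise from rounding of F or from a different error term.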
To further explore participants’ responses in the same appearance condition, we then conducted one-sample t-tests comparing responses on each of the questions to the scale midpoint (4). Results indicated that ratings were significantly higher than the midpoint both on the member statement, t(91) = 4.553, p < .001, and on the non-member statement, t(91) = 2.179, p = .032. In other words, participants agreed with both the statement that there is a sense in which the entity is a member of the kind and with the statement that there is a sense in which the entity is not a member of the kind.
Two studies showed that participants categorize entities by using two sets of criteria. When an entity lacked both the underlying causal properties and superficial properties, participants were inclined to say it was not a member of the kind in any sense. When an entity lacked the underlying causal properties but shared the superficial properties, participants were inclined to say it was not a member of the kind in one sense, but was a member in another sense.
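The one-sample t statistics can be approximately recovered from the summary statistics reported for the same appearance conditions. This is a sketch under two assumptions: that n = 92 (inferred from the 91 degrees of freedom) and that the reported means and standard deviations, which are rounded to two decimals, are the ones entering the tests. The function name is illustrative.

```python
import math


def one_sample_t(mean: float, sd: float, n: int, mu0: float) -> float:
    """t statistic for testing a sample mean against a fixed value mu0."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


n = 92          # assumed from df = 91
midpoint = 4.0  # midpoint of the 1-7 scale

t_member = one_sample_t(mean=4.73, sd=1.53, n=n, mu0=midpoint)
t_non_member = one_sample_t(mean=4.39, sd=1.72, n=n, mu0=midpoint)

print(f"member statement:     t({n - 1}) = {t_member:.2f}")
print(f"non-member statement: t({n - 1}) = {t_non_member:.2f}")
```

Both values fall within rounding distance of the reported statistics (4.553 and 2.179).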
These results provide evidence against the one-criterion view, which predicts that the entity should not be seen as a member in any sense in either condition. They point instead to a dual-character view, according to which natural kind concepts are seen in terms of two sets of criteria, one involving deep, causal properties and another involving superficial properties.
2.2 Experiment 2
The results thus far suggest that people can use multiple sets of criteria to evaluate different senses of an entity’s membership into a natural kind category. We predict that which set of criteria is most relevant varies with context. Thus, when participants are forced to choose whether the entity is or is not a member, they will categorize the entity in line with the set of criteria that the context indicates as most relevant.
Experiment 2 tests whether judgment about the entity’s categorization can be directed by presenting participants with contexts that make different sets of criteria more relevant. For instance, consider the Twin Earth liquid. Would participants be less inclined to categorize it as water in the purely scientific context of a chemistry class? What about in a more practical context in which a town has a rule prohibiting residents from creating unapproved pools of water?
Participants. Four hundred and fifty-six participants were recruited from Amazon’s Mechanical Turk (62% male, 38% female, 0% non-binary, mean age = 33).
Materials and procedure. Participants received one of the Twin Earth vignettes (gold, tiger, water) and then were given information about one of three possible contexts: scientific, legal, or neutral.
Participants in the scientific context conditions were told that a science department in a university had a rule stating that all students must be provided with certain objects (gold, tigers, or water) for their science practical testing. Participants in the legal context conditions were told that a town had a rule stating that certain objects (gold, tigers, or water) cannot be used for certain purposes without approval (housing additions, pet adoption, home pool creation). Participants in the neutral context were given no information about the context (see Appendix for full materials).
In all conditions, participants were then told that there is a controversy about the entity’s category membership. Participants rated their agreement with a statement about category membership. For example, in the water conditions, the statement was “The liquid from Twin Earth is water,” rated on a scale from 1 (disagree) to 7 (agree). Full vignettes and questions are listed in the appendix.
The mean ratings by condition are displayed in Figure 3. The data were analyzed using a 3 (Context: science, neutral, legal) x 3 (Kind: gold, tiger, water) ANOVA. There was a main effect of context, F(2, 455) = 9.94, p < .001, ηp² = .043. There was no effect of kind and no interaction (both Fs < 1). Post-hoc Tukey’s tests showed that participants were more inclined to rate the object as a member of the category in the legal than in the science context, p < .001. Participants were also more inclined to rate the object as a member of the category in the neutral than in the science context, p = .043. The neutral and legal context ratings were not significantly different, p = .112.
Figure 3. Mean ratings of kind by context. Higher ratings indicate categorization of the particular as the kind. Error bars indicate standard error.
Participants’ judgments about category membership depended on context. In the science context, participants focused more on the causally central property, and in the legal context, participants focused more on the superficial features. The neutral context was intermediate between the two.
This result provides further evidence that people have distinct sets of criteria that determine category membership. Which set of criteria is employed depends on the particular context.
2.3 Experiment 3
Studies 1 and 2 employ Twin-Earth style thought experiments. Since these have been taken as paradigmatic examples in support of a one-criterion view, these studies provide evidence against that view even in the cases introduced to support it.
Although these thought experiments are seminal examples, it might be thought that they are overly philosophical or esoteric. For this reason, Study 3 uses more realistic cases.
Finally, one might worry that the dual-character intuition arises only because participants fail to think clearly and carefully about the questions. For this reason, this final study was conducted on two different populations. Study 3a uses an online sample, while Study 3b turns to a sample of graduate students from elite universities.
2.3.1 Experiment 3a
Participants. One hundred and fifty participants were recruited from Amazon’s Mechanical Turk (57% male, 43% female, 0% non-binary, mean age = 33).
Materials and procedure. All participants read a vignette about genetically modified salmon, fish whose genes have been altered to enable them to grow at faster rates:
The Maxwell Laboratory has made great progress researching fish genetics. They have discovered how to modify the genes of salmon in order to enable the fish to grow year-round instead of only during the summer months. These genes enhance speed of growth but they do not affect any other qualities. The laboratory’s fish are identical in all other observable properties to salmon. These properties include appearance, size, taste, and other markers that distinguish salmon from other similar fish.
If one were to perform a genetic analysis of one of the laboratory’s fish, however, one would find the fish does not contain the genes of salmon. Instead, the laboratory fish contains the modified genes.
The modified fish and salmon are completely indistinguishable and interchangeable outside of the laboratory. The laboratory issues a report stating that while the fish do not belong to the same scientific category as familiar salmon, this difference is immaterial for any purpose other than scientific classification.
Participants were randomly assigned to receive information about one of three contexts (as in Experiment 2). In the science context, participants received a story about fish used for testing in a science laboratory. In the legal context, participants received a story about fish sold at a farmer’s market. Participants in the neutral context were given no information about the context (see Appendix).
All participants rated their agreement with a category membership statement, on a scale from 1 (disagree) to 7 (agree): “The fish from the laboratory are salmon.”
The mean ratings by context are displayed in Figure 4. A one-way ANOVA found a significant effect of context, F(2, 149) = 3.36, p = .028, ηp² = .047. Post-hoc Tukey’s tests showed that participants were more inclined to rate the fish as salmon in the legal context than in the science context, p = .021. Neutral context ratings did not differ significantly from the legal context ratings, p = .259, or the science context ratings, p = .505.
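For readers who want to see how a one-way ANOVA of this kind is computed, a minimal sketch follows. The ratings below are hypothetical values invented for illustration (the actual data are available from the Open Science Framework link given earlier), and `one_way_anova_f` is an illustrative helper, not the analysis script used for the experiment.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = fmean(all_values)
    group_means = [fmean(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical 1-7 agreement ratings for "The fish from the laboratory are salmon."
science = [2, 3, 4, 3, 2]
neutral = [3, 4, 4, 5, 3]
legal = [4, 5, 6, 5, 4]

f_value, df1, df2 = one_way_anova_f([science, neutral, legal])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```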
Figure 4. Mean ratings by context. Higher ratings indicate categorization of the entity as a member of the kind. Error bars indicate standard error.
2.3.2 Experiment 3b
Participants. One hundred and ninety-three participants were recruited from elite graduate programs (50% male, 47% female, 3% non-binary, mean age = 27). To recruit participants, we emailed department administrators from a diverse selection of graduate programs (Anthropology, Economics, Geology/Geophysics, Neuroscience/Neurobiology, Political Science/Government, Sociology, and Statistics) at elite universities (Harvard, Princeton, Stanford, and Yale University). We planned to continue emailing new departments until any round of emails brought our total participant number past 150. Our first round of emails, to seven departments at four universities, recruited 193 participants. See Table 2 for participants’ graduate degree universities and departments.
Table 2. Number of participants by graduate degree university and department. “Other” includes no response and responses that were ambiguous between categories (e.g. “Public Policy”).
Materials and procedure. The materials and procedure were identical to those described in Experiment 3a.
The mean ratings by context are displayed in Figure 4, above. A one-way ANOVA found a significant effect of context, F(2, 190) = 6.60, p = .002, ηp² = .065. Post-hoc Tukey’s tests showed that participants were more inclined to rate the fish as salmon in the legal context than in the science context, p = .001. Neutral context ratings did not differ significantly from the legal context ratings, p = .065, or the science context ratings, p = .308.
Finally, we considered the online and graduate student samples together, conducting a 2 (Population: online, graduate) x 3 (Context: science, neutral, legal) ANOVA. There was a main effect of context, F(2, 337) = 9.86, p < .001, ηp² = .055. There was no main effect of population, F < 1, and no interaction, F < 1.
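As a rough consistency check on the effect sizes above, the following sketch recovers an effect size from a reported F statistic and its degrees of freedom. It assumes that the value reported after each p value is (partial) eta squared; the function name is hypothetical and the code is not part of the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from F and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    In a one-way ANOVA this equals ordinary eta squared.
    """
    effect = f_value * df_effect
    return effect / (effect + df_error)


# Experiment 3b, effect of context: F(2, 190) = 6.60
exp3b = partial_eta_squared(6.60, 2, 190)

# Combined 2 x 3 ANOVA, main effect of context: F(2, 337) = 9.86
combined = partial_eta_squared(9.86, 2, 337)

print(round(exp3b, 3), round(combined, 3))  # 0.065 0.055
```

Under that assumption, both recovered values agree with the reported effect sizes to three decimal places.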
Experiment 3 examined participants’ intuitions in a more realistic scenario. Once again, there was an effect of context on participants’ judgments of category membership. In the scientific context participants were less inclined to categorize the entity as a member of the natural kind. In the legal context they were more inclined to categorize it as a member. The neutral condition was intermediate.
The results also suggest that the dual-character intuition is not simply a result of careless or inattentive thinking. The same effect arose in both the online and graduate student populations. Thus, even participants with extremely high levels of education categorize these entities in a way that is sensitive to context, in line with the dual-character prediction.
3. General Discussion
Three experiments explored the roles of superficial features and deeper causal properties in people’s categorization judgments for natural kinds. Experiments 1a and 1b looked at cases in which an entity does have the superficial features associated with a natural kind but does not have the deeper causal properties. Results indicated that in such cases people think there is one sense in which the entity is a category member and another sense in which it is not. Experiments 2 and 3 examined the impact of context. Categorization judgments for natural kinds were based more on deeper causal properties in some contexts but more on superficial features in others.
Taken together, these results suggest that people’s categorization judgments for natural kinds are not simply based on deeper causal properties. Instead, people appear to have two distinct criteria for category membership, one based on deeper causal properties, another based on superficial features. Both of these criteria appear to play important roles in people’s categorization judgments.
3.1 The Dual-Character View and Essentialism
Existing research on essentialism holds that essentialized concepts are associated with both (a) superficial features and (b) deeper causal properties. The present findings suggest that both of these elements are very clearly expressed in people’s categorization judgments. Thus, the ‘dual character’ pattern we find in people’s judgments seems to capture the distinctive nature of essentialized concepts and distinguish them from concepts of other types.
First, this pattern of judgments distinguishes essentialized concepts from concepts that are understood purely in terms of superficial features. Consider concepts like walking, motorcycle, or electro swing. Existing research has emphasized that the pattern of judgments observed for these other concepts is quite different from the one observed for essentialized concepts (Keil, 1989; Gelman, 2003). It might be thought that superficial features matter only for those other concepts and not for essentialized concepts. However, the current findings shed light on how to better understand this difference. For those other concepts, only superficial features matter, but for essentialized concepts both the superficial features and the deep causal properties matter.
Second, the findings indicate how essentialized concepts differ from those understood solely in terms of deep causal properties. For a simple example, consider the contrast between the concepts h2o and water. All that is relevant to the categorization of h2o is its deeper, causally central properties. Changes in the appearance of some liquid do not affect whether or not it is h2o. By contrast, water appears to be a more complex concept; a liquid’s categorization as water depends on both its causal features and its superficial features. Similar remarks apply to the various other concepts people acquire through scientific education (proton, rabies, light speed, etc.). Future research could examine these concepts, but it seems that they might not show the dual character found for people’s ordinary essentialized concepts.
Third, the results may have implications for debates about the format of conceptual representation (e.g. prototypes vs. exemplars vs. theories). In particular, some researchers have suggested that the very same category might be associated with representations in more than one of these formats, so that, e.g., the category water could be associated both with a prototype and with a theory (Machery, 2009; Weiskopf, 2009). The present results might provide at least some support for a view along these lines (see also Genone & Lombrozo 2012; Nichols, Pinillos & Mallon 2016). That is, if people associate each natural kind both with a set of superficial features and with a deeper causal essence, it might be that these two representations actually have two different formats.
Finally, our results shed light on the relation between natural kind concepts and more value-laden concepts like artist. Existing studies find a pattern for value-laden concepts that is strikingly similar to the one we find here for natural kind concepts. For example, participants think it makes sense to say: ‘There’s a sense in which she is clearly an artist, but ultimately, if you think about what it really means to be an artist, you’d have to say that there is a sense in which she is not an artist at all’ (Knobe, Prasada, and Newman, 2013). This pattern of judgment seems highly analogous to the one obtained here for concepts like water and tiger.
A question now arises as to how to understand this similarity. One possible view would be that the two kinds of concepts are completely different in their structure but simply happen to elicit this same pattern of judgment in certain cases. Another possible view would be that this pattern of judgment is pointing to some deeper similarity in the concepts themselves. For example, it might be thought that the pattern of judgments we find for value-laden concepts is an indication that people are actually essentializing those concepts. Future research could explore this issue more directly.
3.2 The Impact of Situational Context
Studies 2, 3a and 3b suggest that situational context plays a role in determining which criteria people use when applying natural kind concepts. Across all three studies, we observed the same basic pattern. In “scientific” contexts, people were more inclined to use criteria based on deeper, causal properties. In “legal” contexts, they were more inclined to use criteria based on superficial features. Neutral contexts were intermediate between these other two conditions.
Presumably, most of the cognitive processes underlying the context effect observed here will not be specific to this one particular type of case. That is, it seems unlikely that there will be cognitive processes that are devoted solely to the use of situational context in determining the application of natural kind concepts. Rather, there appear to be more general processes that people use when looking to situational context for clues about how to deploy concepts (see, e.g., Sperber & Wilson, 1986; Preyer & Peter, 2005). One strategy for coming to a better understanding of the effect observed for natural kind concepts would therefore be just to focus on the study of these more general processes (see also Nichols, Pinillos & Mallon 2016).
However, there does seem to be at least one factor that is especially relevant to the case of natural kind concepts in particular. The present studies suggest that people’s use of these concepts shifts in certain ways when they are in a scientific context. Prior work has shown that scientific education has numerous important effects on people’s judgments (Casler & Kelemen 2008; Chi et al. 1981; Shtulman 2006; Shtulman & Valcarcel 2012). Much of this research has been concerned with the ways in which scientific education can teach people facts about the world, but it seems that such education can also teach people about the norms governing scientific inquiry. Thus, people who have even a passing acquaintance with science may recognize that there is a specific type of context, the scientific context, in which distinctive norms apply.
Perhaps one such norm is concerned with the relevance of superficial features vs. deeper causal properties. To the extent that people see themselves as embedded in a specifically scientific context, they may feel that they are supposed to focus more on the deeper causal properties than they would in other, more ordinary contexts. For example, suppose that a person believes that spiders are similar to insects in their superficial features but completely different in their deeper causal properties. Such a person might treat spiders and insects as similar in many contexts, but to the extent that she is engaged in a specifically scientific conversation, she might feel that she should begin treating them as completely different. Future studies could further explore this phenomenon and also the more general question as to how the norms governing scientific contexts might influence people’s application of concepts.
Although these experiments focused on “science” and “legal” contexts, future work might further investigate the nature of these contexts. Perhaps there may even be certain “scientific” situations in which the superficial features of an entity are most relevant to its categorization; similarly, there might be certain “legal” situations in which an entity’s deep causal property is more relevant. Studies 2, 3a and 3b show more generally that situational context can play a role in determining which sense of natural kind category membership is more salient in categorization. Future work can investigate the way in which specific contexts (e.g. science contexts) might emphasize causal or superficial features.
Finally, it should be emphasized that situational context is unlikely to be the only factor that plays such a role here. The core claim is just that people’s psychological essentialism gives them two different criteria for natural kind concepts, without privileging either over the other, and some further factor therefore has to determine which set of criteria people use in any given case. One such factor is situational context. Future work could ask whether people’s choice of criteria can also be impacted by other, unrelated factors, such as motivational biases or stable individual differences in cognitive style.
3.3. The Twin Earth Thought Experiment
Our primary concern has been with general questions about categorization judgments involving essentialized concepts. However, it is also noteworthy that the present studies explored people’s intuitions regarding the Twin Earth thought experiment. This thought experiment has received a truly enormous amount of attention within existing research: the four-hundred page “The Twin Earth Chronicles” (1996, xi) celebrated twenty years of “Twin Earth and its implications,” and in the twenty years since “Twin Earth” has received thousands more citations. The patterns of people’s intuitions concerning Twin Earth are therefore of some interest in themselves.
As we mentioned at the outset, one irony of the present studies is that they provide evidence against the one-criterion view by exploring the very case that was most often used to support that view. In other words, these studies suggest that people’s intuitions about the Twin Earth thought experiment are very different from what they had been assumed to be within prior research. But this finding immediately leads to a new question. Given that these experimental studies suggest that people’s intuitions do not actually conform to the one-criterion view, why did researchers initially assume that they did?
One possible answer is that there was never good reason for this belief in the first place. Perhaps the widespread assumption is simply to be explained in terms of some quirk of academic sociology. Cummins (1998) has argued forcefully for precisely such a claim. As he puts it: “It is a commonplace for researchers in the Theory of Content to proceed as if the relevant intuitions [about Twin Earth] were undisputed… The Putnamian take on these cases is widely enough shared to allow for a range of thriving intramural sports among believers. Those who do not share the intuitions are simply not invited to the games” (Cummins, 1998: 116).
There may be some truth to this suspicion, but we suspect that there is an additional factor at play. Experiments 2 and 3 indicate that people’s intuitions in these cases depend on context. In ‘scientific’ contexts, people tend to focus more on deeper causal properties, whereas in ‘legal’ contexts, they tend to focus more on superficial features. Perhaps the researchers who were investigating questions about the Twin Earth thought experiment were always considering that thought experiment in more scientific contexts. For that reason, it might consistently have appeared to those researchers that the categorization criteria in that thought experiment were simply a matter of deeper causal properties. This would be an understandable conclusion, but as the present studies indicate, people’s ordinary categorization judgments show a more complex pattern.
Finally, it should be noted that the study of intuitions about this thought experiment can potentially have implications that go beyond cognitive science for philosophical questions about the semantics of natural kind terms. Some philosophers have argued that empirical facts about the patterns of people’s intuitions are relevant to these questions (e.g., Corcoran, 2016); others have argued that they are not (e.g., Deutsch, 2015). Future philosophical research could return to these issues and ask whether they can be informed in any way by the present findings.
The Twin Earth thought experiment has shaped the modern study of essentialism. By distinguishing an entity’s superficial features from its deeper causal properties, it led to the core insight that natural kinds are associated with two different representations: a set of superficial features (e.g., a liquid’s color or smell) and a set of deeper, causal properties (e.g., a liquid’s underlying chemical structure) (Keil 1989; Gelman 2003).
The present studies suggest that people’s ordinary judgments do not conform to the standard philosophical intuition that the deeper causal properties are the sole criterion of category membership. Instead, we find that people’s actual judgments display a more complex pattern. Entities are categorized into natural kinds according to two different criteria. According to one, the Twin Earth liquid really is water, but according to the other, it is not water at all.
Ultimately, these results suggest that the patterns of people’s categorization judgments directly reflect the core insight that motivated essentialism research in the first place. What is most striking about essentialized concepts is that they appear to be associated with two different representations. The present results suggest that both of these representations actually play a role in people’s categorization judgments.
Abbott, B. 1997. A Note on the Nature of “Water.” Mind 106(422): 311-319.
Ahn, W. 1998. Why are different features central for natural kinds and artifacts? Cognition 69: 135-178.
Ahn, W., Kalish, C., Gelman, S.A., Medin, D.L., Luhmann, C., Atran, S., Coley, J.D., and Shafto, P. 2001. Why essences are essential to the psychology of concepts. Cognition 82: 59-69.
Bastian, B., and Haslam, N. 2006. Psychological Essentialism and Stereotype Endorsement. Journal of Experimental Social Psychology, 228-235.
Carey, S. 1985. Conceptual Change in Childhood. Cambridge: MIT Press.
Casler, K. and Kelemen, D. 2008. Developmental Continuity in Teleo-Functional Explanation: Reasoning about Nature Among Romanian Romani Adults. Journal of Cognition and Development, 9(3).
Chi, M.T.H, Feltovich, P.J., and Glaser, R. 1981. Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2): 121-152.
Cummins, R. 1998. Reflection on Reflective Equilibrium, in DePaul, M. and Ramsey, W. (eds.) Rethinking Intuition. Oxford: Rowman & Littlefield Publishers, Inc.
Deutsch, M. 2015. The Myth of the Intuitive: Experimental Philosophy and Philosophical Method.
Diesendruck, G. 2001. Essentialism in Brazilian Children’s Extensions of Animal Names. Developmental Psychology, 37(1): 49-60.
Gelman, S.A. 2003. The Essential Child: Origins of Essentialism in Everyday Thought. Oxford University Press.
Gelman, S.A. and Wellman, H.M. 1991. Insides and essences: Early understandings of the non-obvious. Cognition, 38(3): 213-244.
Gelman, S.A., and Markman, E.M. 1987. Young Children’s Inductions from Natural Kinds: The Role of Categories and Appearances. Child Development, 58(6): 1532-1541.
Genone, J. & Lombrozo, T. 2012. Concept possession, experimental semantics, and hybrid theories of reference. Philosophical Psychology 25(5): 717-742.
Hall, D.G., Waxman, S.R., Bredart, S. & Nicolay, A.C. 2003. Preschoolers’ use of form class cues to learn descriptive proper names. Child Development 74, 1547-1560.
Haslam, N. and Levy, S. 2006. Essentialist Beliefs about Homosexuality: Structure and Implications for Prejudice. Personality and Social Psychology Bulletin, 32(4): 471-485.
Haslam, N. Psychiatric categories as natural kinds: essentialist thinking about mental disorder. Social Research: 1031-1058.
Haslam, N., and Ernst, D. 2002. Essentialist Beliefs about Mental Disorders. Journal of Social and Clinical Psychology, 21(6): 628-644.
Haslam, N., Bastian, B., and Bisset, M. Essentialist beliefs about personality and their implications. Personality and Social Psychology Bulletin, 30(12): 1661-1673.
Haslam, N., Rothschild, L., and Ernst, D. Essentialist beliefs about social categories. British Journal of Social Psychology.
Hirschfeld, L.A. 1995. Do children have a theory of race? Cognition, 54(2): 209-252.
Hirschfeld, L.A. 1996. Race in the Making. MIT Press.
Inagaki, K., and Hatano, G. 2002. Young children’s naïve thinking about the biological world. Psychology Press: New York.
Kalish, C.W. 1995. Essentialism and graded membership in animal and artifact categories. Memory & Cognition, 23(3): 335-353.
Kalish, C.W., and Gelman, S.A. 1992. On Wooden Pillows: Multiple Classifications and Children’s Category-based Inductions. Child Development, 63(6): 1536-1557.
Keil, F. 1989. Concepts, Kinds, and Cognitive Development. Cambridge: MIT Press.
Keller, J. 2005. In genes we trust: The biological component of psychological essentialism and its relationship to mechanisms of motivated social cognition. Journal of Personality and Social Psychology.
Knobe, J., Prasada, S., and Newman, G.E. 2013. Dual character concepts and the normative dimension of conceptual representation. Cognition, 127(2): 242-257.
Kripke, S. 1972/1980. Naming and necessity. Cambridge: Harvard University Press.
Machery, E. 2009. Doing without concepts. Oxford University Press.
Malt, B.C. 1994. Water is not H2O. Cognitive Psychology, 27(1): 41-70.
Medin, D. and Ortony, A. 1989. “Psychological essentialism” in S. Vosniadou and A. Ortony (eds.) Similarity and Analogical Reasoning.
Murphy, G.L. and Medin, D.L. 1985. The role of theories in conceptual coherence. Psychological Review, 92(3): 289-316.
Newman, G.E., and Keil, F.C. 2008. Where is the essence? Developmental shifts in children’s beliefs about internal features.
Nichols, S., Pinillos, N.A. & Mallon, R. 2016. Ambiguous Reference. Mind.
Pessin, A. and Goldberg, S. (eds). 1996. The Twin Earth Chronicles: Twenty Years of Reflection on Hilary Putnam’s “The Meaning of ‘Meaning.’”
Preyer, G. and Peter, G. (eds.). 2005. Contextualism in Philosophy: Knowledge, Meaning and Truth. Oxford University Press.
Putnam, H. 1975. The meaning of meaning. Minnesota Studies in the Philosophy of Science: Language, Mind, and Knowledge, 7: 131-93.
Rehder, B., and Hastie, R. 2001. Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130(3): 323-360.
Rehder, B., and Burnett, R.C. 2005. Feature inference and the causal structure of categories. Cognitive Psychology, 50(3): 264-314.
Rhodes, M.J., Leslie, S.J. & Tworek, C.M. 2012. Cultural transmission of social essentialism. Proceedings of the National Academy of Sciences, 109(34): 13526-13531.
Rips, L.J. 1989. “Similarity, typicality, and categorization,” in S. Vosniadou and A. Ortony (eds.) Similarity and Analogical Reasoning.
Schlegel, R. J., Hicks, J. A., Davis, W. E., Hirsch, K. A., and Smith, C. M. 2013. The dynamic interplay between perceived true self-knowledge and decision satisfaction. Journal of Personality and Social Psychology, 104(3), 542–558.
Schlegel, R.J. and Hicks, J.A. 2011. The true self and psychological health: Emerging evidence and future directions. Social and Personality Psychology Compass, 5(12): 989-1003.
Shtulman, A. 2006. Qualitative differences between naïve and scientific theories of evolution. Cognitive Psychology 52(2): 170-194.
Shtulman, A. and Valcarcel, J. 2012. Scientific knowledge suppresses but does not supplant earlier intuitions. Cognition 124(2): 209-215.
Sloman, S.A., Love, B.C., and Ahn, W. 1998. Feature centrality and conceptual coherence. Cognitive Science, 22(2): 189-228.
Sperber, D. and Wilson, D. 1986. Relevance: Communication and Cognition. Oxford.
Strevens, M. 2000. The essentialist aspect of naïve theories. Cognition, 74(2): 149-175.
Waxman, S., Medin, D., and Ross, N. 2007. Folkbiological reasoning from a cross-cultural developmental perspective: Early essentialist notions are shaped by cultural beliefs. Developmental Psychology, 43(2): 294-308.
Weiskopf, D.A. 2009. The plurality of concepts. Synthese, 169: 145.
Wellman, H.M., and Gelman, S.A. 1988. “Children’s understanding of the nonobvious,” in R.J. Sternberg (ed.) Advances in the psychology of human intelligence: 99-135.
Appendix: Experimental Materials
Introduction to all experiments
All participants received this introduction: Different participants in this study will receive different scenarios. For some of these scenarios, answers to the questions are really obvious; for others the questions we ask are more difficult. Just tell us what you think about the scenario you read, whether your answers to the questions seem very obvious or not obvious at all.
Different Appearance Condition Vignettes (Experiment 1)
Gold: The Maxwell Mining Company discovers how to mine for metals on asteroids. It recovers a large amount of metal that tests show is different in all observable properties to the paradigm sample of gold stored as reference M17 in the Paris Department of Precious Metals. These properties include appearance, weight, conductivity, melting point and other markers that distinguish gold from lookalikes. When they perform a chemical analysis of the asteroid’s metal, they find out that it does not contain any elemental atoms. The metal from the asteroid is entirely composed of compound molecules. In contrast, scientists long ago discovered that all the samples of gold on Earth are composed of atoms with 79 protons. [Scientists named the element having atomic weight 79 ‘Au’.] Scientists theorize (correctly) that some compound molecules and atoms will never behave in exactly the same way. This means that they will never be completely indistinguishable and interchangeable outside the lab. Scientists issue a report stating that the pieces of metal that are not at all identical to reference M17 in any observable properties also do not all belong to the same scientific category.
Water: Suppose that in a few years, humans are able to travel to other galaxies. While exploring, they land on a planet that looks nothing like Earth in virtually any respect. It is populated by plants and animals that look totally different from the familiar plants and animals on Earth. Its landscapes and ecosystems look and function totally differently from those on Earth. They dub this planet “Twin Earth”. The astronauts remove their helmets and find that they can breathe freely. They drink a liquid not found in any of the planet’s lakes and rivers and find the liquid does not look and taste at all like water. They do not at all quench their thirst on the liquid they collect while they explore the planet. When they perform a chemical analysis of this liquid, they find out that it does not contain any compound molecules. The liquid in Twin Earth’s lakes and rivers is entirely composed of elemental atoms. In contrast, scientists long ago discovered that all the samples of water on Earth are composed of a particular compound. [That compound was named ‘H2O’.] Scientists theorize (correctly) that some compound molecules and atoms will never behave in exactly the same way. This means that they will never be completely indistinguishable and interchangeable outside the lab. Scientists issue a report stating that the liquid in Twin Earth’s lakes and rivers is not at all identical to the liquid in Earth’s lakes and rivers in any observable properties and also does not belong to the same scientific category.
Tiger: Explorers in the mountains of Asia come across a population of animals with no feline characteristics and without any striped orange and black fur. These animals they came across are not 600 pound carnivores and have none of the characteristics of familiar tigers, showing neither the same structural and functional features inside and out. Scientists study the genes of these animals and find that they do not belong to any of the known sub-species of Panthera tigris, the species to which all previously recognized tigers belong. In fact, genetic comparisons show that the new population are less closely related to the known members of Panthera tigris than are lions (members of Panthera leo). Scientists issue a report stating that convergent evolution has led this isolated population to be very different from familiar tigers. The members of the isolated population do not belong to the same scientific category as familiar tigers.
Same Appearance Vignettes (Experiments 1 and 2)
Gold: The Maxwell Mining Company discovers how to mine for metals on asteroids. It recovers a large amount of metal that tests show is identical in all observable properties to the paradigm sample of gold stored as reference M17 in the Paris Department of Precious Metals. These properties include appearance, weight, conductivity, melting point and other markers that distinguish gold from lookalikes. When they perform a chemical analysis of the asteroid’s metal, they find out that it does not contain any elemental atoms. This is somewhat surprising, because scientists long ago discovered that all the samples of gold on Earth are composed of atoms with 79 protons. [Scientists named the element having atomic weight 79 ‘Au’.] The metal from the asteroid is entirely composed of compound molecules. Scientists theorize (correctly) that some compound molecules and atoms will behave in exactly the same way. This means that they will be completely indistinguishable and interchangeable outside the lab. Scientists issue a report stating that the pieces of metal identical to reference M17 in all observable properties do not all belong to the same scientific category, but this difference is immaterial for any purpose other than scientific classification.
Tiger: Explorers in the mountains of Asia come across a population of animals with feline characteristics and striped orange and black fur. These 600 pound carnivores are indistinguishable from familiar tigers, with exactly the same structural and functional features inside and out. Scientists study the genes of these animals and find that they do not belong to any of the known sub-species of Panthera tigris, the species to which all previously recognized tigers belong. In fact, genetic comparisons show that the new population are less closely related to the known members of Panthera tigris than are lions (members of Panthera leo). Scientists issue a report stating that convergent evolution has led this isolated population to be indistinguishable from familiar tigers. While the members of the isolated population do not belong to the same scientific category as familiar tigers, this difference is immaterial for any purpose other than scientific classification.
Water: Suppose that in a few years, humans are able to travel to other galaxies. While exploring, they land on a planet that looks exactly like Earth in virtually all respects. It is populated by plants and animals that look exactly like the familiar plants and animals on Earth. Its landscapes and ecosystems look and function exactly like those on Earth. They dub this planet “Twin Earth”. The astronauts remove their helmets and find that they can breathe freely. They drink from the lakes and rivers and find that their contents look and taste just like water. They quench their thirst on what they collect from the lakes and rivers while they explore the planet. When they perform a chemical analysis of this liquid, they find out that it does not contain any compound molecules. This is somewhat surprising, because scientists long ago discovered that all the samples of water on Earth are composed of a particular compound. [That compound was named ‘H2O’.] The liquid in Twin Earth’s lakes and rivers is entirely composed of elemental atoms. Scientists theorize (correctly) that some compound molecules and atoms will behave in exactly the same way. This means that they will be completely indistinguishable and interchangeable outside the lab. Scientists issue a report stating that the liquid in Twin Earth’s lakes and rivers does not belong to the same scientific category as the liquid in Earth’s lakes and rivers, but this difference is immaterial for any purpose other than scientific classification.
Experiment 1 Check Questions
Is the metal from the asteroid identical to gold in terms of all its observable properties (e.g. appearance) Yes No
Is the metal from the asteroid identical to gold in terms of its atomic/molecular structure (e.g. number of protons) Yes No
Is the liquid from Twin Earth identical to water in terms of all its observable properties (e.g. appearance) Yes No
Is the liquid from Twin Earth identical to water in terms of its atomic/molecular structure (e.g. containing H2O) Yes No
Are the animals the explorers found identical to tigers in terms of all their observable properties (e.g. appearance) Yes No
Are the animals the explorers found identical to tigers in terms of all their genetic structure (e.g. genes) Yes No
[Same Appearance Condition correct response pattern is “Yes, No.” Different Appearance Condition correct response pattern is “No, No.”]
Experiment 1a Question
With which of the following do you most agree?
The metal from the asteroid is gold.
The metal from the asteroid is not gold.
There’s a sense in which the metal from the asteroid is gold, but ultimately, if you think about what it really means to be gold, you’d have to say there’s a sense in which the metal from the asteroid is not truly gold at all.
With which of the following do you most agree?
The liquid from Twin Earth is water.
The liquid from Twin Earth is not water.
There’s a sense in which the liquid from Twin Earth is water, but ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all.
With which of the following do you most agree?
The animals the explorers found are tigers.
The animals the explorers found are not tigers.
There’s a sense in which the animals the explorers found are tigers, but ultimately, if you think about what it really means to be a tiger, you’d have to say there’s a sense in which the animals the explorers found are not truly tigers at all.
Experiment 1b Questions
Do you agree or disagree with the following statements?
There’s a sense in which the metal from the asteroid is gold. [1 2 3 4 5 6 7]
Ultimately, if you think about what it really means to be gold, you’d have to say there’s a sense in which the metal from the asteroid is not truly gold at all. [1 2 3 4 5 6 7]
There’s a sense in which the liquid from Twin Earth is water.
Ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all.
There’s a sense in which the animals the explorers found are tigers.
Ultimately, if you think about what it really means to be a tiger, you’d have to say there’s a sense in which the animals the explorers found are not truly tigers at all.
Experiment 2 Contexts
Neutral context: There is a controversy about whether the [metal from the asteroid is gold, animal from the mountains of Asia is a tiger, liquid from Twin Earth is water].
Legal context: Imagine that the Summerville city council has just convened to establish the new town ordinances. They have thought very hard about these new rules and are now prepared to adopt them.
One of the rules says that no resident can [use gold in any housing or building additions without approval, adopt a pet tiger without approval, create a pool of water in their yard without prior approval].
One of the [Maxwell Miners, mountain explorers, astronauts who went to Twin Earth] is also a resident of Summerville. Upon returning to town, the [asteroid-miner, explorer, astronaut] takes [some of the metal that he found on the asteroid and uses it to build an addition on to his apartment, one of the animals he found in the mountains of Asia and adopts it as his pet, some of the liquid he found on Twin Earth and uses it to create a pool in his yard] without approval. There is a controversy about whether the [metal from the asteroid is gold, animal from the mountains of Asia is a tiger, liquid from Twin Earth is water].
Science context: Imagine that a group of scientists at Summerville University have just convened to establish the new rules for practical science requirements. They have thought very hard about these new rules and are now prepared to adopt them.
One of the rules says that all [engineering laboratory, biochemistry, chemistry laboratory] instructors should provide students with [a sample of gold to use, access to a tiger to study, a sample of water to use] for the students’ practical testing requirements.
One of the [Maxwell Miners, mountain explorers, astronauts who went to Twin Earth] is also [an engineering laboratory, biochemistry, chemistry laboratory] instructor at Summerville University. Upon returning to the college, the [asteroid-miner takes some of the metal that he found on the asteroid, explorer takes one of the animals he found in the mountains of Asia, astronaut takes some of the liquid that he found on Twin Earth] and provides it to his students for the students’ practical testing requirements. There is a controversy about whether the [metal from the asteroid is gold, animal from the mountains of Asia is a tiger, liquid from Twin Earth is water].
Experiment 2 Question
Do you agree or disagree with the following statement? [The metal from the asteroid is gold, The animal from the mountains of Asia is a tiger, The liquid from Twin Earth is water.]
Experiment 3 Contexts
Legal context: Imagine that the Summerville local government has just convened to establish a new tax rate of an additional 2% on certain goods sold in the town. They have thought very hard about these new rates and to what goods they apply, and they are now prepared to adopt these rules.
One of the rules says that all salmon sold are subject to the extra local tax.
One of the Maxwell laboratory workers is also a vendor at a farmer’s market in Summerville. Upon returning to town, the laboratory worker takes some of the fish from the laboratory and sells them at the farmer’s market, without collecting any amount for the extra local tax. There is a controversy about whether the fish from the laboratory are salmon.
Science context: Imagine that a group of scientists at Summerville University have just convened to establish the new rules for practical science requirements. They have thought very hard about these new rules and are now prepared to adopt them.
One of the rules says that all ichthyology (fish science) laboratory instructors should provide students with a salmon sample to use for the students’ practical testing requirements.
One of the Maxwell laboratory workers is also an ichthyology instructor at Summerville University. Upon returning to the college, the laboratory worker takes some of the fish from the laboratory and provides it to his students for the students’ practical testing requirements. There is a controversy about whether the fish from the laboratory are salmon.
Neutral context: There is a controversy about whether the fish from the laboratory are salmon.
Experiment 3 Question
Do you agree or disagree with the following statement?
The fish from the laboratory are salmon.
We also conducted an analysis on all participants, including those who failed check questions. This inclusive analysis (excluding no participants) also reveals no difference between the two models, χ²(2, N = 601) = 4.61, p = .100. There was also a main effect of Vignette Structure (B = -1.05, SE = .293, odds ratio (OR) = .35, p < .001).
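As a rough arithmetic check on the figures reported in this note, the sketch below recomputes the p-value from the chi-square statistic and the odds ratio from the coefficient B. It assumes the odds ratio is simply exp(B), as in a standard logistic regression, and uses the closed-form upper-tail probability exp(-x/2) that holds for a chi-square distribution with exactly 2 degrees of freedom. The function and variable names are illustrative and are not taken from the authors' analysis code.

```python
import math

# Values reported in the note above; everything else is an illustrative assumption.
chi_square = 4.61   # model-comparison statistic with 2 degrees of freedom
b_vignette = -1.05  # coefficient for Vignette Structure


def chi_square_p_value_df2(statistic: float) -> float:
    """Upper-tail p-value for a chi-square statistic with exactly 2 df.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2.0)


def odds_ratio(coefficient: float) -> float:
    """Odds ratio implied by a logistic regression coefficient (assumed model)."""
    return math.exp(coefficient)


p_value = chi_square_p_value_df2(chi_square)
vignette_or = odds_ratio(b_vignette)

print(f"p = {p_value:.3f}")       # prints p = 0.100
print(f"OR = {vignette_or:.2f}")  # prints OR = 0.35
```

Both recomputed values agree with the reported p = .100 and OR = .35 to the stated precision.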
A Google Scholar search reveals 1,130 citations to “Twin Earth” between 1973 and 1996, and 4,520 in the past twenty years (1997-2016).
Invited Comments from Jussi Haukioja (Trondheim)
Comments on “Water is and is not H2O”
Norwegian University of Science and Technology
First of all, I want to thank Tobia, Newman, and Knobe for a very stimulating paper. The new experimental results they present are highly interesting, and while I think (and will argue below) that it is premature to take their findings as conclusive evidence for a dual-criterion essentialist view, they certainly deserve closer attention, and can at the very least be used as a point of departure for future experimental work. Most of my comments below concern experiment 1. I will comment only briefly on experiments 2 and 3; however, I found the experiments on context-dependence highly interesting as well, and look forward to seeing more work on this issue.
The dual-criterion essentialist view that Tobia et al espouse claims that “each natural kind concept is associated with two different sets of criteria”, where “one set of criteria is based on superficial features; the other is based on deeper causal properties”. It should be stressed that the superficial features and deeper causal properties are not merely associated properties, since a one-criterion essentialist would agree that our natural kind concepts are associated with superficial features which we use as rough guides for applying the concept in everyday circumstances. (Indeed, Putnam is explicit about this.) The central point of the dual-criterion view is that both sets of features are just that, criteria: that they determine the extension (or, maybe, extensions) of the concept.
The result that ordinary speakers’ usage of natural kind terms displays a split pattern in Twin Earth style cases, with some usage following superficial features and some following deeper causal properties, is of course well known (Braisby et al 1996 [Cognition], Jylkkä et al 2009 [Philosophical Psychology], Genone & Lombrozo 2012 [Philosophical Psychology], Nichols et al 2015 [Mind]). These results may well be taken to suggest a dual-criterion view, but as Daniel Cohnitz and I have argued at greater length elsewhere (Cohnitz & Haukioja, “Variation in Natural Kind Concepts”, forthcoming in Wikforss & Marquez, Shifting Concepts, OUP), it would be premature to simply take such results as establishing the truth of a dual-criterion view, for a number of reasons. (Our main reason for holding this is that in the absence of a well worked-out view of what distinguishes correct and incorrect applications of a term, we simply don’t know whether such results should be taken as evidence for a dual-criterion view, or rather for the view that speakers are prone to making systematic errors in their usage of natural kind terms in cases where superficial features and deeper causal properties come apart; the latter view could be adopted both by a one-criterion theorist and a descriptivist.)
What makes the present study especially interesting in this regard is that, in experiment 1, the test subjects were given the option of giving an explicitly “dual-criterion” answer. In experiment 1a, one of the three options was as follows: “There’s a sense in which the liquid from Twin Earth is water, but ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all”. (I will follow Tobia et al in discussing the issues in terms of the Twin Earth example; similar comments apply for the cases featuring other natural kinds.) In experiment 1b, the subjects were given the opportunity to agree both with “There’s a sense in which the liquid from Twin Earth is water”, and “Ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all”. And indeed, they found that in 1a, the majority of subjects chose the “dual” alternative, while in 1b, most subjects were willing to agree with both claims (in the same appearance / different underlying causal properties case).
This is a highly interesting result, and gives stronger support for a dual-criterion view than the previous studies. Nonetheless, I think there is a fairly plausible way for a one-criterion theorist to account for the data, and perhaps even take them as tentative evidence for a one-criterion view! In choosing the dual answer the subjects are, after all, saying that “ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is not truly water at all” (emphases added). So it seems that the subjects are affording some kind of priority to the deeper causal properties, compared to the superficial features. Of course, the very same subjects also accept that “there is a sense in which the liquid from Twin Earth is water”, but there are ways a one-criterion theorist could try to explain this away. The subjects may, in accepting this, simply be seen as reporting their view that the Twin Earth liquid can be used in much the same way as water, or something along those lines.
I am of course not suggesting that the one-criterion theorist should simply rest content with the response sketched above and dismiss the data. But the asymmetry in the dual answer does at least leave room for developing an explanation of the data that is consistent with the one-criterion view. It would be interesting to see what the results would be if the subjects were given a reversed dual option, along the following lines: “There’s a sense in which the liquid from Twin Earth is not water, but ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is truly water”. If subjects, in an experiment otherwise like 1a, chose this answer as readily as they chose the dual answer in Tobia et al’s study, the support for the dual-criterion view would be considerably strengthened. Likewise, if in an experiment otherwise like 1b the subjects were asked to rate their level of agreement with “There’s a sense in which the liquid from Twin Earth is not water”, and “Ultimately, if you think about what it really means to be water, you’d have to say there’s a sense in which the liquid from Twin Earth is truly water”, and the results turned out to mirror the results found in 1b, it would be much harder for the one-criterion theorist to explain the data away. In sum: the results from experiment 1 do give stronger support for the dual-criterion view than previous studies, but I think the issue is still far from settled, and further work is needed.
If the dual-criterion view is correct, one would expect contextual cues to play a central role in determining which sense of the concept is used in making categorization judgements. Experiments 2 and 3 look at precisely this issue, finding some evidence for context-dependence. These results are quite interesting and novel. However, while statistically significant, the contextual effects found are fairly small. Moreover, since the “practical” context which prompts subjects to rely more on superficial features is a legal context, it might be tempting to explain the variation away in the following way: the test subjects might simply note that (for example) the resident creating a pool of XYZ, when a pool of water would require prior approval, is clearly acting against the spirit of the law, and thereby judge that for the purposes of law enforcement we had better count XYZ (which the legislators did not know exists) as water, even though it ultimately is not water. Again, I am not claiming that we should simply rest content with the above, far from it. The results from experiments 2 and 3 are interesting and potentially important; but further work is needed to try to get clearer contextual variation, and preferably across a wider range of different contexts.
To sum up: Tobia et al clearly take a step forward in the debate, opening up possibilities for exciting new work on natural kind terms and natural kind concepts. I want to thank the authors again for an original and thought-provoking paper; I am also very grateful to the organisers of the Minds Online conference for inviting me to comment on it. I am looking forward to seeing further experimental work by the authors on these issues.
Invited Comments from Daniel A. Weiskopf (Georgia State)
Reassembling our fragmented concepts:
Commentary on Tobia, Newman, and Knobe, “Water is and is not H2O”
Daniel A. Weiskopf
Georgia State University
I’m extremely sympathetic with the position on concepts that Tobia et al. present here, having defended a related one on a number of occasions. So my comments will be intended to do three things. First, I’ll situate their results within a larger anti-essentialist consensus in the psychology of concepts. Second, I’ll raise a question about their second criterion for classifying kinds, and whether it should really be thought of as purely appearance-based. Third, I’ll raise a problem that generally applies to contextualist theories of concepts and belief concerning the fragmentation of our representational systems.
1. Cracks in the essentialist edifice
For two decades, psychological essentialism has stood as the dominant view of conceptual structure and categorization. Its empirical pedigree, as established by researchers such as Woo-kyoung Ahn, Scott Atran, Susan Gelman, and Frank Keil, is formidable. In recent years, though, the edifice of essentialism has started to appear a bit cracked and worn. Tobia et al.’s studies land a few more well-placed chisel-blows to its foundations.
Essentialism claims that in making category membership judgments, particularly about natural kinds, people appeal most fundamentally to an object’s underlying (often unobserved or unknown) causal structure. Accordingly, discovering that something has that underlying structure should result in taking it to be a member of the kind, and discoveries that show the structure to be absent should result in non-membership verdicts. As Tobia et al. note, psychological essentialism is typically a one-criterion account (what I have elsewhere called a monolithic theory). So the extent to which people habitually use modes of classification that do not defer to essences sets sharp limits on the explanatory power of essentialism.
Criticism of essentialism comes in three forms. First there are broad philosophic critiques such as that of Strevens (2000), who argues that the evidence, plus parsimony considerations, at best support a minimalist thesis over even moderate forms of essentialism. Second are empirical critiques that show essentialism fails within some domains, such as artifact concepts (Malt & Sloman, 2007; Sloman & Malt, 2003). This is consistent with essentialism being accurate within its own proprietary areas, however. Third, there are direct empirical challenges to essentialism within domains where it is thought to be most plausible and well-supported. Tobia et al.’s studies fall into this category, along with many others (Hampton, Estes, & Simmons, 2007; Kalish, 1998, 2002).
Tobia et al. aim to establish three major claims:
(1) Categorization is guided by multiple criteria, some of which conform to psychological essentialism and others which do not.
(2) These criteria are not arbitrary, but can be activated or deactivated by specific contextual factors.
(3) People will sometimes activate both criteria at once, thus evincing an ambivalent or near-contradictory attitude towards category membership.
The first point bolsters the anti-essentialist trend. It also, incidentally, strikes at the foundations of metaphysical essentialism by providing further evidence that the allegedly stable intuitive judgments philosophers have appealed to are easily manipulated by task factors.
Moreover, as the second point establishes, these judgments do not fail randomly, but can be systematically reversed. People operate with multiple senses of natural kind concepts like water, gold, tiger, salmon, and presumably other species and substances. These studies therefore lend support to a view of concepts that is pluralist and contextualist. Pluralism claims that the norm is for thinkers to have and operate simultaneously with multiple differently structured concepts of a category. Contextualism claims that which of these concepts is retrieved or constructed depends systematically on cognitive and environmental factors (such as, in the current studies, the membership standards associated with legal vs. scientific decision-making).
The third point drives home that people’s attitudes towards kinds may be too complex to be captured even with a two (or more) criterion model. To see this, consider Expt. 1. In this study, participants in the Same Appearance condition favored saying that while there was a sense in which the sample (which lacked the “essential” property of the kind) is K, ultimately it is really not K. Since they had the option to express an unequivocal not-K judgment, the preference to express a more ambivalent state of mind is telling. I suspect that these qualified judgments are best thought of as expressions of in-between beliefs in the sense of Schwitzgebel (2001); see Section 3 for more on this point. However, the other experiments (Expts. 2 & 3), which used rating scales paired with unqualified membership statements, speak more directly in favor of the multi-criterion view. I’ll now turn to a question about how the criteria themselves should be characterized.
2. Other criteria: Appearances or anthropic properties?
As Tobia et al. interpret their results, they show evidence that natural kinds are sometimes categorized using superficial or appearance-based criteria. However, as written the vignettes (Expt. 1a and 1b) conflate appearance properties with at least some others. The “observable properties” of twin gold, for instance, are given as its “appearance, weight, conductivity, melting point and other markers that distinguish gold from lookalikes.” It is further stated that twin gold is “completely indistinguishable and interchangeable outside the lab” with respect to regular gold.
This list of observables is, I suggest, more accurately thought of as a mix of appearances and functional macroproperties. The reference to the fact that twin gold can play any role that regular gold can further reinforces the suggestion that these are functional similarities, not just sameness of looks. This appeal to functions recurs in the description of the legal contexts (Expt. 2), which involve human-specific activities and ends such as building housing, keeping animals as pets, and filling swimming pools.
While it might be correct to say that some of these properties involve how kinds appear, they also include aspects of their behavior, function, and suitability for playing specific roles in organized human life. It would be a misnomer to lump these together under the rubric of “superficial” or “appearance” properties. In fact, it is not clear that the information most people know about categories can be easily segregated into two piles labeled “deep, causal information” and “superficial, appearance information.” The vignettes instead seem to invite judgments based on a mixture of perceptual looks and what we might call anthropic functions: social, technological, or conventional uses that people find for aspects of the natural world. Whether something serves those functions depends on more than appearances, even if it is also (partially) independent of a substance’s specific underlying causal structure.
These materials may, then, be triggering an anthropic mode of construal. As I’ve argued elsewhere (in a paper that develops the notion of an anthropic concept in depth), these representations are widespread, particularly in communities that make use of indigenous or traditional ecological knowledge. The need to engage practically with the natural world, rather than in a mode of detached theoretical contemplation, gives rise to systems of categories that divide up plants, animals, soils, minerals, and other substances according to how they and their parts contribute to everyday goals and projects. Ethnobotanical nomenclature often classifies the parts of plants in terms of their uses as foods, medicine, and construction materials. The most prototypical examples of bird species are those that have the most prominent culture-specific symbolic and ecological roles. And the mineral world—comprising chalk, soil, and precious stones—is classified by how well it serves functions of marking surfaces and bodies, making ceramic vessels, and decoration.
All of these are functional classifications indexed to human purposes. They covary with, but are distinct from, simple appearance properties, and they often cross-classify the natural world relative to its underlying causal structure. If Tobia et al.’s results hold generally, I would expect that in participants who have this sort of systematic understanding available to them, activating a practical orientation towards nature should prompt the use of anthropic criteria of classification. This suggests that there are not two, but potentially many such modes of understanding whose relations and structure should be mapped further.
3. The fragmentation of the conceptual
Finally, pluralistic contextualism faces a significant question of how we set cognitive policies to manage these multiple perspectives, especially when they lead to seeming contradictions, or otherwise fail to converge in their outcomes. Tobia et al.’s results suggest that people will categorize something as water in one context and as not-water in another. But what, then, do they believe about water all-things-considered? What do they think water is? More generally, is there such a thing as what people believe about natural kinds tout court, considered in a way that prescinds from any specific context or perspective? Or are all such beliefs intrinsically context-bound?
These are large questions. To make a start on them, I’ll note that a related problem has elsewhere been discussed under the heading of the fragmentation of belief (Egan, 2008; Norby, 2014). Cases of fragmented belief involve thinkers who have sets of distinct, even inconsistent, beliefs about a subject that never actually come into direct conflict because there is no context in which they are all activated simultaneously. A fragmented thinker operates sometimes with one part of their overall belief set, sometimes with another. For instance, Jules may believe (1) that the railroad tracks run east/west, (2) that Nassau St. runs north/south, and (3) that Nassau St. runs parallel to the railroad tracks. However, if (1-3) are only accessed pairwise and never all at once, the contradiction in belief will never pose a practical or theoretical problem. Other fragmentation cases involve asymmetries between recognition and recall: I can recognize Val Kilmer’s callsign in Top Gun when it’s presented to me, but I can’t recall it if asked without cueing. Or, to make this a parallel case of contradiction, I might misrecall it but correctly recognize it.
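The logical shape of the Jules example can be made concrete with a small sketch. It checks, by brute force, that each pair of beliefs (1)-(3) can be true together while all three cannot. The encoding is an illustrative assumption only: each of the two routes is given one of just two possible orientations, and "parallel" is read as "same orientation". The names `beliefs` and `jointly_consistent` are hypothetical and do not come from Egan or Norby.

```python
from itertools import combinations, product

# Simplifying assumption: a route runs either east-west or north-south.
ORIENTATIONS = ("east-west", "north-south")

# Each belief is a test on a candidate world (tracks orientation, Nassau St. orientation).
beliefs = {
    "(1) tracks run east/west": lambda tracks, nassau: tracks == "east-west",
    "(2) Nassau St. runs north/south": lambda tracks, nassau: nassau == "north-south",
    "(3) Nassau St. is parallel to the tracks": lambda tracks, nassau: tracks == nassau,
}


def jointly_consistent(names):
    """True if at least one candidate world makes every named belief true."""
    return any(
        all(beliefs[name](tracks, nassau) for name in names)
        for tracks, nassau in product(ORIENTATIONS, repeat=2)
    )


# Every pair of beliefs is satisfiable on its own...
pairwise = {pair: jointly_consistent(pair) for pair in combinations(beliefs, 2)}
# ...but no single world satisfies all three at once.
all_three = jointly_consistent(tuple(beliefs))

for pair, ok in pairwise.items():
    print(" & ".join(pair), "->", "consistent" if ok else "inconsistent")
print("all three ->", "consistent" if all_three else "inconsistent")
```

On this toy encoding, a thinker who only ever activates two of the beliefs at a time never meets a contradiction, which is the sense in which fragmentation shields the inconsistency.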
Tobia et al.’s participants in Expts. 2 & 3a/b are fragmented in this way: they hold (in C1) that a specific substance is water, and (in C2) that it isn’t. A measured response would be to note that these results don’t imply belief in a contradiction, since it doesn’t follow that there is any situation (C3) in which they hold both that it is and isn’t water. However, earlier work by Braisby, Franks, and Hampton (1996) indicates that people sometimes do make judgments very close to this. In a within-subjects design, their participants evinced contradictory kind membership judgments over 30% of the time. Moreover, they were asked explicitly about whether making such contradictory judgments was problematic. Around 27% of respondents said it was not, even though they also reported trying to be consistent at least some of the time.
Consider three views about how stored mental representations are related. Isolationism holds that each judgment belongs only to its own perspective or scheme, and never interacts across perspectives. Integrationism holds that judgments belong to distinct, fragmented schemes, but are under certain conditions capable of interacting across these schemes. Unitarianism holds that these judgments all belong to a single conceptual scheme. A question for further research, then, is how much cross-contextual integration these conceptual shards have. Is there, for example, spontaneous transfer of information across contexts? If so, what are the mechanisms?
These questions bear more investigation, although some leads have been sketched in Steven Horst’s superb recent book, Cognitive Pluralism (2016). In line with the present studies, he remarks that “some concepts seem to appear in multiple models…. Concepts like fruit and vegetable seem to appear in biological, culinary, and nutritional models, but their extensions are different in the different models. A tomato is a fruit according to a biological model but a vegetable according to a culinary model” (p. 310). As he says, this raises questions about the identity conditions for concepts across models, which also arise on the many-criterion theory. What exactly makes two water-concepts identifiable if their extensions and some of their governing inferences are disjoint? These worries are not the immediate focus of Tobia et al.’s inquiry, but they are visibly looming in the background of it.
A final (albeit tentative) suggestion is that the sort of blurry, in-between belief state I suggested participants evinced in Expt. 1 might be the result of having multiple context-bound kind beliefs without any clear way to adjudicate, rank, or integrate them. These participants are in a state that is an admixture of suspended “in-a-sense” claims. This is one way–though not the only one–to believe in-betweenishly. This global state can be resolved into more determinate, less in-between states when given the right context. But since these more determinate states contradict each other, it might be hard to find language for this particular in-between attitude. Such awkward, hedged conjunctions might be the most apt expression of the trans-contextual truth about participants’ mental states.
Perhaps, then, conflicted, fragmented representations that lack any rank ordering to stabilize them can only combine into in-between states. As Horst notes, we should not force unity where it is not forthcoming: “minds like ours may be unable to produce a set of beliefs that is globally consistent or a single comprehensive model of everything without losing some of the epistemic grip on the world that we gain through many more localized, idealized models and the beliefs they license” (p. 222).
To briefly summarize:
(1) Applause for more confirmation of the pluralist, contextualist framework.
(2) More work is needed to disentangle responding based on appearances from that based on anthropic, practical or other criteria. This doesn’t undermine, but rather deepens the main argument of the paper.
(3) More needs to be done to clarify the overall belief state of the participants and the relations among their contextually restricted sources of judgment. The overall mental state of the participants is evidently a complex one. They are fragmented in such a way that their responses are tailored to contextual cues, and they are also in-betweenish when considered more globally or from a perspective that tries to take into account everything that they believe about kinds. A contextualist perspective needs to make room for both of these, and in particular needs to address how information is stored and integrated so that it can be transported across perspectives.
Braisby, N., Franks, B., & Hampton, J. A. (1996). Essentialism, word use, and concepts. Cognition, 59(3), 247–274.
Egan, A. (2008). Seeing and believing: perception, belief formation and the divided mind. Philosophical Studies, 1–26.
Hampton, J. A., Estes, Z., & Simmons, S. (2007). Metamorphosis: essence, appearance, and behavior in the categorization of natural kinds. Memory & Cognition, 35(7), 1785–1800. https://doi.org/10.3758/BF03193510
Horst, S. (2016). Cognitive Pluralism. Cambridge, MA: MIT Press.
Kalish, C. W. (1998). Natural and artifactual kinds: Are children realists or relativists about categories? Developmental Psychology, 34(2), 376–91.
Kalish, C. W. (2002). Essentialist to some degree: beliefs about the structure of natural kind categories. Memory & Cognition, 30(3), 340–52.
Malt, B. C., & Sloman, S. A. (2007). Artifact categorization: The good, the bad, and the ugly. In E. Margolis & S. Laurence (Eds.), Creations of the Mind (pp. 85–123). Cambridge, MA: MIT Press.
Norby, A. (2014). Against fragmentation. Thought, 3(1), 30–38.
Schwitzgebel, E. (2001). In-between believing. Philosophical Quarterly, 51(202), 76–82.
Sloman, S. A., & Malt, B. C. (2003). Artifacts are not ascribed essences, nor are they treated as belonging to kinds. Language and Cognitive Processes, 18(5–6), 563–582.
Strevens, M. (2000). The essentialist aspect of naive theories. Cognition, 74(2), 149–75.