KEYNOTE: How to solve the paradox of spatial pitch without resorting to metaphors

Ophelia Deroy (University of London)

Introduction: The paradox of spatial pitch

Try, for a moment, to imitate an orchestra conductor while listening to a familiar classical tune. Didn’t you find your hands moving up and down in time with the high- and low-pitched notes respectively? Or just try singing in a high and then a low-pitched voice instead: now it is your eyebrows that will likely have gone up and down[1]. This spontaneous behavior might not come as a surprise. After all, a series of tones increasing in frequency is almost universally described as ‘rising’: even the handful of languages, including Zapotec and Farsi, which preferentially refer to pitch as ‘thick’ and ‘thin’ still use the vertical metaphor every now and then. An increase in pitch is not just described as rising, it is also heard as a continuous ascending movement[2]. The linguistic and perceptual connections between the notes and these spatial dimensions, however, are highly paradoxical. Musical notes are a perfect illustration of this paradox: after all, they are not played from higher or lower locations in space, and yet they are spontaneously interpreted through what seems, at best, a metaphor and, at worst, a total inaccuracy. The tension is noted by Zuckerkandl, in his book Sound and Symbol (1956, p. 270): “On the one hand music appears as the art that…is perceived solely in and through time, to the complete exclusion of space; [while] on the other hand, it is full of phenomena that seem to presuppose a spatial order and that in any case are wholly incomprehensible if space is completely excluded”.

The “paradox of musical motion”, as it is sometimes known, has attracted the attention of many thinkers and philosophers[3] and has also been seen as a topic worthy of investigation ever since the pioneering early work of Carl Stumpf on the psychology of the tone[4]. However, far from solving the paradox of musical motion, the scientific investigation of tonal perception initiated by Stumpf has actually moved it toward different, and, as it happens, no less challenging problems: by attempting to decompose the problem into more and more specific parts, cognitive scientists have shown that the spatial interpretation initially observed for music and sequences of tones also holds for individual tones. While the musical case could be explained, in part, by a general tendency to map temporal order onto space, no such interpretation is available to explain why pure tones should also be spontaneously mapped onto space. The mapping here seems only to come from a contrast in frequency or pitch: two sounds of higher or lower frequency played from the same static location translate into a difference between ‘high’ and ‘low’[5]. And this turns out to be true from a very early age[6].

Interestingly enough, many researchers appear to assume that the spatial character of pitch contains the key to the paradox of musical motion, or at least want to see a direct link between the two problems. I am less optimistic. There is much more to music than pitch, and much more to the perceived dynamism of musical motion than merely ‘going up’ or ‘going down’ (e.g., Eitan & Granot, 2007; Walker, 1987). After all, a melody retains the same motion when played in a higher or lower key (Foster & Zatorre, 2010). The paradoxical spatialization of pitch, which is the focus here, is several steps removed from the paradox of musical motion: understanding how a sequence of stationary tones can be perceived as moving is different from asking how a dimension of sounds (pitch) can acquire certain spatial attributes which are independent, or at least so it would seem, of the spatial properties of the resonating body producing it. The paradox of musical motion is that of implied or perceived motion in the absence of any physical spatial displacement; the paradox of spatial pitch, one of spatial perception divorced from the actual location of the sound-emitting body.

As defended here and in Deroy et al. (in press), the study of spatial pitch actually ends up increasing the gap between the two paradoxes. For the moment, though, it’s sufficient to stress that our main question concerns why pitch is interpreted or mapped spatially in perception, cognition, and action[7].

One of the problems with this question, nicely outlined by Ernst Mach in his book Analysis of Sensations, is to determine what, exactly, is meant by ‘spatially mapped’. The space onto which pitch is mapped indeed has little to do with actual space: “A tonal series occurs in something which is an analogue of space, but is a space of one dimension limited in both directions … It more resembles a vertical line” (Mach, 1959, p. 278)[8]. The spatial attributes of pitch, in other words, seem not fully spatial, if by space one means a three-dimensional space. This observation leads many to take the spatial character of pitch in a metaphorical sense, as a matter of fitting within a peculiar mental space where the differences between tones are mapped. The metaphorical, or mental, interpretation of space, however dominant, is problematic. For one thing, it dissolves the paradox of spatial pitch into a much broader problem, as pitch would just be one of the many aspects of experience which we are inclined to represent on a mental line. Brightness, loudness, numerosity, and weight are likewise represented on a mental line, along with other magnitudes[9]. What seems to distinguish the case of pitch from these other spatial mappings of magnitude is the extended perceptual consequences of the mapping.

If the problem we need to explain is why differences in pitch lead to biases for stimuli perceived in the ‘real’ physical space, thinking about high and low pitch as a metaphor pushes the problem further away, rather than resolving it: Having proposed that pitch is mapped onto a metaphorical space, we need to understand how this metaphorical space relates to physical space.

Turning the tables

The metaphorical or mental interpretations run into problems because of the approach they rest on. They start with pitch, and its assumed lack of spatial informativeness, and try to find how space comes into the picture. The approach we recommend is almost the opposite: We want to start with space and the spatial biases which can be induced by the pitch of sounds that are produced in that space. How do sounds of different pitch modulate our responses when we detect, perceive, or interact with other external objects or stimuli alongside which these sounds are presented? Looking at how spatial capacities (such as spatial attention or spatial responses) are influenced by the pitch of the sounds produced by certain sources avoids the conclusion that pitch needs to be mapped onto a mental space, and suggests, on the contrary, that a more direct correspondence between audition and the visual observation of sound sources might be at stake. Crucially, the biases generated by pitch might have to do with the close connection between hearing and seeing used to detect natural objects. They might then have much less to do with the human capacity to attend to sounds for their own sake, as in the case of music. Separating the origins of spatial pitch from the question of musical motion means that we need to look differently at the origins of musical cognition.

Pitch in space

Pitch can be defined as the most salient perceptual dimension corresponding to the physical dimension of sound frequency – more specifically, the fundamental frequency of a harmonic series. For instance, the fundamental frequency of the note G4 is 392 Hz, meaning that the air vibrates 392 times a second, or once every 1/392 of a second (roughly 2.55 milliseconds).

The connection with the frequency of the resonating body emitting the sound means that pitch might vary with the displacement of that source. Consider here what happens when a sound source rapidly approaches and then passes a static observer (or vice versa): the frequency of the sound reaching the ear remains essentially constant at first, then falls at an increasing rate as the source gets ever closer, before dropping rapidly as the source passes. It then drops at a decreasing rate as the source recedes (e.g., Neuhoff & McBeath, 1996), the rate and magnitude of the drop being related to the distance and the speed of the source, respectively. However, as illustrated by the Doppler shift effect (Doppler, 1842), many people actually believe (incorrectly, as it turns out) that the frequency of the source increases as it approaches, in the same way that intensity increases. When a Formula 1 car passes in front of the stand, the pitch of the sound is perceived as rising as the car approaches and falling as it recedes. In other words, for human perceivers, both pitch and intensity appear to follow the same time course as the source moves. This effect reveals that our interpretation of the spatial information contained in frequency changes is already biased. More importantly for present purposes, it reminds us that changes in pitch can be linked to the actual and perceived spatial fate of sound sources.
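The physical time course described above can be sketched numerically. The following is a minimal illustration, not a model taken from the cited studies: it assumes a source moving in a straight line at constant speed past a stationary listener, a speed of sound of 343 m/s, and it ignores the propagation delay of the sound. The function name and the example values (a 392 Hz source at 30 m/s passing 10 m away) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def observed_frequency(source_hz, speed, closest_distance, t):
    """Frequency reaching a static listener at time t (seconds).

    The source travels in a straight line at constant speed and is nearest
    to the listener (at closest_distance metres, which must be > 0) at t = 0.
    Propagation delay is ignored, which is a simplification.
    """
    along_track = speed * t  # signed position of the source along its path
    distance = math.hypot(along_track, closest_distance)
    # Rate of change of the source-listener distance:
    # negative while approaching, zero at closest approach, positive after.
    radial_velocity = speed * along_track / distance
    return source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND + radial_velocity)


if __name__ == "__main__":
    for t in (-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0):
        hz = observed_frequency(392.0, 30.0, 10.0, t)
        print(f"t = {t:+.1f} s   {hz:.1f} Hz")
```

Under these assumptions the frequency at the ear never rises: it starts above the emitted frequency, barely changes while the source is far away, falls fastest around the moment of passing, and then levels off below the emitted frequency. The rising pitch that listeners report on approach is therefore not present in the signal itself.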

While rising and falling pitch will be interpreted as the approaching and receding movement of its source along the horizontal plane, pitch and sound sources are also linked in auditory localization tasks in the vertical plane. As binaural cues play a lesser role and spectro-temporal cues a more important one when it comes to perceiving vertical auditory motion, differences in frequency can come to play a role in the perception of moving auditory targets in the vertical plane, thanks to the filtering properties of the pinna (Shaw, 1982). More surprisingly still, people typically locate static sources of higher-pitched sounds as originating from higher locations in space[10]. For instance, Pedley and Harper presented their participants with three different tone series. Two of the 7 tones presented in each of the three tone series were the same (900 and 1,400 Hz). These tones appeared as the lowest, intermediate, or highest tones in each series. The participants had to assign an elevation to each tone in the series, based on 7 markers presented on the wall in front of them. While all of the sounds actually came from the same vertically-arrayed pair of loudspeakers, the participants were misinformed that each of the sounds came from one of 7 different loudspeakers. The height assigned to the source of the 900 and 1,400 Hz tones was significantly influenced by their relative position within their respective tone series, being associated with a lower elevation when the tones were lower in frequency relative to the rest of the series of tones. Overall, though, the sounds of increasing frequency in each series were generally located at higher positions.

But whereas in the Doppler case, there is a genuine correlation between the change in frequency and the approach of the sound source (which happens to be misperceived), there would appear to be no physical grounding for a given sound source emitting higher frequency sounds if it is located higher in space.[11] So, instead of counting as a misperception of the space-pitch relation in the horizontal motion case, the vertical spatial bias generated by the pitch of sounds whose sources are static would seem to be imposing a spatial relation which is simply not there. The paradox of spatial pitch, in other words, is taken to come from those spatial effects of pitch that cannot be accounted for as a misperception of the displacement of sound sources.

Key evidence for such effects comes from experimental paradigms in which a difference in pitch, in the absence of a difference in sound source, is sufficient to influence performance on unrelated visual tasks. The details of these protocols are examined in Deroy et al. (in press) and Spence & Deroy (2013), but it is useful to give specific illustrations of these effects. For instance, if we are asked to respond as fast as possible to visual targets presented either in the lower or higher part of the screen while task-irrelevant sounds are played simultaneously, we are slower to respond to a high visual target presented with a low-pitched sound rather than a high-pitched one[12]. The same is true if sounds precede the visual targets[13], showing that pitch influences exogenous spatial attention. Our perception of an ambiguous visual motion display that can be perceived as drifting either upward or downward is biased by the simultaneous presentation of an ascending versus descending auditory frequency sweep[14]. According to a separate line of research, responses to higher-pitched sounds tend to be faster when given using a higher, rather than a lower, response button – an effect that has been labelled the SMARC or SPARC effect, that is, the Spatial-Musical (or Spatial-Pitch) Association of Response Codes[15], in analogy with the SNARC effect (Spatial-Numerical Association of Response Codes), where responses to small numbers are faster on the left side and responses to large numbers are faster on the right side[16]. In these stimulus-response compatibility effects, differences in response times are taken as evidence that the mental spatial representation of magnitude corresponds, or interferes, with the spatial distribution of response options.

There is clear empirical evidence then that people’s performance and perceptual experience can be influenced by the pitch of auditory stimuli, even when they happen to be irrelevant to the task and unrelated to the perceived object.

Revisiting the linguistic hypothesis

Given that the tight connection between pitch and vertical elevation is first observed at the linguistic and phenomenological levels, many have been led to attribute it to the influence of language on perception. The discussion of the relation between the spatial attributes of pitch and language did not wait for the Sapir-Whorf debates, which asked whether the way languages describe the same reality affects the way this reality is perceived. Pitch was noticed as an example of quasi-universal linguistic agreement by Stumpf (1883): many languages use the same words, ‘low’ and ‘high’, to describe sounds of differing pitch[17]. The fact that English remains the language in which the majority of the reflections and experiments on spatial pitch are conducted certainly makes the use of the terms ‘high’ and ‘low’ to describe both pitch and height more salient for investigation, over and above other possible metaphors. Spatial elevation is also linked metaphorically with many other dimensions of experience and cognition, with people using the contrast between up and down to mark the difference between positive and negative affect, or to mark dominance and social power[18]. In these domains, just as for pitch, visual representations of affect or social relations will be responded to more quickly if they conform with the up-down contrast with which they are described.

The influence of language, however, was tested a few years ago by Parkinson, Kohler, Sievers, and Wheatley (2012), who conducted research with the Kreung, a remote hill tribe from northeastern Cambodia. This group is interesting because they are one of the very few who do not use spatial language to describe auditory pitch. The participants in this study viewed shapes that could either rise or fall while listening to sounds that would either increase or decrease in frequency. They were required to report on the auditory change. The critical response measure was the accuracy of responding in the congruent versus incongruent trials. Intriguingly, the Kreung participants made significantly more errors on pitch-elevation incongruent trials than on congruent ones, in a way which was similar to other populations where pitch is, in fact, described in terms of spatial height. Such results support the idea that the mapping of frequency onto vertical directions does not depend on language.

The early manifestation of this mapping also suggests that language is not necessary for the association between space and pitch. Evidence in 6-month-old infants, presented by Braaten (1993) at a conference, was then pushed back down to 3- to 4-month-olds (Walker et al. 2010), albeit not without some controversy. Jeschonek et al. (2013) and Dolscheid, Hunnius, Casasanto, and Majid (2014) also demonstrated that pre-linguistic infants (7-12 months old and 4 months old, respectively) are sensitive to associations between space and pitch – with the dynamic version emerging earlier than the static one.

Such early manifestations of the association between pitch and space in human infancy should not, however, necessarily lead one to conclude that language or other aspects of development have no effect whatsoever: A recent series of studies by Nava, Grassi, and Turati (2016) reported that the association between pitch and elevation was still quite weak in preschool children, no matter whether it was tested with audio-visual or audio-tactile combinations of stimuli. Nava and her colleagues speculated that this could perhaps be attributable to immature linguistic and auditory cues that are still developing at five years of age.

Meanwhile, the results of a neuroimaging study by Sadaghiani, Maier, and Noppeney (2009) have provided additional information here. They investigated the different neural bases of the effects induced by pitch and linguistic labels. The motivation for the study was to investigate whether the biases on visual motion perception which can equally be induced by a rising or descending pitch, spatial words such as ‘LEFT’ or ‘UP’, or sounds actually moving from left to right, actually shared the same neural underpinnings. The study not only revealed that the three kinds of effects (pitch, words, and motion) operate at different levels of the cortical hierarchy, but also that the influence of linguistic signals emerged primarily in the right intraparietal sulcus while the effects of pitch could be seen both in these higher-level convergence regions and in the audiovisual motion areas (hMT+/V5+), where the effects of actual auditory motion were shown.

Amodal remapping and other conceptual hypotheses

Conceptualizing ordered relations

In recent years, the spatial effects of pitch have been explained as an instance of a broader tendency to map sensory magnitudes, as well as time, onto space. Many sensory magnitudes, such as size, weight, brightness, visual density, etc., and symbolic numbers all tend to be spontaneously mapped spatially, with smaller quantities being associated with one end, and larger ones with the other. In most left-to-right reading cultures, and in young infants, smaller quantities will be associated with the left side and larger ones with the right. As this spatial mapping seems to be common to many different sensory attributes, and to symbolic numbers, it has been posited by some as playing a key role in enabling an abstract or amodal representation of quantity, independent of the kinds of objects or domains at stake[19].

Most often, in the magnitude debates, the key distinction falls between those theories which see the use of space as essential to the general representation of sensory magnitudes[20], and those which only consider it as a convenient, but by no means indispensable, way of representing different kinds of information in a similar format[21] – like the way we use the spatial properties of graphs and diagrams to represent numerical data. What seems to be key for present purposes is a rather orthogonal difference, having to do with the degree of connection that those theories see between the space involved in magnitudes (and, by association, in pitch) and real space.

At one end of the spectrum, Proctor and Cho (2006) consider that the spatial representation of magnitudes and pitch serves the conceptual representation of order relations. They insist, for instance, that the kind of space shown to exist for sensory magnitudes often only involves left-to-right or up-down opposites on a one-dimensional line – suggesting that space mostly acts as a practical way to represent polarity oppositions and intervals on a scale. On this view, there would be no more to the mapping of pitch contrasts onto higher and lower parts of a line than an easy way to represent their difference as a ‘distance’ between values. The spatial biases noted above would then emerge as a consequence of the way in which participants represent the contrast between the different auditory cues used in the experiments.

By contrast, the general theory of magnitude proposed by Walsh (2003) suggests that the fact that we map quantities onto space is not primarily meant to help keep track of quantities or order in the abstract, but is grounded in the demands of action: The mapping of various magnitudes onto a spatial format matters for operating and co-ordinating transformations for action, or predictions about the immediate sensorimotor consequences of action, and is supported by evidence of a neural overlap of magnitude processing in the inferior parietal cortex[22]. Crucially, many of these mappings will reflect ecologically valid relations between time, space, magnitudes, and action: For instance, it takes more time to reach for a more distant object, speed of movement being fixed; an increase in brightness or loudness tends to correlate with objects coming closer in space, etc.

The general theories of magnitude, then, can be divided into two different families when it comes to pitch, depending on whether the space at stake in the mapping (be it necessary or not to the ordered representation of pitch differences) is a conceptual, abstract space, or the motor space of movement and actions. While the latter will connect the spatial mapping of pitch to constraints about the world or action, abstract theories of magnitude converge with other accounts of the space-pitch relation, which see it as an instance of a general tendency to connect domains of experience through very general abstract concepts, such as ‘intense/weak’[23]. According to these theories, all domains of experience are structured by a small number of basic oppositions, which show in our ways of relating or comparing specific experiences, and explain why we match pitch to elevation, but also to brightness, shape, and other dimensions.

Is higher pitch ‘more than’ lower pitch?

Theories of magnitude which connect space and quantities have the ambition of explaining much more than the oft-documented association between pitch and space, but the question remains as to whether they do a good job of explaining this mapping in particular. One primary difficulty here might rest with the fact that pitch is not, strictly speaking, a sensory magnitude, in the sense of being linked to the amplitude of waves. Auditory magnitude, strictly speaking loudness, is shown to be mapped onto space in a way which fits with the expected left-right mapping of quantities[24]. An increase in pitch, on the other hand, corresponds to an increase in temporal frequency, but not in amplitude. Pitch is only indirectly related to actual magnitudes, either through the size of typical sources, or the integrality of dimensions.

Overall, lower-pitched sounds do indeed tend to be emitted by bigger objects, and higher-pitched sounds by smaller ones (although the tension of the object’s material is here the critical factor). However, this statistical regularity would predict that lower-pitched sounds would count as ‘more’, along with larger sizes, and be on the upper part of the line used to map magnitudes. Some evidence pointing in this direction comes from Eitan et al. (2014), who have shown that the association of lower-pitched sounds with larger sizes was somewhat reversed once tested dynamically, as sounds which are getting lower and lower in pitch are congruent with shrinking size (i.e., ‘less’, as seen in the spatial mapping of lower pitch with the lower part of the vertical line).

The overall interference between pitch and loudness might also explain why pitch ends up being assimilated onto an intensive (i.e., ‘more or less’) scale: Higher-pitched sounds tend to be perceived as louder than lower-pitched sounds played at the same decibel level and come to correspond with ‘more’ intensity[25].

In both cases, though, there is a difference between the kind of immediate spatial mapping of sensory magnitudes directly related to an increase in the physical amplitude of waves (i.e., ‘more’ energy) or the actual size (‘more’ extension), and the indirect account needed to explain how pitch, which is first and foremost associated with temporal frequency, comes to be associated with force/amplitude or size.

On another point, however, neither the general theory of magnitude nor the polarity matching account explains why the effects of pitch present themselves along the vertical dimension, at least in a privileged manner, when most other magnitudes are more robustly mapped onto the horizontal dimension instead. One needs to be careful here with the notion of a privileged mapping, given previous evidence that the directionality of mapping is heavily influenced by cultural factors and experience, and can be easily changed. Rusconi et al.’s results, suggesting the presence of a horizontal mapping of pitch in musicians but not in non-musicians, may perhaps support the idea of a more preponderant mapping between pitch and spatial elevation and the possibility of other mappings (e.g., of pitch horizontally), depending on the perceiver’s experience (see also Lidji et al., 2007). Still, here, the vertical mapping of pitch differences appears as early as twelve months of age, and even earlier for the dynamic version, and would therefore not depend on cultural factors.

If the conceptual interpretation of space-pitch seems to face difficulties, should we accept that sensory magnitudes are mapped onto space for reasons having to do with recoding them in a single format that is accessible for action (e.g., Walsh, 2003)? It is not, however, clear how the high-low contrast involved in pitch relates to the space for action, where, as seen earlier, changes in pitch seem rather to be linked to approaching/receding stimuli. Here, we want to suggest that the solution to this question rests in looking at the collaboration between audition and vision in the localization of sound sources in external space.

The crossmodal interpretation

One alternative interpretation for the effects that have been reviewed here might be to stress not the amodal, but rather the crossmodal, character of the space that is involved in pitch. In other words, while the tracking of sound sources through audition alone is unlikely to explain why higher sound sources are associated with higher frequency sounds, this link becomes more apparent once sound sources are considered to be tracked by vision as well as by audition.

The role of vision

Although the majority of studies have been conducted in the audiovisual domain, the basic phenomenon of pitch-space is rarely described as visuospatial: While Carnevale and Harris (2016, p. 113) assert that “Low- and high-pitched sounds are perceptually associated with low and high visuospatial elevations, respectively”, the visual character of the space involved in the testing seems to remain hidden in most studies and philosophical discussions. Parise et al. (2014) talk, for instance, of “perceived spatial elevation” in general, and Hidaka et al. (2013) of higher-pitched sounds “being ‘up’ in space”, without specifying the modal nature of the space involved. Peacocke (2009) and Scruton (1983) mention visual images evoked by sounds and music, but do not suggest that the space in which one needs to conceive of the elevation of pitch (or musical motion, which is their main concern) is visual space.

A possible explanation for this absence of modal specification might be the largely shared assumption that space is an ‘amodal’ representation – which is not without problems. While the spatial relations represented by each modality are certainly related in the brain and in the mind, this is insufficient to conclude that they all pertain to a single amodal space – the hypothesis of a multiplicity of spatial representations, with different degrees of connection depending on the hierarchy and motor goals, being much more likely to account for what is going on.

In this context, it is interesting to ask what kind of space is involved in the space-pitch paradox, and whether the tendency to associate higher or increasing frequency with higher location or ascending movement also exists in the audiotactile domain. Occelli et al. (2009) demonstrated that crossmodal congruency effects between auditory pitch (either high or low) and locations still exist even when these locations are tactile ones, presented on the participants’ forearm. Meanwhile, we recently investigated whether the crossmodal correspondence robustly documented between changes in auditory pitch and visual direction of movement has analogues in the audio-tactile domain for the sighted as well as the blind[26]. Faster responses were observed when the same response button was used to respond to one of two intuitively congruent stimuli (i.e., outward tactile movement, going from the base of the finger toward the fingertip, and increasing pitch, or inward tactile movement and decreasing pitch) rather than to incongruent pairs of stimuli. Intriguingly, however, this implicit association between pitch and tactile movements was not observed in those participants who were blind, independently of the onset of their deficit.

These results have methodological implications for the explanation. Although the pitch-elevation correspondence also exists between auditory pitch and tactile location, it depends on a participant’s prior visual experience. A variant of a Molyneux question could be asked, with the prediction being that it would take a while for a newly sighted individual to show the effect. Note that these results are difficult to reconcile with the idea that the conceptual, abstract sense of space is altered in blind people[27], who show no difficulty in using the words “high” and “low” to describe pitch[28].

Instead of considering the mapping of pitch onto space as paradoxical, a more ecological approach to audition reveals the close link between auditory and visual space for the perceiver immersed in an environment. If this diminishes the appeal of an amodal or abstract remapping, we still need to explain why the auditory dimension of pitch finds itself mapped onto vertical visual locations. Our proposal here is to reconsider this mapping as an instance of crossmodal correspondence.

Crossmodal correspondences

The term ‘crossmodal correspondences’ refers to the sometimes surprising mappings that exist between different features, attributes, or dimensions of our sensory experience[29]. For example, people tend to associate higher-pitched sounds not only with locations that are higher up, but also with lighter, brighter objects[30]. While crossmodal correspondences have often been assimilated to the much more specific and idiosyncratic phenomenon of synaesthesia, they appear to be much more implicit and widespread, having now been documented between a wide variety of different stimulus attributes/dimensions across ages and cultures[31].

Congruent with its classification as a crossmodal correspondence, the relation between pitch and elevation, like the majority of other crossmodal correspondences (e.g., between pitch and size), would appear to be a relative rather than an absolute phenomenon: The same tone (say 700 Hz) can be mapped onto a lower or higher spatial elevation, depending on whether it is contrasted (in the experiment or in reference to a mental standard) with a 1,200 Hz or a 350 Hz tone. This shows that the underlying explanation is not a standard case of conditioned learning, where fixed values of one kind of stimulus are associated with fixed values of another kind of stimulus, but of mapping, or, better still, of correspondence.
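The contrast between a relative and an absolute mapping can be made concrete with a small sketch. It is only an illustration of the logic, not a model drawn from the studies cited: the function name is hypothetical, and assigning elevation by rank within the set of tones being contrasted is an assumption made for simplicity.

```python
def relative_elevations(tones_hz):
    """Map each distinct tone to an elevation between 0.0 (lowest) and 1.0 (highest).

    The elevation depends only on the tone's rank within the set it is
    contrasted with, never on its absolute frequency.
    """
    ordered = sorted(set(tones_hz))
    if len(ordered) == 1:
        # A single tone has nothing to be contrasted with: place it mid-way.
        return {ordered[0]: 0.5}
    top = len(ordered) - 1
    return {hz: rank / top for rank, hz in enumerate(ordered)}


if __name__ == "__main__":
    # The same 700 Hz tone ends up low in one context and high in the other.
    print(relative_elevations([700, 1200]))
    print(relative_elevations([350, 700]))
```

A fixed, conditioned association would instead amount to a lookup table from specific frequencies to specific heights, and would give the 700 Hz tone the same elevation in both contexts.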

Conclusion: Lessons and challenges raised by the crossmodal account

The paradox of spatial pitch, as has been argued here, only emerges because of the tendency to look at audition in isolation from the other modalities. This unisensory perspective is also partly disembodied, forgetting the link between sensory epithelia dictated by their spatial distribution on our bodies, and the way in which they collaborate to guide bodily actions. Envisaged in connection with the location of the resonating bodies in the external space, in which vision is also interested, the spatial biases generated by pitch stop looking paradoxical and start to appear more like crossmodal biases, grounded in the statistical regularities of the environment. The hypothesis that the crossmodal correspondence between pitch and visual elevation partly comes from the internalization of natural scene statistics[32] is indeed robust, and space-pitch is one of the rare cases where ‘natural scenes’ have been investigated with multisensory perception in mind (rather than exclusively for vision). I would argue that spatial pitch offers here important lessons for our understanding of the kind of learning that comes from natural scene statistics.

First, natural scene statistics are not simply environmental regularities, truly independent of us and our actions, as the link between pitch and visual elevation is centred on our bodies[33].

Secondly, it must be noted that the relative nature of the space-pitch mapping (which is key to its classification as a correspondence, rather than an association) is compatible with the idea that it was learned through exposure to natural scene statistics. The regularity in natural scene statistics that higher-pitched sounds originate, on average, from higher in space sets up a domain-general distinction which is relative (higher pitch = higher, or lower pitch = lower), rather than absolute as in the kind of associative learning usually discussed. What it shows is the need to avoid equating the internalisation of statistics with local associative learning conducted on specific sets of stimuli. Mapping the differences, and integrating this new form of learning into our accounts of the mind, will be an interesting challenge for empiricist models of the mind, and one where the example of space-pitch should play a role. The present account suggests a new framework to understand the role of natural scene statistics in setting up domain-general priors, which can apply in every context, and yet are easily overruled by more specific priors and knowledge (the spatial biases which exist for mere tones played in the laboratory will dissipate if we have some knowledge of the source, or of the validity of the sound-visual target association in the specific context). In a way, natural scene statistics set up our default bets, in the absence of better or more specific ones.
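
One way to picture such default bets is as a broad prior combined with a more precise source of information. The following sketch assumes, purely for illustration, that both can be treated as Gaussian estimates of elevation (in degrees) and fused by precision weighting. The function names and all numerical values are hypothetical and are not taken from the studies cited here.

```python
def default_prior(pitch_contrast):
    """Hypothetical domain-general prior on source elevation.

    pitch_contrast is +1 for a relatively higher tone and -1 for a
    relatively lower one. Returns (mean, variance) in degrees; the wide
    variance marks this as a weak default. All numbers are invented.
    """
    return 10.0 * pitch_contrast, 400.0


def combine(prior_mean, prior_var, cue_mean, cue_var):
    """Precision-weighted fusion of two Gaussian estimates."""
    w_prior = 1.0 / prior_var
    w_cue = 1.0 / cue_var
    post_var = 1.0 / (w_prior + w_cue)
    post_mean = post_var * (w_prior * prior_mean + w_cue * cue_mean)
    return post_mean, post_var


# A relatively high tone with no further information: the default bet
# is simply the prior, i.e. a source somewhat above eye level.
prior_mean, prior_var = default_prior(+1)
print(prior_mean)

# The same tone when the listener has precise knowledge that the source
# sits 20 degrees below eye level (assumed variance of 4): the estimate
# ends up close to -20, and the default bias has almost no influence.
post_mean, post_var = combine(prior_mean, prior_var, -20.0, 4.0)
print(round(post_mean, 2), round(post_var, 2))
```

The design choice that matters here is the wide variance of the default: it is what lets the prior apply in every context while giving way as soon as anything more specific is known.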

Now, if the crossmodal solution to the paradox of spatial pitch defended here offers encouragement to those interested in statistical (not to say Bayesian) accounts of the mind, will it not disappoint those who counted on an explanation of this paradox to address questions about musical cognition?

The connection with vision, and ultimately between ears, eyes and external objects, seems to ground the space-pitch relation in actual spatial relations, whereas the location of external objects is largely irrelevant to the perception of musical motion (although it matters to the quality of the experience), and listening to a piece of music has little to do with locating objects vertically.

However, perceiving pitch as high or low can most certainly come to be recruited in a ‘pure musical listening’ episode, where localizing sound sources is not meant to be relevant. Failures to recruit the correspondence ‘off-line’ – or at least off purpose – could perhaps illuminate the much discussed hypothesis of a deficient pitch-space mapping in amusic individuals[34]. These studies open important questions regarding the extent to which musical processing involves, rather than simply evokes, visuo-spatial representations.

However, the fact that the paradox of spatial pitch says little about the paradox of musical motion leaves intact what we see as much more important cognitive and cultural explanations. Other sensory factors might explain the privileged association between pitch and elevation here, including the link between body size and vocal pitch (within a species, and over-and-above differences in the tension of the vocal cords), and the further link between pitch and affective expressions or social communication. While high-pitched sounds are used to attract interest in social, peaceful contexts (e.g., ‘motherese’) and are associated with smaller, non-threatening objects, lower-pitched sounds tend to be associated with danger, or more negative expressions and contexts. The connection between high/low pitch and high/low locations could, then, be explained partly through a more fundamental connection between negative and down and positive and up – which might be more easily qualified as a metaphor. On the other hand, pitch becomes painful only at the highest frequencies, making the hedonic associations of pitch less straightforward.

Spatial biases across audition and vision perhaps also have a role (but only a role) to play in speculations about the roots of instrument design and musical notation. In musical notation (something which is evolutionarily very recent and will probably have drawn on existing perceptual biases, as is the case for the shapes of writing systems coding sounds), higher pitches are indeed represented higher on the staff. It seems that musical notation has incorporated the association between pitch and elevation ever since its earliest forms in ancient times (e.g., the Seikilos epitaph, around 200 BC to around AD 100). Looking at the side of musical production, not all instruments seem to be built to make sure that higher-pitch sounds are produced by manipulating the upper part of the instrument, but many of them are in the Western tradition. As musicians, researchers and technology developers have tried in recent decades to adapt music production to visual digital media, the exploitation of pitch-space biases is becoming more salient.

In other words, the link between pitch and visual space leaves much of the paradox of musical motion open for discussion, while it contributes to underlining the sensory basis of the development of music in human societies, including the privileged uses of certain visual representations in music.

References


Antović, M., Bennett, A., & Turner, M. (2013). Running in circles or moving along lines: Conceptualization of musical elements in sighted and blind children. Musicae Scientiae, 17, 229-245.

Ben-Artzi, E., & Marks, L. E. (1995). Visual-auditory interaction in speeded classification: Role of stimulus difference. Perception & Psychophysics, 57, 1151-1162.

Bernstein, I. H., & Edelstein, B. A. (1971). Effects of some variations in auditory input upon visual choice reaction time. Journal of Experimental Psychology, 87, 241-247.

Binetti, N., Hagura, N., Fadipe, C., Tomassini, A., Walsh, V., & Bestmann, S. (2015). Binding space and time through action. Proceedings of the Royal Society of London B: Biological Sciences, 282(1805), 20150381.

Braaten, R. (1993). Synesthetic correspondence between visual location and auditory pitch in infants. Paper presented at the Annual Meeting of the Psychonomic Society.

Brunetti, R., Indraccolo, A., Del Gatto, C., Spence, C., & Santangelo, V. (submitted). Are crossmodal correspondences absolute or relative? Context effects on speeded classification. Attention, Perception, & Psychophysics.

Bueti, D., & Walsh, V. (2009). The parietal cortex and the representation of time, space, number and other magnitudes. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 364, 1831-1840.

Butterworth, B. (2005). The development of arithmetical abilities. Journal of Child Psychology and Psychiatry, 46, 3-18.

Carnevale, M. J., & Harris, L. R. (2016). Which direction is up for a high pitch? Multisensory Research, 29, 113-132.

Chang, S., & Cho, Y. S. (2015). Polarity correspondence effect between loudness and lateralized response set. Frontiers in Psychology, 6:683.

Chiou, R., & Rich, A. N. (2012). Cross-modality correspondence between pitch and spatial location modulates attentional orienting. Perception, 41, 339-353.

Chiou, R., & Rich, A. N. (2015). Volitional mechanisms mediate the cuing effect of pitch on attention orienting: The influences of perceptual difficulty and response pressure. Perception, 44, 169-182.

Cho, Y. S., Bae, G. Y., & Proctor, R. W. (2012). Referential coding contributes to the horizontal SMARC effect. Journal of Experimental Psychology: Human Perception and Performance, 38, 726-734.

Crisinel, A.-S., & Spence, C. (2009). Implicit association between basic tastes and pitch. Neuroscience Letters, 464, 39-42.

Crisinel, A.-S., & Spence, C. (2012). A fruity note: Crossmodal associations between odors and musical notes. Chemical Senses, 37, 151-158.

Crollen, V., Dormal, G., Seron, X., Lepore, F., & Collignon, O. (2013). Embodied numbers: The role of vision in the development of number–space interactions. Cortex, 49, 276-283.

Dehaene, S. (1997). The number sense: How the mind creates mathematics. Oxford, UK: Oxford University Press.

Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396.

Deroy, O., Fasiello, I., Hayward, V., & Auvray, M. (2016). Differentiated audio-tactile correspondences in sighted and blind individuals. Journal of Experimental Psychology: Human Perception and Performance.

Deroy, O., & Spence, C. (2013). Weakening the case for ‘weak synaesthesia’: Why crossmodal correspondences are not synaesthetic. Psychonomic Bulletin & Review, 20, 643-664.

Deroy, O., & Spence, C. (2016). Crossmodal correspondences: Four challenges. Multisensory Research, 29, 29-48.

Di Luca, S., & Pesenti, M. (2011). Finger numeral representations: More than just another symbolic code. Frontiers in Psychology, 2:272.

Dolscheid, S., & Casasanto, D. (2015). Spatial congruity effects reveal metaphorical thinking, not polarity correspondence. Frontiers in Psychology, 6:1836.

Dolscheid, S., Hunnius, S., Casasanto, D., & Majid, A. (2014). Prelinguistic infants are sensitive to space-pitch associations found across cultures. Psychological Science, 25, 1256-1261.

Doppler, C. (1842). Ueber das farbige Licht der Doppelsterne und einiger anderer Gestirne des Himmels: Versuch einer das Bradley’sche Aberrations-Theorem als integrirenden Theil in sich schliessenden allgemeineren Theorie. Prague: K. Bohnm, Association of Science.

Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience, 10, 915-921.

Eitan, Z., & Granot, R. Y. (2006). How music moves: Musical parameters and listeners’ images of motion. Music Perception, 23, 221-247.

Eitan, Z., & Granot, R. Y. (2007). Intensity changes and perceived similarity: Inter-parametric analogies. Musicae Scientiae, Discussion Forum 4a, 99-133.

Eitan, Z., Ornoy, E., & Granot, R. Y. (2012). Listening in the dark: Congenital and early blindness and cross-domain mapping in music. Psychomusicology: Music, Mind, & Brain, 22, 33-45.

Eitan, Z., Schupak, A., Gotler, A., & Marks, L. E. (2014). Lower pitch is larger, yet falling pitches shrink. Experimental Psychology, 61, 273-284.

Eitan, Z., & Timmers, R. (2010). Beethoven’s last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, 114, 405-422.

Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10(1):6, 1-12.

Fairhurst, M., & Deroy, O. (submitted). Magnitude-space mapping for auditory and audio-visual intensity: A test of the shared spatial representation of magnitude.

Fernández-Prieto, I. & Navarra, J. (submitted). The higher the pitch the larger its crossmodal influence on visuospatial processing. Psychology of Music.

Fernández-Prieto, I., Navarra, J., & Pons, F. (2015). How big is this sound? Crossmodal association between pitch and size in infants. Infant Behavior and Development, 38, 77-81.

Foster, N. E. V., & Zatorre, R. J. (2010). Cortical structure predicts success in performing musical transformation judgments. NeuroImage, 53, 26-36.

Gallace, A., & Spence, C. (2006). Multisensory synesthetic interactions in the speeded classification of visual size. Perception & Psychophysics, 68, 1191-1203.

Grau, J. W., & Nelson, D. K. (1988). The distinction between integral and separable dimensions: Evidence for the integrality of pitch and loudness. Journal of Experimental Psychology: General, 117, 347-370.

Gulick, W. L. (1971). Hearing: Physiology and psychophysics. New York, NY: Oxford University Press.

Held, R., Ostrovsky, Y., de Gelder, B., Gandhi, T., Ganesh, S., Mathur, U., & Sinha, P. (2011). The newly sighted fail to match seen with felt. Nature Neuroscience, 14, 551-553.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61-135.

Hubbard, T. L. (2013). Auditory imagery contains more than audition. In S. Lacey & R. Lawson (Eds.), Multisensory imagery: Theories and applications (pp. 221-247). New York, NY: Springer.

Hubbard, E. M., Piazza, M., Pinel, P., & Dehaene, S. (2005). Interactions between number and space in parietal cortex. Nature Reviews Neuroscience, 6, 435-448.

Hubbard, T. L., & Ruppel, S. E. (2013). A Fröhlich effect and representational gravity in memory for auditory pitch. Journal of Experimental Psychology: Human Perception and Performance, 39, 1153-1164.

Huron, D., & Shanahan, D. (2013). Eyebrow movements and vocal pitch height: Evidence consistent with an ethological signal. Journal of the Acoustical Society of America, 133, 2947-2952.

Jeschonek, S., Pauen, S., & Babocsai, L. (2013). Cross-modal mapping of visual and acoustic displays in infants: The effect of dynamic and static components. European Journal of Developmental Psychology, 10, 337-358.

Johnson, M., & Lakoff, G. (1980). Metaphors we live by. Chicago, IL: University of Chicago Press.

Karwoski, T. F., Odbert, H. S., & Osgood, C. E. (1942). Studies in synesthetic thinking: II. The role of form in visual responses to music. The Journal of General Psychology, 26, 199-222.

Kay, P., & Kempton, W. (1984). What is the Sapir-Whorf hypothesis? American Anthropologist, 86, 65-79.

Krumhansl, C. L. (1979). The psychological representation of musical pitch in a tonal context. Cognitive Psychology, 11, 346-374.

Krumhansl, C. L. (2001). Cognitive foundations of musical pitch. Oxford, UK: Oxford University Press.

Larson, S. (2012). Musical forces: Motion, metaphor, and meaning in music. Bloomington, IN: Indiana University Press.

Lewkowicz, D. J., & Minar, N. J. (2014). Infants are not sensitive to synesthetic cross-modality correspondences: A comment on Walker et al. (2010). Psychological Science, 25, 832-834.

Ludwig, V. U., Adachi, I., & Matsuzawa, T. (2011). Visuoauditory mappings between high luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans. Proceedings of the National Academy of Sciences of the USA, 108, 20661-20665.

Mach, E. (1959). The analysis of sensations. New York, NY: Dover.

Maeda, F., Kanai, R., & Shimojo, S. (2004). Changing pitch induced visual motion illusion. Current Biology, 14, R990-R991.

Marks, L. E. (2004). Cross-modal interactions in speeded classification. In G. A. Calvert, C. Spence, & B. E. Stein (Eds.), Handbook of multisensory processes (pp. 85-105). Cambridge, MA: MIT Press.

Meier, B. P., & Robinson, M. D. (2004). Why the sunny side is up. Associations between affect and vertical position. Psychological Science, 15, 243-247.

Melara, R. D., & Marks, L. E. (1990). Interaction among auditory dimensions: Timbre, pitch, and loudness. Perception & Psychophysics, 48, 169-178.

Melara, R. D., & O’Brien, T. P. (1987). Interaction between synesthetically corresponding dimensions. Journal of Experimental Psychology: General, 116, 323-336.

Miller, J. O. (1991). Channel interaction and the redundant targets effect in bimodal divided attention. Journal of Experimental Psychology: Human Perception and Performance, 17, 160-169.

Mossbridge, J. A., Grabowecky, M., & Suzuki, S. (2011). Changes in auditory frequency guide visual-spatial attention. Cognition, 121, 133-139.

Mudd, S. A. (1963). Spatial stereotypes of four dimensions of pure tone. Journal of Experimental Psychology, 66, 347-352.

Nava, E., Grassi, M., & Turati, C. (2016). Audio-visual, visuo-tactile and audio-tactile correspondences in preschoolers. Multisensory Research, 29, 93-111.

Neuhoff, J. G., & McBeath, M. K. (1996). The Doppler illusion: The influence of dynamic intensity change on perceived pitch. Journal of Experimental Psychology: Human Perception and Performance, 22, 970-985.

Nieder, A., & Dehaene, S. (2009). Representation of number in the brain. Annual Review of Neuroscience, 32, 185-208.

Noble, C., Mossbridge, J., Iordanescu, L., Sherman, A., List, A., Grabowecky, M., et al. (2010). Motion induced pitch: A case of visual-auditory synesthesia. Journal of Vision, 10, 872.

Nussbaum, C. O. (2007). The musical representation: Meaning, ontology, and emotion. Cambridge, MA: MIT Press.

Occelli, V., Spence, C., & Zampini, M. (2009). Compatibility effects between sound frequencies and tactually stimulated locations on the hand. Neuroreport, 20, 793-797.

Parise, C. V. (2016). Crossmodal correspondences: Standing issues and experimental guidelines. Multisensory Research, 29, 7-28.

Parise, C. V., Knorre, K., & Ernst, M. O. (2014). Natural auditory scene statistics shapes human spatial hearing. Proceedings of the National Academy of Sciences of the USA, 111, 6104-6108.

Parise, C., & Spence, C. (2008). Synaesthetic congruency modulates the temporal ventriloquism effect. Neuroscience Letters, 442, 257-261.

Parkinson, C., Kohler, P. J., Sievers, B., & Wheatley, T. (2012). Associations between auditory pitch and visual elevation do not depend on language: Evidence from a remote population. Perception, 41, 854-861.

Pasqualotto, A., Taya, S., & Proulx, M. J. (2014). Sensory deprivation: Visual experience alters the mental number line. Behavioural Brain Research, 261, 110-113.

Patching, G. R., & Quinlan, P. T. (2002). Garner and congruence effects in the speeded classification of bimodal signals. Journal of Experimental Psychology: Human Perception & Performance, 28, 755-775.

Peacocke, C. (2009). The perception of music: Sources of significance. The Modern Schoolman, 86, 239-260.

Pedley, P. E., & Harper, R. S. (1959). Pitch and the vertical localization of sound. The American Journal of Psychology, 72, 447-449.

Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13, 278-285.

Proctor, R. W., & Cho, Y. S. (2006). Polarity correspondence: A general principle for performance of speeded binary classification tasks. Psychological Bulletin, 132, 416-442.

Proctor, R. W., & Reeve, T. G. (Eds.). (1989). Stimulus-response compatibility: An integrated perspective. Elsevier.

Roffler, S. K., & Butler, R. A. (1968). Factors that influence the localization of sound in the vertical plane. Journal of the Acoustical Society of America, 43, 1255-1259.

Rusconi, E., Kwan, B., Giordano, B. L., Umiltà, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 99, 113-129.

Sadaghiani, S., Maier, J. X., & Noppeney, U. (2009). Natural, metaphoric, and linguistic auditory direction signals have distinct influences on visual motion processing. Journal of Neuroscience, 29, 6490-6499.

Salgado-Montejo, A., Marmolejo-Ramos, F., Alvarado, J. A., Arboleda, J. C., Suarez, D. R., & Spence, C. (in press). Drawing sounds: Representing tones and chords spatially. Experimental Brain Research.

Scruton, R. (1983). The aesthetic understanding. South Bend, IN: St Augustine’s Press.

Shaki, S., Petrusic, W. M., & Leth-Steensen, C. (2012). SNARC effects with numerical and non-numerical symbolic comparative judgments: Instructional and cultural dependencies. Journal of Experimental Psychology: Human Perception and Performance, 38, 515-530.

Shaw, E. A. G. (1982). External ear response and sound localization. In R. Gatehouse (Ed.), Localization of sound: Theory and applications (pp. 30-41). Groton, CT: Amphora.

Shayan, S., Ozturk, O., & Sicoli, M. A. (2011). The thickness of pitch: Crossmodal metaphors in Farsi, Turkish, and Zapotec. The Senses and Society, 6, 96-105.

Shepard, R. N. (1982). Geometrical approximations to the structure of musical pitch. Psychological Review, 89, 305-333.

Spence, C., & Deroy, O. (2013). How automatic are crossmodal correspondences? Consciousness and Cognition, 22, 245-260.

Stevens, S. S. (1935). The relation of pitch to intensity. The Journal of the Acoustical Society of America, 6, 150-154.

Stumpf, C. (1883). Tonpsychologie I [Psychology of the tone]. Leipzig: Hirzel.

Stumpf, C. (1911/2012). The origins of music (Ed. and Trans., David Trippett). Oxford, UK: Oxford University Press.

Tajadura-Jimenez, A., Vakali, M., Fairhurst, M. F., Mandringin, A., Bianchi-Berthouze, N., & Deroy, O. (submitted). Auditory Pinocchio: Rising pitch changes the mental representation of one’s finger length.

Terhardt, E. (1974). On the perception of periodic sound fluctuations (roughness). Acustica, 30, 201-213.

Tillmann, B., & Bharucha, J. J. (1999). Perceiving and learning harmonic structure: Some news from MUSACT. International Journal of Computing Anticipatory Systems, 4, 289-300.

Vera-Constán, F., Rodríguez-Cuadrado, S., Romero-Rivas, C., Puigcerver, Fernández-Prieto, I., & Navarra, J. (submitted). Seeing music: The perception of melodic ‘ups and downs’ modulates the spatial processing of visual stimuli.

Verschuure, J., & Van Meeteren, A. A. (1975). The effect of intensity on pitch. Acta Acustica, 32, 33-44.

Walker, P. (2012). Cross-sensory correspondences and cross talk between dimensions of connotative meaning: Visual angularity is hard, high-pitched, and bright. Attention, Perception, & Psychophysics, 74, 1792-1809.

Walker, P., Bremner, J. G., Mason, U., Spring, J., Mattock, K., Slater, A., & Johnson, S. P. (2014). Preverbal infants are sensitive to cross-sensory correspondences: Much ado about the null results of Lewkowicz and Minar (2014). Psychological Science, 25, 835-836.

Walker, P., & Smith, S. (1986). The basis of Stroop interference involving the multimodal correlates of auditory pitch. Perception, 15, 491-496.

Walker, R. (1987). The effects of culture, environment, age, and musical training on choices of visual metaphors for sound. Perception & Psychophysics, 42, 491-502.

Walsh, V. (2003). A theory of magnitude: Common cortical metrics of time, space and quantity. Trends in Cognitive Sciences, 7, 483-488.

Williamson, V. J., Cocchini, G., & Stewart, L. (2011). The relationship between pitch and space in congenital amusia. Brain and Cognition, 76, 70-76.

Yau, J. M., Olenczak, J. B., Dammann, J. F., & Bensmaia, S. J. (2009). Temporal frequency channels are linked across audition and touch. Current Biology, 19, 561-566.

Zuckerkandl, V. (1956). Sound and symbol (Trans. Willard R. Trask). New York: Pantheon Books.


[1] See Huron & Shanahan, 2013.

[2] Eitan & Granot, 2006.

[3] E.g. Larson, 2012; Nussbaum, 2007; Peacocke, 2009; Scruton, 1983.

[4] Stumpf, 1883, 1911.

[5] And to a lesser extent, into left and right (Mudd, 1963). Here I will focus on the dominant vertical mapping, which is both more common and more robust. The horizontal mapping is also highly dependent on musical training – and is discussed in more detail in Deroy et al., in press.

[6] E.g., Jeschonek, Pauen, & Babocsai, 2013.

[7] E.g., Johnson & Lakoff, 1980; Parise, Knorre, & Ernst, 2014; Rusconi, Kwan, Giordano, Umiltà, & Butterworth, 2006.

[8] Pitch itself is better accounted for in a multidimensional manner (e.g., Krumhansl, 2001). One might think here of Shepard’s (1982) helix, Krumhansl’s cone (Krumhansl, 1979), or Bharucha’s MUSACT (e.g., Tillmann & Bharucha, 1999).

[9] Walsh, 2003.

[10] E.g., Pedley & Harper, 1959; Pratt, 1930; Roffler & Butler, 1968.

[11] Though it is worth noting that a high-frequency sound will likely spread more easily when coming from a source located high in space, just because the sound will be absorbed by fewer obstacles (than is the case on the ground). This is not the case for low-frequency sounds, whose spatial location we are generally worse at detecting and which are also less likely to be absorbed by obstacles.

[12] I.e., in speeded classification tasks; e.g., Ben-Artzi & Marks, 1995; Bernstein & Edelstein, 1971; Evans & Treisman, 2010; Melara & O’Brien, 1987; Patching & Quinlan, 2002. See also Miller, 1991, for a Redundant Target effect.

[13] I.e., crossmodal cueing; Chiou & Rich, 2012; Fernández-Prieto & Navarra, submitted.

[14] E.g., Carnevale & Harris, 2016; Maeda et al., 2004; Sadaghiani et al., 2009.

[15] E.g., Rusconi et al., 2006. For instance, the faster responses observed to high and low pitch when they are associated with upper and lower response buttons, respectively, rather than the reverse, are taken as evidence of a vertical mapping of pitch differences, with higher pitch corresponding to the upper position, and lower pitch to the lower one.

[16] Dehaene, Bossini, & Giraux, 1993.

[17] Though Spanish, Catalan, and French more commonly use the terms ‘obtuse’ and ‘acute’, and Farsi, Turkish and Zapotec more often use ‘thick’ and ‘thin’; see Shayan, Ozturk, & Sicoli, 2011.

[18] E.g., Dolscheid & Casasanto, 2015; Meier & Robinson, 2004.

[19] E.g., Hubbard, Piazza, Pinel, & Dehaene, 2005; see Nieder & Dehaene, 2009, for a review.

[20] E.g. Dehaene, 1997.

[21] E.g., Di Luca & Pesenti, 2011; see also Butterworth, 2005.

[22] See also Binetti, Hagura, Fadipe, Tomassini, Walsh, & Bestmann, 2015; Bueti & Walsh, 2009.

[23] Karwoski, Odbert, & Osgood, 1942, for an early statement; Walker, 2012; Walker & Smith, 1986.

[24] Chang & Cho, 2015; Fairhurst & Deroy, submitted; Mudd, 1963.

[25] E.g., Grau & Nelson, 1988; Melara & Marks, 1990; Stevens, 1935; Verschuure & Van Meeteren, 1975. However, the effect is minimal: Stevens demonstrated that two tones that differ in intensity can differ in frequency by, at most, 3% (half a semi-tone) and still be perceived as equal in pitch, but subsequent studies have found a much smaller effect (e.g., Gulick, 1971).

[26] Deroy et al., 2016.

[27] Though in this case, it is in line with some other studies that have shown that the spatial mappings of symbolic magnitudes are altered in congenitally blind individuals. E.g., Crollen, Dormal, Seron, Lepore, & Collignon, 2013; Pasqualotto, Taya, & Proulx, 2014.

[28] Antović, Bennett, & Turner, 2013; Eitan, Ornoy & Granot, 2012.

[29] See Deroy & Spence, 2016; Spence, 2011, for reviews.

[30] E.g., Parise, 2016; Walker, 2012.

[31] Deroy & Spence, 2013.

[32] Parise et al., 2014. The naturalistic recordings in this study were made via directional microphones mounted on a baseball cap designed to capture the sounds of the environment and the elevation from which they come.

[33] See Mossbridge et al., 2011; Parise et al., 2014.

[34] E.g. Douglas & Bilkey, 2007; Williamson, Cocchini, & Stewart, 2011.

4 thoughts on “KEYNOTE: How to solve the paradox of spatial pitch without resorting to metaphors”

  1. Thank you for the opportunity to read this very interesting paper on the relationship between pitch and space. A background in sound design once prompted me to return to school and study philosophy to pursue a number of puzzling questions about music. That was many years ago and it was disappointing to discover that philosophy had no interest at that time in Pratt and Stumpf. I’m very grateful (and perhaps a bit envious) to know that interesting subjects like this are being investigated by scientifically informed philosophers like yourself.

    I am hesitant to offer my comment on the topic of this paper, as I am unfamiliar with the majority of readings, but as I read through this paper, one seemingly straightforward explanation for our use of the high/low characterization of pitch came to mind, which was not discussed. I am referring to the pyramidal arrangement that groups of differently-pitched physical objects of the same material naturally “resolve” to when stacked for maximum stability under the influence of gravity. A google image search for “stacked stones” clearly illustrates this pattern, and a similar effect can be seen more or less throughout nature wherever homotimbral bodies of various pitch are arranged in stable groupings, and a similar effect is evident in the progressive widening growth of stable plants and trees at the base, and through the upward direction of the growth of incrementally smaller plant stems under the effect of negative gravitropism. More generally, this effect also seems to be supported by the conic sections described by trajectories under classical mechanics, where what we call “down” is the direction of the centre of gravity, and “up” is the opposite direction.

    1. Dear John,

      Thanks for the comment. The hypotheses you suggest might well be true – they are just very difficult to tell apart. We are constantly exposed to these kinds of regularities, and wondering which one is the determining one might not make sense. The thing is, there is no reason to suppose that these matchings have been learned from specific kinds of experiences, like the ‘stacked stones’ one. In other words, they are not domain-specific, but domain-general. Interesting evidence here comes from German researchers (Parise et al., 2014) who have shown that the sources of high-pitch sounds tend to be above one’s head (spatially elevated). So we internalize this generality, and use it when we have no better prior to work with. There are very interesting lessons here for sound design – and I am glad you saw the relevance!

      Best regards

  2. Dear Ophelia,

    Thanks so much for this very interesting paper. I have two questions (since you already discussed the Molyneux one).

    One is whether there might be a “natural” or ecological reason to associate pitch with elevation in that many of the highest-pitched natural sounds come from birds, which are usually above us (especially when they are singing), while land-bound animals tend to make lower-pitched sounds. Do you think there is anything to this possibility?

    The other question is whether work has been done to see if there is a corresponding connection between loudness and perceived bigness. I would expect so — expect, that is, that louder objects would be misjudged as larger than they really are (and quieter ones as smaller), that there would be Stroop-like interference between the loudness of a sound and the ability to identify a large vs. a small target, and so on. Have these things been explored? And if so, do you think they are connected to your concerns here?


    1. Dear John,

      Thanks for the comments and questions.

      Your first one points at an important issue – that is, if we find that people match two apparently unrelated sensory features (like here pitch and elevation), the most natural explanation must be that they have somehow learned it from exposure to natural regularities. So it’s not that the two features are REALLY unrelated, they only SEEM unrelated when we consider them. This is not totally unconnected from Molyneux, as you rightly mention, but raises a different problem: In Molyneux, the two features (felt shape, seen shape) SEEM to us to be (naturally, necessarily, etc.) related and they are REALLY related (for most metaphysics).

      When asked to justify our matchings, we tend to look for specific hypotheses, like the ones you mention for birds. But there is no reason to suppose that these matchings have been learned from specific kinds of objects, like animals. In other words, they are not domain-specific, but domain-general. German researchers (Parise et al., 2014) have shown that the sources of high-pitch sounds tend to be above one’s head (spatially elevated) – coming from birds or others.

      The conclusion I present is that these two things are linked: the fact that pitch and elevation SEEM unrelated and the fact that they are correlated only at the most general scale. In other words, we are good at justifying associations when they are object-specific, but not when they are domain-general.

      Regarding the second point, there is some really fascinating work exploring all sorts of correspondences – including the one you suggest between size and loudness – using interference in speeded classification tasks (or other compatibility effects). Marks’s (1978) The Unity of the Senses presents the seminal work in the area (done by him, and Stevens – beautiful psychophysics). The matching across magnitudes (size-loudness) however is different from other correspondences, which involve more qualitative features (pitch is not a magnitude per se, and relates to frequency in a complex way, as I am sure you know).

      You can find review / discussion of more recent work in Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73(4), 971-995 and in Deroy, O., & Spence, C. (2016). Crossmodal correspondences: four challenges. Multisensory Research, 29(1-3), 29-48.

      Happy to discuss more,
      Best, Ophelia
