Wednesday, April 25, 2012

Phonology representations less useful than attractors inferred without them.

http://arxiv.org/abs/1204.3236
Phonology typically describes speech in terms of discrete signs like features. The field of intonational phonology uses discrete accents to describe intonation and prosody. But are such representations useful? The results of mimicry experiments indicate that discrete signs are not a useful representation of the shape of intonation contours. Human behaviour seems to be better represented by attractors, where memory retains substantial fine detail about an utterance. There is no evidence that any discrete abstract representations that might be formed have an effect on the speech that is subsequently produced. This paper also discusses conditions under which a discrete phonology can arise from an attractor model and why, for intonation, attractors can be inferred without implying a discrete phonology.

We sometimes think of the units of intonational phonology as discrete
entities: accents which fall into just a few categories (Gussenhoven 1999;
Ladd 1996; Beckman and Ayers Elam 1997). In this view, accents in
intonation are equivalent to phonemes in segmental phonology (except
that they cover a larger interval). They have a rough correspondence to the
acoustic properties of the relevant region and accents form a small set of
atomic objects that do not have meaning individually but that can be
combined to form larger objects that carry meaning. For segmental
phonology, the larger objects are words; for intonation, the larger objects are tunes over a phrase.
However, the analogy is not strong, and there are many differences. For
instance, there is no known useful mapping from intonational phonology to
meaning. (Pierrehumbert & Hirschberg 1990 point out some of the
difficulties.) For words, the mapping to meaning is provided by dictionaries
and internet search engines. These technologies have no intonational
equivalents. To date, attempts to connect intonation and fundamental
frequency contours have not escaped from academia: the results are either
probabilistic (Grabe, Kochanski & Coleman 2005), theoretical and primarily
based on intuition, or obtained under tightly controlled laboratory
conditions (Ladd & Morton 1997; Gussenhoven & Rietveld 1997).
Likewise, there is no known, reliable mapping between sound and
intonational phonology. Probability distributions overlap (Grabe, Kochanski
& Coleman 2007) and automated systems for recognizing intonation have
not become commercially useful.
In contrast, the connection between acoustics and segmental phonology is made by speech synthesis and recognition systems. The mapping between sound and segmental phonology is complicated, but it is reasonably well understood, and reliable enough to be commercially useful. As a further contrast, transcription of intonation seems qualitatively different from transcription of segmental information.
Intonational transcription (e.g. Grice et al 1996; Jun et al 2000; Yoon et al
2004) is far more error-prone and slower than transcription of words, even
after extensive training. Yoon et al 2004 found an agreement of circa 85%
between transcribers (depending on exactly what was being compared), but
it is notable that at each point in the transcription, the transcribers had a
choice between (typically) just two symbols. In a typical phonemic or
orthographic transcription, the transcriber would attain comparable or
higher precision while choosing between (about) 40 phones, or amongst
thousands of possible words for each symbol.
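
To see why the number of available symbols matters, the sketch below corrects a raw agreement rate for the agreement expected by chance. It is only an illustration, not a calculation from the paper: the kappa-style correction, the function name, and the assumption that a guessing transcriber picks uniformly among the symbols are all introduced here. Real label distributions are skewed, which would raise the chance level further.

```python
def chance_corrected_agreement(observed: float, n_categories: int) -> float:
    """Kappa-style agreement: 0 means chance level, 1 means perfect agreement.

    Assumption (not from the source): chance agreement is 1 / n_categories,
    i.e. a guessing transcriber picks each symbol with equal probability.
    """
    if n_categories < 2:
        raise ValueError("need at least two categories to choose between")
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed agreement must lie in [0, 1]")
    expected = 1.0 / n_categories
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # The same raw agreement (about 85%) with different numbers of choices.
    for label, k in [("two intonational symbols", 2), ("about 40 phones", 40)]:
        score = chance_corrected_agreement(0.85, k)
        print(f"{label}: chance-corrected agreement = {score:.2f}")
```

Under these assumptions, the same raw 85% agreement is worth noticeably less with a two-way choice than with a choice among 40 phones, because half of all two-way guesses would agree by accident.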
So, in light of these differences, it is reasonable to ask whether intonation
can be usefully described by a conventional discrete phonology or not. If it
can be, what are the properties of the objects upon which the phonological
rules operate? This paper lays out empirically based answers to those
questions and describes an experimental technique that can provide a
reasonably direct exploration of the properties of phonological objects.