The title of this project, “14 phonemes per second,” refers to the speed at which adults speak. The number was first put forward by Lenneberg (1967), and has come to stand for the marvelous and intriguing complexity of speech production. The question that motivates the proposed project is: how do we manage the stunning feat of skilled action that is speaking? The consensus is that it requires some kind of hierarchical control structure. Existing theories of production tend to borrow the representational aspects of this structure from theories that linguists have developed to describe language patterns efficiently. This move introduces a non-trivial translation problem. How exactly does one get from highly abstract, highly structured linguistic representation to fluid control over speech movement? The best psycholinguistic models tackle this problem, but the effort is post hoc. We also cannot rely solely on what is known about speech motor control since the research in this area has focused on sound production and, very often, on the production of a single segment. This focus misses the intentional aspect of speaking; that is, the fact that speaking is about communicating information to a listener. A new approach is needed, and it must be interdisciplinary.
The proposed project is to provide a book-length account of speaking that ties the representations guiding speech movement to speech motor skills, planning routines to cognitive processes, and speaking to language. The hypothesis is that the representations and routines that govern speaking emerge with communicative goal achievement over developmental time, where goal achievement is constrained by changing speech motor and cognitive skills. The foundation for the proposed book is an informal model of speech-language production described in a manuscript written for a special issue of the “Journal of Phonetics” on the cognitive nature of speech sound systems. The stated goals of the model are (i) to provide continuity in representations across different levels of analysis, that is, from speech to language, and (ii) to develop a framework for understanding the details of speaking that is consistent with usage-based approaches to grammar and language acquisition. The model assumes that fluent speech production is guided by schemas, which are temporally structured sequences of remembered actions and their sensory outcomes. The hypothesis is that a schema is formed when a speaker has successfully achieved a communicative goal. In early acquisition, schemas are coextensive with proto-words and the accompanying intonational patterns. Later they are words, collocations, and, under some circumstances, whole constructions (e.g., idioms). Thus, schemas represent action patterns associated with any unit of linguistic meaning. The meaning is the goal, and it is separately represented.
Once acquired, schemas provide abstract programs for production. These are activated or inhibited via the linked goals. Control via communicative goals captures the intentional and automatic aspects of speech-language production. Goals are subject to conscious inspection; schemas are not. Control via communicative goals means that schemas are activated and executed as holistic units. Goals and linked schemas are defined by the language acquisition process. This process allows for the emergence of hierarchically structured speech plan representations.
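The relationship between goals and schemas described above can be illustrated with a minimal sketch. It is not an implementation of the model, only one possible reading of it under strong simplifying assumptions: actions and sensory outcomes are reduced to placeholder text labels, activation is a simple on/off flag, and every name used here (Step, Schema, Goal, GoalStore, acquire, activate, inhibit, produce) is hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """One remembered action paired with its remembered sensory outcome."""
    action: str   # placeholder label, not a real articulatory description
    outcome: str  # placeholder label, not a real sensory description


@dataclass(frozen=True)
class Schema:
    """A temporally ordered sequence of steps; it has no access to meaning."""
    steps: Tuple[Step, ...]

    def execute(self) -> List[str]:
        # Executed as a holistic unit: the whole sequence, in stored order.
        return [step.action for step in self.steps]


@dataclass
class Goal:
    """A communicative goal (a meaning), represented apart from its schema."""
    meaning: str
    schema: Schema
    active: bool = False  # assumption: activation is all-or-nothing


class GoalStore:
    """Hypothetical store of acquired goals and their linked schemas."""

    def __init__(self) -> None:
        self._goals: Dict[str, Goal] = {}

    def acquire(self, meaning: str, steps: List[Tuple[str, str]]) -> None:
        # A schema is stored once the goal is treated as achieved.
        schema = Schema(tuple(Step(a, o) for a, o in steps))
        self._goals[meaning] = Goal(meaning, schema)

    def activate(self, meaning: str) -> None:
        self._goals[meaning].active = True

    def inhibit(self, meaning: str) -> None:
        self._goals[meaning].active = False

    def produce(self) -> List[List[str]]:
        # Only schemas linked to active goals run, in order of acquisition.
        return [g.schema.execute() for g in self._goals.values() if g.active]
```

In this sketch, control is exercised only through the goal: a caller can activate or inhibit a meaning, but never addresses the schema or its individual steps directly, which mirrors the claim that goals are open to conscious inspection while schemas are not. The sketch leaves out the hierarchical structure and the developmental process that the model attributes to language acquisition.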
The informal model sketched above raises many questions for further exploration. These include, but are not limited to, questions concerning the nature of the patterns remembered, how holistic they are initially, and to what extent they are differentiated over developmental time; questions about the recombination of schemas, and how free or constrained the process might be; questions about how the intonational and segmental aspects of speech are aligned during speaking; and questions about the nature of constraints on speech planning. These questions, and many others, will be addressed in the proposed book. The work will draw from a variety of literatures, including the literature on non-language motor skill acquisition and control, speech motor skill development, memory and planning in language production, the development of relevant cognitive processes, language acquisition, and linguistic theory. The goal is to arrange the findings from disparate fields into a coherent account of speaking that is psychologically plausible, developmentally sensitive, and computationally implementable.