Speech transcription (TEI)

The module described in the TEI Guidelines, Chapter 8, Transcriptions of Speech, "is intended for use with a wide variety of transcribed spoken material. It should be stressed, however, that the present proposals are not intended to support unmodified every variety of research undertaken upon spoken material now or in the future; some discourse analysts, some phonologists, and doubtless others may wish to extend the scheme presented here to express more precisely the set of distinctions they wish to draw in their transcriptions. Speech regarded as a purely acoustic phenomenon may well require different methods from those outlined here, as may speech regarded solely as a process of social interaction.

"This chapter begins with a discussion of some of the problems commonly encountered in transcribing spoken language (section 8.1 General Considerations and Overview). Section 8.2 Documenting the Source of Transcribed Speech documents some additional TEI header elements which may be used to document the recording or other source from which transcribed text is taken. Section 8.3 Elements Unique to Spoken Texts describes the basic structural elements provided by this module. Finally, section 8.4 Elements Defined Elsewhere of this chapter reviews further problems specific to the encoding of spoken language, demonstrating how mechanisms and elements discussed elsewhere in these Guidelines may be applied to them."

8.1 General Considerations and Overview
"A spoken text may contain any of the following components:
 * utterances
 * pauses
 * vocalized but non-lexical phenomena such as coughs
 * kinesic (non-verbal, non-lexical) phenomena such as gestures
 * entirely non-linguistic incidents occurring during and possibly influencing the course of speech
 * writing, regarded as a special class of incident in that it can be transcribed, for example captions or overheads displayed during a lecture
 * shifts or changes in vocal quality

"Elements to represent all of these features of spoken language are discussed in section 8.3 Elements Unique to Spoken Texts below.

"An utterance (tagged u) may contain lexical items interspersed with pauses and non-lexical vocal sounds; during an utterance, non-linguistic incidents may occur and written materials may be presented. The u element can thus contain any of the other elements listed, interspersed with a transcription of the lexical items of the utterance; the other elements may all appear between utterances or next to each other, but except for writing they do not contain any other elements nor any data."
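
The containment rule quoted above can be illustrated with a small sketch. The sample utterance is invented for illustration (the speaker id, the wording and the caption text are assumptions, not taken from the Guidelines), and the TEI namespace is omitted for brevity. The element names u, pause, vocal, kinesic, incident, writing and shift are those the module provides for the components listed earlier. The check applies the quoted statement literally and does not consult the TEI schema, so a real project should validate against the schema instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample utterance: speaker id and wording are invented.
# The TEI namespace is left out to keep the sketch short.
SAMPLE = """
<u who="#spk1">well <pause/> I think <vocal/> we should
<writing>Agenda: item 3</writing> start now <shift feature="loud" new="f"/> okay</u>
""".strip()

# Components of a spoken text, mapped to the element that represents each one.
COMPONENTS = {
    "utterance": "u",
    "pause": "pause",
    "vocalized non-lexical phenomenon": "vocal",
    "kinesic phenomenon": "kinesic",
    "non-linguistic incident": "incident",
    "writing": "writing",
    "shift in vocal quality": "shift",
}


def check_utterance(u):
    """Return a list of problems found in the direct children of one <u>."""
    allowed = set(COMPONENTS.values()) - {"u"}
    problems = []
    for child in u:
        if child.tag not in allowed:
            problems.append(f"unexpected element <{child.tag}>")
        elif child.tag != "writing" and (len(child) or (child.text or "").strip()):
            # Per the quoted passage, only <writing> carries content.
            problems.append(f"<{child.tag}> should be empty")
    return problems


def lexical_items(u):
    """Collect the transcribed words that sit directly inside <u>."""
    pieces = [u.text or ""] + [child.tail or "" for child in u]
    return " ".join("".join(pieces).split())


u = ET.fromstring(SAMPLE)
children = [child.tag for child in u]
words = lexical_items(u)

print(children)
print(words)
print(check_utterance(u))
```

The lexical items are the text nodes directly inside u, while the text inside writing is kept apart because it transcribes something displayed rather than something said.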