Nick Campbell

ATR
Japan

Expressive speech and speech synthesis

Abstract

Much of current speech technology is designed for use with so-called "broadcast-mode" speech, in which the interface is not required to be interactive. However, most social human interaction employs a more two-way "conversational-mode" of speaking. In broadcast mode, the speaker is not aware of the presence or attentional states of the listener, and carries on speaking regardless. In conversational mode, on the other hand, the speaker and listener interact in a delicate interplay of often overlapping segments to jointly contribute to the mutual generation of conversational content and meaning.

Analysis of speech turns taken from a large corpus of very natural conversations confirms that as much as 50% of each partner's speech is overlapping at some point. To explain this, we show how the discourse contents can be separated into "content-giving" and "collaborational" elements, illustrating how the listener actively contributes to the discourse.
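As a rough illustration of how such an overlap figure could be measured, the following minimal sketch computes two candidate statistics from time-stamped speech turns. It is not the analysis used in the talks: the turn format (start and end times in seconds), the function name overlap_stats, and the example data are all assumptions made for illustration, and the sketch assumes that the partner's turns do not overlap one another.

```python
from typing import List, Tuple

# Hypothetical turn format: (start_sec, end_sec) for one stretch of speech.
Turn = Tuple[float, float]


def overlap_stats(own: List[Turn], partner: List[Turn]) -> Tuple[float, float]:
    """Return two overlap measures for one speaker.

    First value: fraction of the speaker's turns that overlap the
    partner's speech at some point.
    Second value: fraction of the speaker's total speaking time that
    coincides with partner speech.
    Assumes the partner's turns do not overlap each other.
    """
    total_time = sum(end - start for start, end in own)
    overlapped_turns = 0
    overlapped_time = 0.0
    for start, end in own:
        # Sum the time this turn shares with each partner turn.
        shared = 0.0
        for p_start, p_end in partner:
            shared += max(0.0, min(end, p_end) - max(start, p_start))
        if shared > 0:
            overlapped_turns += 1
        overlapped_time += shared
    turn_fraction = overlapped_turns / len(own) if own else 0.0
    time_fraction = overlapped_time / total_time if total_time else 0.0
    return turn_fraction, time_fraction


# Invented example data, not taken from the corpus.
speaker_a = [(0.0, 2.0), (3.0, 5.0)]
speaker_b = [(1.5, 2.5), (6.0, 7.0)]
print(overlap_stats(speaker_a, speaker_b))  # (0.5, 0.125)
```

The two values can differ widely: a turn counts toward the first measure even if only a brief backchannel from the partner coincides with it, which is one plausible reading of "overlapping at some point".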

This series of talks presents some novel technology for processing such interactive speech, and shows how future speech technology might sense the presence and attentional states of a human listener in order to adapt its speaking style and content to the comprehension and interests of the human partner. In future speech synthesis, more use will be made of nonverbal speech sounds such as laughs, grunts, and backchannel utterances. Correspondingly, an interactive speech interface, such as one forming part of a dialogue system, must make useful inferences about the attentional states of those present so that the output speech style and content can be changed as necessary.

A further point made in the talks is that the tempo of a discourse changes throughout the conversation as the partners come mentally closer or drift further apart, as they become more or less excited, and as they find elements of common interest in the content. Whereas studies of the speech characteristics in this type of interaction have typically been tackled under the umbrella of "emotion" in previous work, we prefer to approach the topic within the framework of "nonverbal speech" and to incorporate it more intimately into the spoken dialogue component.

About the Lecturer

URL: http://feast.atr.jp/nick/

Last changed December 16, 2008 14:51 EET by local organizers, vispp2008(at)phon.ioc.ee