Björn Granström and colleagues

TMH - Department of Speech, Music and Hearing
KTH - Royal Institute of Technology

Multimodal feedback in dialogue


1. General issues of multimodal interaction

Lecture: General issues of multimodal interaction

Introduction to multimodal interaction phenomena in spoken communication. We will also discuss issues related to data collection and labelling of multimodal conversational dialogue data. We go through special requirements when collecting data that is to be used as the basis for data-driven methods and models of human conversation and give an overview of some methods used for such data collection.

Hands-on: Recording

A main speaker reads a section of a book or watches part of a movie. She then retells its contents to a listener who is rigged with facial motion capture equipment. The listener must pay close attention to the recapitulation, and the speaker must make certain that everything is understood, as there is a quiz later on. This will make it necessary for them to interact. The data collected will be used to create an audiovisual model of an attentive listener.

2. Analysis

Lecture: Analysis of conversational behaviour

Introduction to methods for analysis of human communicative behaviour. We present an overview of conversational phenomena in human-human dialogue, and discuss methods for analysing them in order to get results that can be used to model these phenomena in a computer.

Hands-on: Analysis

The sound files from both the speaker and the listener in the data collection are read into WaveSurfer, together with data tracks representing the listener’s head movements. A special version of SynFace is then used to replay the dialogue with two animated talking heads, one of which (the listener) also replays the original head movements. Next, the task is to extract good multimodal feedback utterances from the listener track and to add these to a library of feedback utterances. Finally, a few of the words that provoked feedback are manually extracted from the narrator’s speech to be added to a hot word speech recogniser that will be used during the demonstration on the final day.
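The extraction itself is done by hand, but short bursts of listener speech are natural candidates to inspect first. The following is a minimal sketch of such a pre-selection, assuming the listener track has already been reduced to one energy value per frame. The function name, the threshold and the duration limit are hypothetical and not part of WaveSurfer or SynFace.

```python
from typing import List, Sequence, Tuple


def candidate_feedback_segments(
    energy: Sequence[float],
    frame_s: float = 0.01,      # assumed frame length in seconds
    threshold: float = 0.1,     # assumed energy level that counts as speech
    max_dur_s: float = 1.0,     # assumed upper bound on a feedback utterance
) -> List[Tuple[float, float]]:
    """Return (start, end) times of short above-threshold runs of frames."""
    segments: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current run, if any
    for i, e in enumerate(energy):
        if e >= threshold and start is None:
            start = i
        elif e < threshold and start is not None:
            # The run ended at frame i; keep it only if it is short enough.
            if (i - start) * frame_s <= max_dur_s:
                segments.append((start * frame_s, i * frame_s))
            start = None
    # Handle a run that lasts until the end of the track.
    if start is not None and (len(energy) - start) * frame_s <= max_dur_s:
        segments.append((start * frame_s, len(energy) * frame_s))
    return segments
```

Each returned interval would still need to be checked by ear and eye before it is added to the feedback library, since short noises and genuine feedback look alike at this level.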

3. Synthesis

Lecture: Generation of conversational behaviour

Introduction to human-like and conversational multimodal speech synthesis. We give an overview of the specific requirements on generation and synthesis when the goal is to mimic human behaviour as closely as possible.

Hands-on: Control, generation and synthesis

In this session we experiment with generating different kinds of interactional gestures based on the analyses. A spoken dialogue system that mimics an attentive listener is built from the collected feedback utterances, the hot words, and an end-of-utterance detector. The artificial listener is then evaluated by having the original speaker retell the story to it, which gives an idea of how well it performs.
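One way the three components could fit together is sketched below. This is an illustration under stated assumptions, not the system built in the course: the policy (give feedback at an end of utterance only if a hot word was heard since the last feedback), the minimum gap, and all class, field and utterance names are hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Optional, Sequence


@dataclass(frozen=True)
class Event:
    time: float                 # seconds from the start of the retelling
    kind: str                   # "hot_word" or "end_of_utterance"
    word: Optional[str] = None  # the recognised hot word, if any


class AttentiveListener:
    """Picks a feedback utterance in response to recogniser/detector events."""

    def __init__(self, feedback_library: Sequence[str], min_gap: float = 1.0):
        # Cycle through the library so that feedback is varied but predictable.
        self._feedback = cycle(feedback_library)
        self._min_gap = min_gap            # assumed minimum seconds between feedback
        self._pending_hot_word = False
        self._last_feedback: Optional[float] = None

    def on_event(self, event: Event) -> Optional[str]:
        """Return the feedback utterance to play, or None to stay silent."""
        if event.kind == "hot_word":
            self._pending_hot_word = True
            return None
        if event.kind == "end_of_utterance" and self._pending_hot_word:
            self._pending_hot_word = False
            too_soon = (
                self._last_feedback is not None
                and event.time - self._last_feedback < self._min_gap
            )
            if too_soon:
                return None
            self._last_feedback = event.time
            return next(self._feedback)
        return None
```

A short usage example, with made-up utterance labels standing in for entries of the feedback library:

```python
listener = AttentiveListener(["mhm_nod", "okay_tilt"])
listener.on_event(Event(0.5, "hot_word", "dragon"))
print(listener.on_event(Event(1.0, "end_of_utterance")))
```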

Lecture notes

About the Lecturers

Björn Granström
Jonas Beskow
Joakim Gustafson
Jens Edlund

Last changed December 16, 2008 14:52 EET by local organizers, vispp2008(at)