Emotion recognition in multimedia content for empathic TV

Exploring new ideas to enrich the user experience with Smart TV


Evolution of TV ongoing for many years

Smart TVs and set-top boxes enable new TV experiences

But they lack emotions

We need to better understand the user and the content itself

We believe that understanding emotions is key


The evolution of TV has been ongoing for many years, and we now see the emergence of next-generation TV experiences through Smart TVs and set-top boxes. Advanced features such as voice or gesture based control, integrated recommender systems, and social applications are proposed. While such systems are exciting from a technological point of view, Smart TVs currently lack one important axis of the TV user experience: emotions. To overcome this problem, we think that it is crucial to better understand not only the user watching the TV, but also the content itself. Indeed, bridging the gap between multimedia content and users would allow innovative features to emerge and also improve recommender systems.


Movies/TV shows and impaired people?

  • SDH - Subtitles for the deaf
  • Audio description

Too static and clearly lacking in emotions

Not a complete experience, particularly for people unable to hear

Help impaired people better feel the emotions of movies

Emotion extraction (intensity and type)

Emotion transmission with smart objects

  • Dynamic lights
  • Smart watches (vibrations and emoticons)
  • Animated paintings

Nowadays, impaired people can enjoy movies or TV shows thanks to two main systems:

  • SDH - Subtitles for the deaf or hard-of-hearing: important non-dialog information as well as speaker identification are added to regular subtitles, in order to textually describe the scene to deaf or hearing impaired people
  • Audio description: additional audio narration, intended to describe what is happening on the screen to blind and visually impaired consumers of visual media

Unfortunately, these descriptions are too static, and they currently lack emotions. Thus, the experience of enjoying a movie is not complete for impaired people. This is particularly true for people unable to hear, as most of the emotion contained in a movie is conveyed through the sound channel. Some works try to improve the existing systems: e.g., [Hong et al.] proposed a dynamic captioning scheme, and [Ohene-Djan et al.] used caption manipulations (such as increasing the text size and changing its color) to create emotional subtitles. Audio description is also expensive, and thus not widespread, but it could become profitable and more widely adopted by production companies in the future [Sade et al.].

Our empathic TV helps impaired people better feel the emotions of movies and TV shows, by extracting and transmitting them. Both the arousal (intensity of the emotion) and the valence (type of the emotion) are analysed and used. With the help of smart objects, our system is then able to communicate these emotions to the user in a multimodal manner:

  • Dynamic lights: Thanks to Philips Hue lights, the emotions are encoded into colors, and the brightness changes according to the intensity of the emotion (a sketch of one possible mapping is given after this list).
  • Smart watches: Emoticons associated with the type of the emotion are displayed. The smartwatch also vibrates according to the intensity, and some emotions are coded into different vibration patterns.
  • Animated paintings: We could also explore more artistic ideas. For instance, we have thought of linking our system to Aphrodite, an animated painting.
Many other objects could be transformed into smart objects, in order to convey emotions.
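
As an illustration of how the light modality could work, here is a minimal Python sketch that maps valence and arousal values to a hue and a brightness level. The value ranges, the color convention and the formula are assumptions made for this sketch; the actual emoTV mapping and the Philips Hue control code are not shown.

```python
# Illustrative mapping from (valence, arousal) to a light color and brightness.
# The ranges and formulas below are assumptions, not the actual emoTV mapping.

def emotion_to_light(valence, arousal):
    """Map valence in [-1, 1] and arousal in [0, 1] to (hue_degrees, brightness_pct)."""
    # Hypothetical color convention: 240 degrees (blue) for very negative
    # valence, 0 degrees (red) for very positive, interpolated linearly.
    hue_degrees = 240 * (1 - (valence + 1) / 2)

    # Arousal drives brightness; keep a floor so the room is never fully dark.
    arousal = max(0.0, min(1.0, arousal))
    brightness_pct = 20 + 80 * arousal

    return hue_degrees, brightness_pct


# A calm, slightly positive scene vs. an intense, negative one.
print(emotion_to_light(valence=0.3, arousal=0.2))   # ~84 degrees, 36% brightness
print(emotion_to_light(valence=-0.8, arousal=0.9))  # ~216 degrees (blue), 92% brightness

# These values could then be sent to the lights through any Philips Hue
# bridge client (not shown here).
```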

Configuring and analyzing experiments for movies

  • Impractical
  • Hard because it's dynamic!

Framework for facilitating emotion recognition for movies

Help researchers apply the following to videos:

  • Feature extraction techniques
  • Emotional modeling algorithms

Running experiments such as feature extraction and emotion modeling on movies becomes impractical when it comes to analyzing the results. Indeed, it is hard to look at a static 2-dimensional graph and relate it to the dynamic video. For this reason, we have created a framework whose purpose is to help researchers with emotion recognition tasks for movies (e.g., applying feature extraction techniques and emotional modeling algorithms to video datasets).
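
To give an idea of the kind of emotional modeling the framework is meant to support, here is a small sketch that turns a low-level feature (short-term audio energy in this example) into a smoothed, normalized arousal curve. The feature choice, window size and normalization are illustrative assumptions, not the framework's actual algorithm.

```python
# Sketch: derive a smoothed, normalized arousal curve from a low-level
# feature (per-frame audio energy here). Window size and normalization
# are illustrative assumptions, not the framework's actual algorithm.
import numpy as np

def arousal_curve(audio_energy, smoothing_window=25):
    """Turn a per-frame energy signal into an arousal curve in [0, 1]."""
    energy = np.asarray(audio_energy, dtype=float)

    # Moving-average smoothing so the curve follows scene-level dynamics
    # instead of frame-level noise.
    kernel = np.ones(smoothing_window) / smoothing_window
    smoothed = np.convolve(energy, kernel, mode="same")

    # Normalize to [0, 1] so curves from different movies are comparable.
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo) if hi > lo else np.zeros_like(smoothed)


# Example: a loud burst in the middle of a quiet signal yields a clear peak.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.random(100) * 0.2,
                         rng.random(50),
                         rng.random(100) * 0.2])
print(arousal_curve(signal).round(2))
```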

The system consists of two parts:

  • The backend: Written in Python, the backend manages the videos, extracts the features, computes the emotional information such as the arousal, and exposes all useful dataset and video information through a RESTful API (a minimal sketch of such an endpoint follows this list).
  • The frontend: An HTML5 application for managing datasets and visualizing the resulting features and emotional information. The choice of a RESTful API eases access to the emotional information from other applications and devices, such as Smart TVs.
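
As an illustration of the RESTful API idea, the following minimal sketch exposes a per-movie arousal curve as JSON. Flask, the route name and the JSON layout are assumptions made for this example; the actual backend may be organized differently.

```python
# Minimal sketch of a RESTful endpoint exposing per-movie arousal values.
# Flask, the route and the JSON layout are assumptions for illustration.
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Stand-in for the real storage layer (hypothetical data).
AROUSAL = {
    "movie-001": [0.12, 0.15, 0.40, 0.75, 0.60],  # one value per time window
}

@app.route("/api/movies/<movie_id>/arousal", methods=["GET"])
def get_arousal(movie_id):
    """Return the arousal curve of a movie as JSON, or 404 if unknown."""
    if movie_id not in AROUSAL:
        abort(404)
    return jsonify({"movie": movie_id, "arousal": AROUSAL[movie_id]})

if __name__ == "__main__":
    app.run(port=5000)
```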

We want to finalize this framework and offer it to the community as open source software. We think that it will ease the process of analyzing feature extraction and emotion modeling of movies for other researchers.

Few existing datasets for emotional image classification

For automatic classification, we need a lot of training images

  • Ground truth by manual annotation is costly

We have built a large dataset of highly positive and highly negative images

  • Weakly labeled images from Reddit

Train a classifier for positive vs. negative image classification (see the sketch after the list below)

  • Integrate it into our empathic TV system
    • Warn users about upcoming shocking scenes
    • Enhance emotion recognition
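
As a rough illustration of this classification step, the sketch below trains a simple positive-vs-negative classifier on weakly labeled images. The color-histogram feature and the logistic regression model are placeholder choices, not the actual emoTV pipeline, and random arrays stand in for the Reddit images.

```python
# Hedged sketch of training a positive-vs-negative image classifier on weakly
# labeled data. Color histograms and logistic regression are placeholder
# choices; random arrays stand in for the images collected from Reddit.
import numpy as np
from sklearn.linear_model import LogisticRegression

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an RGB image (H x W x 3 array)."""
    return np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 255), density=True)[0]
        for c in range(3)
    ])

# Hypothetical weakly labeled data: 1 = "positive" source, 0 = "negative" source.
rng = np.random.default_rng(0)
pos_images = [rng.integers(0, 256, (64, 64, 3)) for _ in range(20)]
neg_images = [rng.integers(0, 256, (64, 64, 3)) for _ in range(20)]

X = np.array([color_histogram(img) for img in pos_images + neg_images])
y = np.array([1] * len(pos_images) + [0] * len(neg_images))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```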

TV consumption has changed

  • Multiscreen watching
  • Second screen applications

But the TV ads model has not evolved for years

  • Still linear and not personalized
  • Annoying: interrupting movies and TV shows

We propose new ways of consuming TV ads

  • Offloaded to the second screen
  • Integrated into small recreative games
  • Personalized to the user
  • Contextualized to the main content (movie or TV show)
  • ...


{{ school.group.name }} ({{ school.university.name }})

{{ person.title}} {{ person.name }}

{{ person.homepage }}{{ getAuthorsList(person.students) }}


{{ publication.title }}

{{ keyword }}

{{ publication.conference }}

{{ getAuthorsList(publication.authors) }}

{{ publication.year }}, p. {{ publication.pages }}



{{ item.title }}
Show results


Projects Supervision

Several projects related to emoTV have been created and supervised for Master and Bachelor students.

Master Thesis
Master deepening project
Bachelor Thesis
Bachelor semester project

{{ project.title }}

{{ project.type }} of {{ project.student }}, {{ project.year }}

{{ keyword }}

{{ project.description}}



WebVideos, automatic concept detection

Fribourg, February 7-8, 2013

I had the pleasure of welcoming two great researchers for a seminar on the exciting topics of web video concept detection (+ tag expansion and localization) and trending topic and demography prediction from YouTube.

  • Marco Bertini, University of Florence (Italy)
    • MICC (Media Integration and Communication Center)
  • Damian Borth, University of Kaiserslautern (Germany)
    • IUPR (Image Understanding and Pattern Recognition) Research Group
    • DFKI (German Research Center for Artificial Intelligence)

Many thanks to Elena Mugellini, Maria Sokhn and Stefano Carrino who helped me in the organization of this seminar.


emoTV is the PhD thesis of Joël Dumoulin. It stands for Emotion recognition in multimedia content for empathic TV. This work is a collaboration between the {{ team[0].group.name }} ({{ team[0].university.name }}) and the {{ team[1].group.name }} ({{ team[1].university.name }}). It addresses the fields of automatic video annotation, smart TV, and emotions.


Feel free to contact me if you need any information regarding the emoTV project.


{{ about.mail }}

Personal homepage

{{ about.homepage }}


Google Scholar