Emotions play a crucial role in human-human communication and have a complex socio-psychological nature. To enhance emotion communication in human-computer interaction, this paper studies emotion recognition from audio and visual signals in video clips, using facial expressions and vocal utterances. In doing so, the study aims to exploit the temporal information in audio-visual cues and to detect their informative time segments.