TECH SPECS, SCRIPTS, ANNOTATION

Description

The RAMAS multimodal database (The Russian Acted Multimodal Affective Set) includes synchronously recorded video from several angles, multi-channel audio, physiological signals (electrodermal activity and heart rhythm), and three-dimensional motion coordinates.

The etudes were performed by Russian-speaking students and graduates of the S.A. Gerasimov All-Russian State Institute of Cinematography (VGIK). Semi-professional actors were invited in order to obtain more natural emotions in the recordings, since professional theater actors have been shown to use stereotyped movement patterns to express emotions (Russell & Fernández-Dols, 1997; Volkova, De La Rosa, Bülthoff, & Mohler, 2014). The etudes were designed to elicit different emotional states (joy, anger, sadness, disgust, fear, surprise, and a neutral state) and social interactions (domination, submission) in given situations, played out by the actors in a semi-improvisational dialogue form.

Each actor signed a formal consent form agreeing to participate in the study and to the use of the recorded materials for scientific and commercial purposes. The participants were paid an amount equal to the cost of one day of shooting. The selection of actors, the writing of the scenarios, and the filming were conducted under the supervision of E.B. Arkova, a senior teacher at the VGIK acting skills department.

All the video sequences were labeled by annotators (21 people in total) in the ELAN software (Sloetjes & Wittenburg, 2008).

Actors

Five pairs of actors took part in the recording of the database: 5 women and 5 men, aged 18-28. The shooting took 6 days; one pair participated twice (the first time in a test mode). All couples were mixed-gender. All participants had at least two years of professional acting experience. Every filming day used the same set of scenarios, with small variations to make it easier for specific actors to express the emotions.

After filming, the participants completed a handedness inventory (Oldfield, 1971), the Russian version of the Positive and Negative Affect Schedule (PANAS; Osin, 2012), an emotional intelligence questionnaire (Lyusin, 2006, 2009), and the Spielberger-Khanin State-Trait Anxiety Inventory (Khanin, 1976).

During the recording, the actors wore comfortable, tight-fitting dark clothing. Their faces were without makeup, and long hair was tied back in a ponytail.

Recording conditions

The shooting took place in a specially equipped room with sound-absorbing treatment to improve the quality of the audio recording and a green background to improve the quality of the motion data.

Equipment

Video:

  • Close-up of each actor's face: two cameras (Canon HF G40, Panasonic HC-V760), 50 FPS.
  • Full-body view of each actor: two webcams (Canyon CNE-CWC3), 15 FPS.
  • The whole scene with both actors in full view: a Microsoft Kinect RGB-D sensor v. 2 (RGB and depth video), 15 FPS.

Audio:

  • Individual Sennheiser EW 112-p G3 microphones attached to the clothing, 44100 Hz, 32-bit float.
  • Overall sound stage: a Zoom H5 portable recorder, 44100 Hz, 32-bit float.
  • Additional audio channels: the Kinect microphones (4 channels) and the cameras (1 channel per camera).
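
The individual microphone tracks can be read with any audio library that handles 32-bit float data. Below is a minimal sketch using the Python soundfile package, assuming the tracks are stored as WAV files (the container format is not stated here) and that the file name mic_actor1.wav is hypothetical:

    import soundfile as sf

    # Read one individual microphone track (hypothetical file name).
    # The specs above say 44100 Hz, 32-bit float, so request float32 samples.
    audio, sample_rate = sf.read("mic_actor1.wav", dtype="float32")

    assert sample_rate == 44100, "expected 44.1 kHz per the recording specs"
    print(f"duration: {len(audio) / sample_rate:.1f} s, shape: {audio.shape}")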

Movement:

Three-dimensional coordinates of 25 body points (joints) for each actor were recorded using the Microsoft Kinect RGB-D sensor v. 2 at 15 FPS.
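
The export format of the joint streams is not specified here, so any loading code is an assumption. A minimal sketch, assuming a hypothetical CSV export with one row per frame and x, y, z columns for each of the 25 joints, that computes the mean movement speed of one joint:

    import numpy as np

    FPS = 15  # Kinect skeleton stream rate per the specs above

    # Hypothetical layout: 75 columns per row (x, y, z for 25 joints, in meters).
    frames = np.loadtxt("skeleton_actor1.csv", delimiter=",")
    joints = frames.reshape(len(frames), 25, 3)

    # Per-frame displacement of one joint (index 11 is arbitrary),
    # converted to speed in meters per second.
    disp = np.linalg.norm(np.diff(joints[:, 11, :], axis=0), axis=1)
    print(f"mean speed: {disp.mean() * FPS:.3f} m/s")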

Physiological parameters:

  • A photoplethysmogram (the sensor was placed on the earlobe) and
  • electrodermal (skin) resistance (the sensors were placed on the index and ring fingers of the left hand; all actors were right-handed) were recorded using the Shimmer Consensys GSR Development Kit at 101.5 Hz.
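
A rough heart-rate estimate can be obtained from the photoplethysmogram by simple peak detection. A minimal sketch with scipy, assuming the PPG channel has already been loaded into a 1-D array (the loading code and tuning constants are assumptions, not part of the database):

    import numpy as np
    from scipy.signal import find_peaks

    FS = 101.5  # Shimmer sampling rate per the specs above

    def estimate_heart_rate(ppg: np.ndarray) -> float:
        """Estimate heart rate in beats per minute from a raw PPG trace."""
        # Require at least ~0.33 s between beats (caps the estimate near 180 BPM).
        peaks, _ = find_peaks(ppg, distance=int(0.33 * FS),
                              prominence=0.5 * np.std(ppg))
        if len(peaks) < 2:
            raise ValueError("too few beats detected")
        inter_beat = np.diff(peaks) / FS  # seconds between successive beats
        return 60.0 / inter_beat.mean()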

For high-quality recording of the Kinect data, the actors were asked to stand in marked zones (40 × 30 cm). There were no restrictions on gesticulation, posture, or small movements, but the actors were asked, as far as possible, to stay within the marked area. The distance between the actors was 1.9 meters.

All recording channels except the Canon HF G40 and Panasonic HC-V760 cameras were synchronized in real time using the SSI framework (Wagner et al., 2013). The data from these two cameras was then synchronized with the rest of the channels based on the audio tracks, using the PluralEyes and DaVinci Resolve software.
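
Audio-based alignment of the kind PluralEyes performs amounts to finding the lag that maximizes the cross-correlation between two tracks. A minimal sketch with scipy (not the actual synchronization pipeline used for the database), assuming both tracks have been loaded at a common sample rate:

    import numpy as np
    from scipy.signal import correlate, correlation_lags

    def audio_offset_seconds(ref: np.ndarray, other: np.ndarray, fs: int) -> float:
        """Offset mapping `other`'s timeline onto `ref`'s:
        ref_time = other_time + offset. A negative offset means
        `other` starts later than `ref`."""
        corr = correlate(ref, other, mode="full")
        lags = correlation_lags(len(ref), len(other), mode="full")
        return lags[np.argmax(corr)] / fs

    # Usage: shift the camera track by the returned offset before muxing.
    # offset = audio_offset_seconds(recorder_track, camera_track, 44100)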

Scripts

A total of 13 scenarios were used. They were divided into two subcategories: "Friends" (7 scenarios describing informal situations between close people) and "Colleagues" (6 scenarios describing situations between employees of a company). Each scenario gave a description of the situation and noted each participant's attitude towards it, but contained no written lines or phrases. The actors therefore had the opportunity to improvise and, if they wished, to vary some circumstances of the situation in order to produce more natural emotions.

In each scenario, the actors had to play one of six emotions: joy, anger, sadness, disgust, fear, or surprise. The emotions of the pair in each etude could be either the same (for example, both having fun) or different (for example, fear and anger). Each scenario also specified social roles: one actor was in a dominant, leading position (more active, directing the dialogue) and the other in a submissive position (guided, more passive).

In some cases, when the actors found it difficult to reproduce an emotion, visual stimulus material was used before recording the etude (printed pictures of scary, disgusting, surprising, etc. scenes close to those in the scenarios).

Before the scenarios were recorded, each actor was recorded in a neutral state. To do this, they had to count to 20 or list the months or the letters of the alphabet; describe their path from home to work; describe the area around their house; explain how to cook a dish; describe the rules of a game; or briefly recount their biography. The recording was made under the same conditions as the main scenarios, with the second actor as a passive listener. The recording of the main scenarios began after the neutral-state recording.

Each scenario was discussed by the actors with the teacher for as long as necessary, and then an average of 2 to 5 takes was performed. Each take lasted 30-60 seconds; the beginning and end of the recording were signaled by the experimenters. Between takes, the process was discussed and adjusted with the active participation of both the teacher and the experimenters.

After one or two successful takes of each scenario, the actors filled out a brief self-report on the emotions they experienced while performing the etude.

The recording was conducted in two stages (the "Friends" and "Colleagues" sets), with a half-hour break between them. The order of the sets varied from pair to pair. Shooting all the scenarios took each pair about 4 hours on average.

Script examples

Annotation

The presence of emotional and social states in all video files was labeled by annotators. 21 annotators were invited, aged 18-33: 15 women and 6 men. Each annotator labeled 150 video fragments (except for two annotators who labeled 150 videos between them), and each video file was labeled by at least five annotators.
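
How the five or more annotations per file are combined is left to the user of the database. One simple option is a majority vote over fixed time bins. A minimal sketch, assuming each annotator's labels have already been rasterized into per-second sets of state names (this data layout is an assumption, not the database format):

    from collections import Counter

    def majority_labels(annotations: list[list[set[str]]],
                        quorum: int = 3) -> list[set[str]]:
        """Per-second consensus: keep a state if at least `quorum`
        annotators marked it in that second."""
        consensus = []
        for second in zip(*annotations):
            votes = Counter(label for labels in second for label in labels)
            consensus.append({lab for lab, n in votes.items() if n >= quorum})
        return consensus

    # Usage: five annotators, three seconds of video.
    # anns = [[{"anger"}, {"anger"}, set()], ...]  # one inner list per annotator
    # majority_labels(anns)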

Annotators were selected using an emotional intelligence test (Mayer, Salovey, Caruso, & Sitarenios, 2003; Sergienko, Vetrova, Volochkov, & Popov, 2010). Participants who scored at least 90 points on this test were invited to take part in the annotation.

The labeling was done in the ELAN software (Sloetjes & Wittenburg, 2008). Annotation templates were created for the annotators, with a separate tier for each emotional, social, and neutral state (9 tiers in all). The annotators had to mark the beginning and end of each state encountered in the video. Several emotions could be noted in one time interval (for example, anger and surprise). The annotators were instructed to mark only the most natural and well-acted emotions.
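
ELAN stores its annotations in .eaf files, which are plain XML, so the labeled intervals can be extracted without ELAN itself. A minimal sketch using only the Python standard library, assuming the tiers are named after the states as described above and that every time slot carries an explicit time value (the file name is hypothetical):

    import xml.etree.ElementTree as ET

    def read_eaf_intervals(path: str) -> dict[str, list[tuple[float, float]]]:
        """Map each tier (state) to its (start, end) intervals in seconds."""
        root = ET.parse(path).getroot()
        # Time slots hold millisecond values referenced by the annotations.
        slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
                 for ts in root.iter("TIME_SLOT")}
        tiers = {}
        for tier in root.iter("TIER"):
            intervals = []
            for ann in tier.iter("ALIGNABLE_ANNOTATION"):
                intervals.append((slots[ann.get("TIME_SLOT_REF1")] / 1000.0,
                                  slots[ann.get("TIME_SLOT_REF2")] / 1000.0))
            tiers[tier.get("TIER_ID")] = intervals
        return tiers

    # Usage:
    # read_eaf_intervals("K23_annotator1.eaf")["Anger"]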

The annotators were paid for their work.

Database structure

Files from each day of filming were placed in a separate folder named after the recording date (e.g., "10dec"). The same pair of actors participated in the filming on December 10 and 19 ("10dec" and "19dec").

Each filming-day folder contains the data of all recorded channels, divided into subfolders by scenario, where "D" denotes a "Friends" scenario, "K" a "Colleagues" scenario, and "N" a "Neutral" recording. The first digit after the letter indicates the scenario number and the second the take number (for example, the folder "K2" holds the second scenario of the "Colleagues" series, and the files containing "K23" in it belong to the third take of that scenario).
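
A small helper can recover the scenario type, scenario number, and take number from these codes. A minimal sketch, assuming the codes always follow the one-letter, two-digit pattern described above (real file names may carry extra suffixes, which this pattern simply ignores):

    import re

    TYPES = {"D": "Friends", "K": "Colleagues", "N": "Neutral"}
    CODE = re.compile(r"([DKN])(\d)(\d)")

    def parse_code(name: str) -> tuple[str, int, int]:
        """Extract (scenario type, scenario number, take number) from a name."""
        m = CODE.search(name)
        if m is None:
            raise ValueError(f"no scenario code in {name!r}")
        return TYPES[m.group(1)], int(m.group(2)), int(m.group(3))

    # Usage:
    # parse_code("K23_audio_actor1.wav")  ->  ("Colleagues", 2, 3)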

Cited literature

Mayer, J. D., Salovey, P., Caruso, D. R., & Sitarenios, G. (2003). Measuring emotional intelligence with the MSCEIT V2.0. Emotion, 3(1), 97.

Oldfield, R. C. (1971). The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia, 9(1), 97-113.

Russell, J. A., & Fernández-Dols, J. M. (1997). The psychology of facial expression. Cambridge University Press.

Sergienko, E., Vetrova, I., Volochkov, A., & Popov, A. (2010). Adaptation of J. Mayer, P. Salovey, and D. Caruso emotional intelligence test on a Russian-speaking sample. Psikhologicheskii Zhurnal, 31(1), 55-73.

Sloetjes, H., & Wittenburg, P. (2008). Annotation by category: ELAN and ISO DCR. Paper presented at LREC.

Volkova, E., De La Rosa, S., Bülthoff, H. H., & Mohler, B. (2014). The MPI emotional body expressions database for narrative scenarios. PloS one, 9(12), e113647.

Wagner, J., Lingenfelser, F., Baur, T., Damian, I., Kistler, F., & André, E. (2013). The social signal interpretation (SSI) framework: multimodal signal processing and recognition in real-time. In Proceedings of the 21st ACM International Conference on Multimedia.

Lyusin, D. (2006). A new technique for measuring emotional intelligence: the EmIn questionnaire. Psikhologicheskaya Diagnostika, 4, 3-22. (In Russian.)

Lyusin, D. (2009). The EmIn emotional intelligence questionnaire: new psychometric data. In D. V. Lyusin & D. V. Ushakov (Eds.), Social and emotional intelligence: from models to measurements (pp. 264-278). Moscow: Institute of Psychology RAS. (In Russian.)

Osin, E. N. (2012). Measuring positive and negative emotions: development of a Russian-language analogue of the PANAS. Psychology. Journal of the Higher School of Economics, 9(4). (In Russian.)

Khanin, Yu. (1976). A brief guide to the C. D. Spielberger scale of reactive and personal anxiety. Leningrad: LNIIFK. (In Russian.)

EULA

The RAMAS database is available free of charge to academic institutions, universities, laboratories, and non-profit organisations for research purposes only. To obtain a user account, please print the End User License Agreement (EULA), sign it, and scan it. Fill in all items of the form below and send it by email together with the signed EULA as a PDF. If you do not have an academic homepage, please send other proof of your permanent academic affiliation. Only requests from permanent researchers are accepted; PhD students and non-permanent researchers should ask their supervisor to sign. Incomplete forms are not processed.

To proceed with the EULA, please contact us at info@neurodatalab.com.