Science of emotions: how smart technologies are getting better at understanding people
Valentina Evtyukhina, the author of the Digital Eva Telegram channel, and Neurodata Lab talk about the development of emotion recognition technologies in an article written specially for the Netology* blog.
The industry boomed in 2015-2016, when two technology giants, Microsoft and Google, launched pilot emotion-science projects accessible to ordinary users.
This spurred the creation of a variety of applications and algorithms based on emotion recognition technology. For example, Text Analytics API is one of the Microsoft Cognitive Services that allow developers to incorporate ready-made smart algorithms into their products. The other services in the package include tools for recognizing images, faces, speech, and much more. From that point on, emotions could be determined from text, voice, photos and even video.
Gartner claims that in 2021-2022 our smartphone will know us better than our friends and relatives, and interact with us on a subtle emotional level.
What’s going on with the market of emotion recognition technologies?
It exists, it’s young, and it has great potential.
Today the emotion detection market is booming: according to various expert estimates, it will reach between $19 billion and $37 billion by 2021.
For instance, according to the Markets&Markets agency, the global volume of the Emotion AI industry in 2016 was $6.72 billion, and it is expected to grow to $36.07 billion by the mid-2020s. This market is not monopolized. There is room for corporations, laboratories and start-ups. Moreover, it is fairly common for corporations to integrate the best practices of smaller companies into their own solutions.
Emotional and behavioral technologies are in demand in various fields, including healthcare.
Turning to global experience, we can recall how Empatica, led by Rosalind Picard, a few weeks ago became the first in the world to receive permission from the US FDA for the commercial use of its wearable Embrace bracelet. It not only monitors the owner’s physiological data but also evaluates their emotional background and predicts the likelihood of situations that are dangerous for the body. This can help people with autism spectrum disorders or depression, and in difficult cases in neurology and medicine.
The Israeli company Beyond Verbal, in cooperation with Mayo Clinic, is searching for vocal biomarkers. These markers can not only detect emotions but also make it possible to predict coronary artery disease, Parkinson’s disease and Alzheimer’s disease, which brings emotion science into gerontology and the search for ways to slow aging.
When we talk about the applicability of the technology, the main focus is on B2B in sectors like intelligent transport, retail, advertising, HR, IoT and gaming.
There is also demand in B2C: EaaS (Emotion as a Service) or cloud analytical solutions (Human data analytics) will allow any user to upload a video file and receive all the emotional and behavioral statistics for each fragment of the recording. Moreover, in a couple of years, emotion recognition technology might be in every smartphone.
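To make "statistics for each fragment" more concrete, here is a minimal Python sketch. It assumes that some upstream classifier has already labeled individual video frames with an emotion; the input format, the 10-second fragment length and the function name `fragment_stats` are illustrative assumptions, not part of any real EaaS product.

```python
from collections import Counter, defaultdict


def fragment_stats(frame_labels, fragment_seconds=10.0):
    """Group per-frame emotion labels into fixed-length fragments and
    return the share of each emotion inside every fragment."""
    counts = defaultdict(Counter)
    for timestamp, label in frame_labels:
        # Frames with timestamps in [0, 10) go to fragment 0, [10, 20) to 1, etc.
        counts[int(timestamp // fragment_seconds)][label] += 1

    stats = {}
    for index in sorted(counts):
        total = sum(counts[index].values())
        stats[index] = {label: n / total for label, n in counts[index].items()}
    return stats


# Hypothetical (timestamp in seconds, label) pairs from a per-frame classifier.
frames = [
    (0.5, "joy"), (3.0, "joy"), (7.2, "surprise"), (8.0, "joy"),
    (12.4, "anger"), (15.0, "anger"),
]
print(fragment_stats(frames))
```

A real service would presumably report much richer behavioral data, but the aggregation step (frames into fragments, fragments into shares) would look broadly similar.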
Stack of technology and science
An AI boom is predicted to occur in 2025-2027.
Smart interfaces for recognizing human emotions will be a trend: the software will make it possible to determine the user’s state at any time using just a webcam.
This is a promising niche, since the detection of a person’s emotions can be used for commercial purposes: from analyzing the perception of video and audio content to investigating criminal cases.
On the other hand, there are endless opportunities for the entertainment industry. For example, the new iPhone X has Face ID technology that not only unlocks the phone but can also create an animated emoji that mirrors your facial expressions.
Most new products in emotion science are built on the detection of the seven basic emotions and of facial microexpressions, which reflect our emotional state at a level beyond conscious control. We can consciously restrain a smile, but a slight twitch at the corners of the lips will remain, and this will be a signal for emotion recognition technology.
There is also a block of technologies specializing in speech and voice analysis and in eye tracking. Used in psychiatry or criminal investigations, these methods will help reveal a lot about a person’s emotional state and true mood from the smallest changes in facial expressions and body movements.
Today companies and teams can take open scientific data on emotion recognition and combine it with technology in a single stack; this is what affective computing stands for.
A huge contribution to the development of the emotion technology market has been made by the FAANG companies (Facebook, Apple, Amazon, Netflix, Google) and tech giants like IBM.
Emotion recognition technologies and the law
There are no direct legislative barriers for emotion technologies, and the industry itself is regulated rather weakly and piecemeal. There are, however, potential barriers and concerns, first of all the problem of privacy and personal data protection.
Emotions are private, personal data about a person, his/her feelings, responses to stimuli, people and the environment, thoughts and intentions, sometimes not fully realized rationally.
However, global digitalization, the spread of gadgets and devices of every kind, the growing reliance on images and video (several billion videos are shared on the net every day) and publicity in social networks make it possible to extract emotional data from open sources effectively and use it to analyze a person, both as a consumer of goods and services and as a user. All this should take place within the legal framework, responsibly and ethically.
The new European regulation on the protection of personal data (GDPR) imposes a number of limitations. Data for training machine learning algorithms can be used freely if:
— they remain depersonalized, that is, biosensory data is separated from biometrics (identification of people);
— the group format is observed (analysis of the crowd, not single subjects);
— the person is informed about the analysis being conducted and agrees to it; otherwise it is considered a violation of the rules and entails liability.
Where we will need emotion recognition in the coming years
The healthcare industry is actively adopting the most modern methods for collecting and analyzing data about patients or users. Computer algorithms can identify symptoms by drawing on hundreds and thousands of similar cases.
There are already mobile applications that analyze a person’s psychoemotional state from photos and text. The more a person communicates with the program, the better it learns and understands them, and the more accurate its predictions for treatment become.
It is one thing when a device simply picks up and «understands» your mood as best it can and accordingly plays music, adjusts the lights or makes coffee. It is another when it evaluates how tired you are judging by your appearance, or detects deviations from the norm, or even a disease such as Alzheimer’s or Parkinson’s.
Long before it manifests itself, a disease begins to affect the facial muscles and the speed of eye movements, and causes seemingly imperceptible changes in voice and micromovements.
The TV series «Lie to Me» was released in 2009 and immediately became popular all over the globe. The main character, Dr. Lightman, can tell whether a person is lying by observing the microexpressions on their face. This is his «superpower», which helps him find a murderer and uncover a complicated network of crimes.
Neural interfaces can do the same thing even better and faster. It is possible to film a person during an interview and then use a special program to estimate the share of each emotion expressed on their face: anger, fear, bitterness, resentment and so on. These data will help investigators understand at what point a person may have lied or left something unsaid.
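As a rough illustration of how a program might turn a model's raw output into a «percentage of expressed emotions», the Python sketch below applies a softmax to per-emotion scores. The scores and the function name are hypothetical; real systems may compute these shares quite differently.

```python
import math


def emotion_percentages(scores):
    """Convert raw per-emotion scores into percentages that sum to 100
    using a numerically stable softmax."""
    highest = max(scores.values())
    # Subtracting the maximum avoids overflow in exp() for large scores.
    exps = {label: math.exp(value - highest) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}


# Hypothetical raw scores for one video frame (not from a real model).
frame_scores = {"anger": 2.0, "fear": 1.0, "bitterness": 0.5, "resentment": 0.1}

for label, share in emotion_percentages(frame_scores).items():
    print(f"{label}: {share:.1f}%")
```

Running this frame by frame over an interview recording would give a timeline of emotion shares, which is the kind of data the paragraph above has in mind.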
Social Media Monitoring
It is commonly assumed that the Internet does not transmit emotions, but this is not true. From a series of tweets or Facebook posts, you can accurately determine the mood and state the user was in when they were written.
The simplest example of how to determine the psychoemotional state by the stylistics of a text is a well-known case when a person ends the messages with a period, and his interlocutor perceives this as a sign of something going wrong in the course of the conversation.
On the global scale, with the help of machine learning, we could create a system that would monitor outbreaks of anger, requests for help or fear in the messages and respond to them — for example, send a signal to rescue services.
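The Python sketch below shows the general shape such a monitoring loop could take. It is deliberately naive: simple keyword matching stands in for a trained machine learning model, and the keyword lists, category names and the `notify` callback are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists; a real system would use a trained classifier.
ALERT_KEYWORDS = {
    "help": {"help", "sos", "emergency"},
    "fear": {"afraid", "scared", "terrified"},
    "anger": {"hate", "furious", "angry"},
}


def detect_signals(message):
    """Return the sorted list of alert categories found in a message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return sorted(
        category
        for category, keywords in ALERT_KEYWORDS.items()
        if words & keywords
    )


def monitor(messages, notify):
    """Call notify(message, signals) for every message that raises a signal."""
    for message in messages:
        signals = detect_signals(message)
        if signals:
            notify(message, signals)


# Usage: collect alerts in a list instead of calling a real rescue service.
alerts = []
monitor(
    ["Nice weather today.", "Please help, I am scared!"],
    lambda message, signals: alerts.append((message, signals)),
)
print(alerts)
```

Keyword matching misses sarcasm, context and misspellings, which is exactly why the text speaks of machine learning rather than fixed rules.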
The world’s retail chains are already integrating online with offline as closely as possible, trying to find out what the buyer wants and what they will most likely buy. When neurointerfaces reach the level of accurate, highly sensitive emotion recognition, adverts in shopping center windows will adjust within seconds to the mood of the people passing by. We can see such technology in movies like “Minority Report” and “Blade Runner 2049”.
A scene from the «Blade Runner 2049» movie, where a holographic advertising gynoid reacts to the emotions on the face of the protagonist.
About a year ago, in April 2017, a research team from San Francisco taught an LSTM neural network to recognize the emotional component of a text with higher accuracy. Now the machine almost unerringly identifies the mood of user reviews on sites like Amazon or Rotten Tomatoes. This helps to improve the service and to predict the popularity of a product among users.
When the first model of Google Glass came out, people expected gesture control to reach new heights: to read text on the lens, it was enough to move your eyes from top to bottom for the system to realize that you had finished reading and proceed to the next piece. Although the gadget itself did not move beyond the prototype, the study of eye movements moved on to a new promising industry: gaming.
Game developers need to understand what the player feels and when, and how special effects and in-game obstacles affect them. Affectiva, a developer of emotion recognition technologies, helped create the game Nevermind, where the plot unfolds according to how tense and stressed the player feels at the moment.
Neurodata Lab Experience
In early 2016, the team of the Envirtue Capital Foundation set out to develop projects within its own R&D laboratory, completely autonomous and financed from its own sources. This is when Neurodata Lab LLC was born.
«In September 2016 we began forming our team, which today includes both scientists, specialists in natural and cognitive sciences, and technical experts with a background in computer vision, machine learning and data science. The interdisciplinary nature of emotion research predetermined our choice in favor of a mixed team. This approach allows us to consider problems from different points of view and to combine the purely technical part with views and ideas from biology, psychophysiology and neurolinguistics.»
— George Pliev, Managing Partner of Neurodata Lab
Neurodata Lab develops solutions that cover a wide range of areas in emotion research and recognition by audio and video, including technology for voice splitting, layer-by-layer analysis and identification of the speaker’s voice in the audio stream, complex tracking of the body and hand movements, as well as detection and recognition of the key points and movements of the facial muscles in the video stream in real time.
One such project is the EyeCatcher software tracker, which extracts eye and head movements from video files recorded on a regular camera. This technology opens up new horizons in the study of human eye movements in natural rather than laboratory conditions and significantly expands research capabilities. We can now learn how a person looks at a picture, how they react to sound, color and taste, and how their eyes move when they are happy or surprised. These data will be used as a basis for creating more sophisticated technology for human emotion recognition.
«Our goal is to design a flexible platform and develop technologies that will be in demand by private and corporate customers from various industries, including niche markets. In emotion detection and recognition, it is important to remember that human emotions are a highly variable, «elusive» thing that often differs from person to person and from society to society; there are ethnic, age, gender and sociocultural differences. To reveal such regularities, it is necessary to train algorithms on very large samples of high-quality data. This is what our laboratory is now focused on.»
— George Pliev
One of the main difficulties for research groups studying emotions is that data obtained in a natural setting is limited and «noisy», while the alternative is to use uncomfortable wearable devices to track the emotional state of experiment participants. Therefore, as one of its first projects, the Neurodata Lab team assembled the Russian-language multimodal dataset RAMAS (the Russian Acted Multimodal Affective Set), a complex set of data on emotions that includes parallel recordings from 12 channels (audio, video, eye tracker, wearable motion sensors, etc.) for each situation of interpersonal interaction. The dataset was created with the help of semi-professional actors from the Russian State University of Cinematography, who recreated various situations of everyday communication. Today, access to the multimodal RAMAS database is provided free of charge to academic institutions, universities and laboratories.
The availability of a broad database is one of the key factors in conducting high-quality emotion research. Unfortunately, such a base cannot be accumulated in laboratory and game simulations. To solve this known issue, Neurodata Lab developed and launched its own Emotion Miner platform for collecting, labeling, analyzing and processing emotional data. More than 20,000 annotators from more than 30 countries have labeled data on it. To date, the Emotion Miner Data Corpus is one of the world’s largest labeled multimodal emotional video datasets.
Since its establishment, Neurodata Lab has collaborated with academic institutes, universities, laboratories and competence centers in the USA, Europe and Russia, actively participated in major international conferences, including Interspeech and ECEM, and published academic articles. The company took part in the summit on emotional artificial intelligence promoted jointly by MIT and Affectiva, and in March 2018, together with the National Research University of Information Technologies, Mechanics and Optics, organized and conducted the first conference on Emotion AI in Russia: “Emotion AI: New Challenges for Science and Education, New Opportunities for Business.”
«When emotion recognition technologies reach maturity, they will have a significant impact on the entire ecosystem and the entire technosphere, and will allow people to communicate better, deeper and more fully with each other with the help of gadgets in a world of smarter machines with a human-computer interface. The technology has the potential to develop mutual understanding and empathy; it will help people with disabilities (for example, with autism) and will find the keys to alleviating socially critical diseases. However, it is not only the technology that matters, but also how people use it. We fully share the ethical imperative and proceed from the premise that the system of checks and balances, including legislative ones, will not let Emotion AI turn into a technology for total control. Its mission is to help people, not to limit their freedom, their rights or their personal space. Of course, certain aberrations are inevitable, but they can be corrected.»
— George Pliev
*Netology is a university providing courses in the areas of Internet marketing, project management, design, interface design and web development.