Treating 'Natural Stupidity' in Qual research via Artificial Intelligence

The objective of this blog is to address the age-old limitations of, and questions raised about, qualitative research, and to find solutions using state-of-the-art techniques from Artificial Intelligence.

The key counter-arguments being raised (examples of Natural Stupidity) are:

Hawthorne Effect: Participants change their responses because they know they are being observed.
Subjectivity: The researcher's own judgment guides everything in qualitative research, from the choice of topic to formulating hypotheses, selecting methodologies, and interpreting data.

Key Definitions

“Natural Stupidity” is defined here as our lack of meaning, limited memory, inability to think fast or to process too much information, and inability to cover every aspect, all of which lead to shortcuts in collecting, analyzing, and interpreting data. In particular, I am trying to address the laziness of a researcher who is not meticulous and methodical.

Artificial Intelligence (AI) is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings. In simple terms, it is how well machines can mimic human behavior.

We developed a three-pronged approach to solve the aforementioned limitations:

1: To introduce the computer vision technique of detecting the facial expressions of respondents as they react to the stimuli (a written concept, a new product idea, or an audio-video film).

2: To introduce Natural Language Processing (NLP) techniques to process the verbal feedback and understand the emotions and sentiments (overt and covert) in the audio responses captured via the qualitative method(s).

3: Qualitative Focus Group Discussions to capture the reaction(s) to the stimuli. Further, the outputs of the three methodologies are triangulated.

Detailing Out The Three Methodologies

Methodology 1:

The first methodology detects a person's facial expressions by machine (using AI techniques), with particular attention to micro-expressions. As defined by Paul Ekman, micro-expressions are facial expressions that occur within a fraction of a second; this involuntary emotional leakage exposes a person's true emotions. The model architecture is based on Ekman's work, and it classifies expressions into the following categories: Happy, Sad, Angry, Surprised, Scared, Disgusted, and Neutral.

This information can be used to complement FACS (the Facial Action Coding System). FACS is a comprehensive, anatomically based system for describing all visually discernible facial movement. It breaks facial expressions down into individual components of muscle movement, called Action Units (AUs).

Implementation:

Training the Model/Creating the meta-data:

The machine (model) is fed manually tagged facial-expression data (tagged using the FACS coding system), so that it learns how to read facial expressions.
The model (or machine) is then trained on this meta-data.
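
To make the training step concrete, here is a minimal sketch in Python. It assumes each manually tagged video frame has already been reduced to a handful of Action Unit intensities; the feature values, the choice of four AUs, and the nearest-centroid classifier are all assumptions made for illustration, not the model described in this blog.

```python
from collections import defaultdict
from math import dist

# Hypothetical training rows: (AU intensities, human-tagged label).
# Feature order: AU4 brow lowerer, AU6 cheek raiser, AU12 lip corner puller,
# AU15 lip corner depressor. All numbers are invented for illustration.
TAGGED_FRAMES = [
    ((0.0, 0.8, 0.9, 0.0), "happy"),
    ((0.1, 0.7, 0.8, 0.0), "happy"),
    ((0.6, 0.0, 0.0, 0.7), "sad"),
    ((0.5, 0.1, 0.0, 0.8), "sad"),
    ((0.0, 0.1, 0.1, 0.0), "neutral"),
    ((0.1, 0.0, 0.0, 0.1), "neutral"),
]


def train_centroids(rows):
    """Average the feature vectors of each label into one centroid."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train_centroids(TAGGED_FRAMES)
print(predict(centroids, (0.0, 0.9, 0.7, 0.1)))
```

A production system would more likely train a neural network on the raw video frames, but the flow is the same: tagged examples go in, and a model that maps new frames to expression labels comes out.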

Predicting the emotions/expression:

At this stage, the machine is asked to predict the emotion shown in a new video (where the respondent has reacted to the stimuli). The main outputs are as follows:

At an individual level: A classification of the respondent's/participant's facial expressions, i.e. whether he/she is happy, sad, neutral, angry, surprised, etc.
Group-level analysis: The output of the individual is mapped against the output of the target population.
Population analysis: The average expressions of the entire population.
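
The three levels of output can be sketched as simple aggregations over per-frame predictions. The respondent IDs, labels, and function names below are hypothetical, and averaging the individual distributions is only one possible reading of "average expressions".

```python
from collections import Counter

# Hypothetical per-frame predictions for each respondent (labels invented).
FRAME_PREDICTIONS = {
    "respondent_1": ["happy", "happy", "neutral", "surprised"],
    "respondent_2": ["neutral", "sad", "neutral", "neutral"],
    "respondent_3": ["happy", "neutral", "happy", "happy"],
}


def distribution(labels):
    """Individual level: share of frames per expression (fractions sum to 1)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def population_distribution(predictions):
    """Population level: average the individual distributions so that
    every respondent counts equally, whatever the length of their video."""
    individual = [distribution(labels) for labels in predictions.values()]
    all_labels = {label for d in individual for label in d}
    return {
        label: sum(d.get(label, 0.0) for d in individual) / len(individual)
        for label in all_labels
    }


def deviation_from_population(predictions, respondent):
    """Group level: individual share minus population share, per expression."""
    person = distribution(predictions[respondent])
    population = population_distribution(predictions)
    return {
        label: person.get(label, 0.0) - share
        for label, share in population.items()
    }
```

A positive deviation for "happy" would mean the respondent showed happy expressions in a larger share of frames than the population did on average.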

Methodology 2:

Natural Language Processing (NLP) is a field at the intersection of computer science, artificial intelligence, and linguistics. The goal is for computers to process or “understand” natural language in order to perform human-like tasks such as language translation or answering questions. Here, the attempt is to process the verbal feedback and detect the emotions and sentiments behind the speech (i.e. what the respondent said). The machine can parse the response to identify the emotion behind each part of the story.

How to implement this technique?

Training the model: The machine is trained using different NLP techniques to understand the hidden sentiments in the tone of the speech.

Predicting the sentiment/emotion:

At this stage, the machine is asked to predict the emotion/sentiment of the text/speech (where the respondent has reacted to the stimuli).
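
As a rough illustration of predicting sentiment for each part of a response, the sketch below scores a transcript sentence by sentence. It swaps the trained model for a tiny hand-made lexicon, works on transcribed text only (not on tone of voice), and every word list and function name in it is an assumption for illustration.

```python
import re

# Toy lexicon invented for illustration; a real project would use a trained
# model or a validated lexicon instead.
LEXICON = {"love": 1, "great": 1, "like": 1, "easy": 1,
           "hate": -1, "confusing": -1, "expensive": -1, "boring": -1}
NEGATORS = {"not", "never", "no"}


def score_sentence(sentence):
    """Sum word polarities, flipping a word's sign if it follows a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def analyse_response(response):
    """Split a transcript into sentences and label each one separately."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", response) if s.strip()]
    return [(s, label(score_sentence(s))) for s in sentences]


for sentence, sentiment in analyse_response(
    "I love the design. The price is expensive. It is not boring."
):
    print(sentiment, "|", sentence)
```

Labelling each sentence separately is what lets a mixed response (liking the design but not the price) show up as mixed rather than averaging out to neutral.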

Methodology 3:

Qualitative Focus Group Discussions can help reduce the subjectivity. These are conducted in the usual manner, and the data is then triangulated with the machine's predicted data (facial expressions and audio-based sentiments).
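
One simple way to picture the triangulation is to reduce all three sources to a common positive/neutral/negative scale and check whether they agree. The mapping from expressions to valence, the status names, and the function below are assumptions made for this sketch, not a prescribed procedure.

```python
# Hypothetical mapping from facial-expression labels to a coarse valence.
# Treating "surprised" as neutral is an assumption for this sketch.
EXPRESSION_VALENCE = {
    "happy": "positive", "surprised": "neutral", "neutral": "neutral",
    "sad": "negative", "angry": "negative", "scared": "negative",
    "disgusted": "negative",
}


def triangulate(facial_expression, text_sentiment, moderator_code):
    """Compare three valence readings and report whether they converge."""
    readings = {
        "facial": EXPRESSION_VALENCE[facial_expression],
        "verbal": text_sentiment,
        "moderator": moderator_code,
    }
    distinct = set(readings.values())
    if len(distinct) == 1:
        status = "convergent"   # all three sources agree
    elif len(distinct) == 2:
        status = "partial"      # two sources agree, one differs
    else:
        status = "divergent"    # all three sources differ
    return {"readings": readings, "status": status}


print(triangulate("sad", "positive", "positive"))
```

Cases flagged as partial or divergent are the interesting ones: for example, a respondent who says positive things while the face reads sad is a candidate for the covert sentiment this design is meant to surface.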

The data captured on a large sample can be generalized to the population (using the meta-data). The meta-data grows with every study, and the machine becomes more effective at predicting results by learning from each one.

Limitations of the design

Data Insufficiency: At the outset, there is too little data for the machine to predict accurately, so the objective should be to tag as many FGD videos as possible to build the meta-data.
Biased Model: Class imbalance (e.g. far more Neutral responses than Sad responses) would bias the model and yield sub-optimal predictions.
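
One common way to soften the class-imbalance problem is to weight rare classes more heavily during training. The sketch below computes weights as N / (K * n_c), where N is the number of tagged samples, K the number of classes, and n_c the count for class c; this is the same heuristic that scikit-learn calls "balanced" class weights. The tag counts are invented, and whether weighting is enough for a given model is an open question.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by N / (K * n_c): rare classes get larger weights."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical tag counts from a batch of FGD frames (invented numbers).
tags = ["neutral"] * 80 + ["happy"] * 15 + ["sad"] * 5
weights = balanced_class_weights(tags)

for label, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label}: {weight:.2f}")
```

With these counts, a Sad frame carries sixteen times the weight of a Neutral frame, so a model can no longer score well simply by predicting Neutral everywhere.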