First impressions carry significant weight in human interactions, and nowhere is this more evident than in hiring. The process is shaped by initial perceptions from the moment a recruiter first scans a resume to the candidate's final interview with a hiring manager, and each initial encounter colors how the evaluator reads the candidate's responses and qualifications. Notably, an estimated 30% of interviewers form their opinion of an interviewee within the first five minutes of the interview.
At Intervue, we work to mitigate this inherent "first impression" bias by transforming recorded video interviews into validated pre-hire assessments. Machine learning lets us evaluate the complete context of a candidate's response objectively, without the subjective biases a human evaluator brings. Predicting competencies and personality traits from video interviews is therefore our primary focus.
In machine learning, the ability to predict personality from a mere 15-second first impression hinges on the quality of the training data, and the ChaLearn First Impressions dataset presented a promising opportunity. It comprises 10,000 clips, each averaging 15 seconds, extracted from more than 3,000 high-definition YouTube videos of people speaking English to a camera.
To label the data, Mechanical Turk workers were trained and then presented with pairs of videos to compare on the Big Five personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism), along with an interview flag indicating which of the two people they would rather invite to a job interview.
The Big Five are the taxonomy psychologists most commonly use to categorize personality, and they typically assess the traits through multiple-choice questionnaires. The dataset's creators took a less conventional route: rather than gathering video and administering traditional personality tests, a slow and expensive process, they used YouTube videos and had human evaluators judge the traits directly.
To turn these judgments into labels, each video pair was compared by several evaluators, and the pairwise comparisons were aggregated, using a Bradley-Terry-style pairwise-comparison model, into cardinal scores for each video across the six measures (the Big Five plus the interview flag).
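To make the aggregation step concrete, here is a minimal sketch of a Bradley-Terry-style fit, assuming comparisons are recorded as (winner, loser) index pairs. This illustrates the general technique only; it is not the challenge organizers' actual code.

```python
def bradley_terry(pairs, n_videos, iters=100):
    """Fit Bradley-Terry strengths from pairwise outcomes.

    pairs: list of (winner_index, loser_index) comparisons.
    Returns per-video scores min-max normalized to [0, 1].
    """
    wins = [0] * n_videos
    for w, _ in pairs:
        wins[w] += 1
    p = [1.0] * n_videos  # initial strengths
    for _ in range(iters):
        new_p = []
        for i in range(n_videos):
            # Sum 1 / (p_i + p_j) over every comparison involving video i
            denom = 0.0
            for w, l in pairs:
                if i == w or i == l:
                    j = l if i == w else w
                    denom += 1.0 / (p[i] + p[j])
            new_p.append(wins[i] / denom if denom else p[i])
        total = sum(new_p)
        p = [x * n_videos / total for x in new_p]  # keep scale stable
    lo, hi = min(p), max(p)
    return [(x - lo) / (hi - lo) for x in p]

# Toy data: video 0 beats 1 twice, 1 beats 2 once, 0 beats 2 once.
scores = bradley_terry([(0, 1), (0, 1), (1, 2), (0, 2)], 3)
```

The fit recovers the intuitive ordering: video 0, with the most wins, lands at the top of the normalized range, and video 2, with none, at the bottom.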
The ECCV ChaLearn LAP 2016 challenge gave individuals and teams a platform to show how well they could predict the Big Five traits from video. Many participants posted impressive results, applying state-of-the-art video processing and machine learning to this novel task.
Having explored human judgment of job candidates at length, we grew curious about what is actually being measured under the guise of "apparent personality traits." Human-evaluated personality assessment is difficult even with highly trained evaluators, consistent conditions, and consistent questions. Realistically, no one can extract meaningful personality insights from fifteen seconds of a random video clip. Even under controlled circumstances, evaluators are left with exactly that: a first impression, drawn from whatever visual and auditory cues the snippet happens to contain.
So while this is labeled a "personality dataset," its true value lies in what it reveals about how humans perceive personality, not how individuals genuinely exhibit these traits. In essence, the dataset captures how Mechanical Turk workers intuit personality from a brief 15-second snippet, opening a window onto human perception and the impressions people form from very limited information.
To conduct our investigation, we leveraged our trained deep learning models to predict various attributes such as age, race, gender, and attractiveness for the subjects featured in the videos. These models were trained using self-identified age, race, and gender, as well as average attractiveness as assessed by other individuals. Subsequently, we examined score distributions generated by Mechanical Turk for each of these measured attributes across different groups, yielding striking results.
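As a sketch of this grouping step (with hypothetical field and group names, not the dataset's actual schema), one might bucket the Mechanical Turk scores by a model-predicted attribute and compare group averages before plotting full distributions:

```python
from collections import defaultdict
from statistics import mean

def scores_by_group(records, attribute, score):
    """Group evaluator scores by a model-predicted attribute.

    records: list of dicts, e.g. {"age_band": "older", "interview": 0.42}.
    Returns {group: mean score} as a first look before examining the
    full distribution for each group.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[attribute]].append(r[score])
    return {g: mean(v) for g, v in groups.items()}

# Illustrative records only; real labels come from the trained models.
records = [
    {"age_band": "younger", "interview": 0.70},
    {"age_band": "younger", "interview": 0.60},
    {"age_band": "older",   "interview": 0.45},
    {"age_band": "older",   "interview": 0.35},
]
means = scores_by_group(records, "age_band", "interview")
```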
In the graphs presented, the x-axis shows the normalized score, so values between 0.9 and 1.0 cover the top 10% of scorers. The y-axis shows the proportion of individuals assigned a given score, with the total area under the curve equaling 1.
Three examples of score distributions are depicted below. In the left graph, the majority of the population falls within the upper half of the score range, signifying that most evaluated individuals received high scores. The center graph illustrates a flat distribution, indicating that the same percentage of people received a score of 0.2, for instance, as the percentage who received a score of 0.8. Finally, the graph on the right suggests that the majority of the population received low scores, highlighting a prevalence of lower evaluations.
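The normalization behind these plots can be sketched in a few lines; the bin count and sample scores below are illustrative:

```python
def normalized_density(scores, bins=10):
    """Min-max normalize scores to [0, 1], then bin into equal-width bins.

    Returns the fraction of people in each bin. Dividing each fraction
    by the bin width (1 / bins) gives the plotted density, whose area
    under the curve totals 1.
    """
    lo, hi = min(scores), max(scores)
    norm = [(s - lo) / (hi - lo) for s in scores]
    counts = [0] * bins
    for s in norm:
        counts[min(int(s * bins), bins - 1)] += 1  # clamp s == 1.0
    return [c / len(scores) for c in counts]

fractions = normalized_density([0.2, 0.4, 0.4, 0.9, 1.0], bins=5)
```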
Given this context, let's delve into the data.
Starting with age, the score differences indicate that older individuals are perceived as more conscientious and less neurotic, qualities generally viewed positively, but also as less agreeable, open, and extroverted. They are also less likely to be flagged for a job interview. These perceptions show how age alone can shift the personality traits attributed to a candidate and, with them, employment opportunities.
Comparing males and females, female scores skew toward the higher end of the range on every measure except one: agreeableness, where women are rated lower than their male counterparts. This offers a nuanced look at gender-related perceptions of personality and how they vary across attributes.
Breaking the distributions down by ethnicity, a troubling pattern emerges: across all six dimensions, whites and Asians consistently receive higher ratings than blacks and individuals categorized as "other." The evaluations systematically attribute more favorable personality traits to some ethnic groups than others, disparities that say nothing about the actual capabilities or qualifications of the people being rated, and they underscore the importance of recognizing and addressing bias in the evaluation process.
When the attractiveness ratings are segmented into three tiers, a prominent trend emerges, perhaps one of the most robust in the data. Notably, this trend is particularly intriguing because considerations of fairness based on physical appearance are often overlooked in many processes (given that being unattractive is not legally recognized as a protected class). Within the First Impressions dataset, individuals deemed more attractive are perceived as possessing more positive qualities across the board.
Conversely, those rated as below average in terms of attractiveness are perceived as having fewer positive traits. Individuals with an average level of attractiveness exhibit a more consistent and flat distribution. This trend underscores the impact of physical appearance on the perception of various personality traits, shedding light on a dimension that is not typically addressed in standard evaluation processes.
These findings offer valuable insights on multiple fronts. The significance of first impressions in the hiring process has long been acknowledged. As previously highlighted, studies indicate that a substantial 30% of interviewers form their decision about a candidate within the initial five minutes of the interview. Moreover, the initial impression formed by interviewers significantly shapes their perception of the candidate's responses throughout the rest of the interview.
From a data science perspective, these results underscore the need to scrutinize algorithmic assessments for potential adverse impact. Any model trained on human judgments risks perpetuating or amplifying the biases present in its training data, so practitioners carry the responsibility to audit their systems thoroughly and correct unintended consequences in algorithmic decision-making.
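One concrete audit, borrowed from US employment guidelines, is the four-fifths rule: each group's selection rate should be at least 80% of the most-selected group's rate. A minimal sketch (group labels and counts are made up for illustration):

```python
def adverse_impact_ratios(selected_by_group):
    """Compute adverse-impact ratios from selection counts.

    selected_by_group: {group: (num_selected, num_total)}.
    Returns each group's selection rate divided by the highest group's
    rate; values below 0.8 flag potential adverse impact under the
    four-fifths rule.
    """
    rates = {g: sel / tot for g, (sel, tot) in selected_by_group.items()}
    top = max(rates.values())
    return {g: r / top for g, r in rates.items()}

ratios = adverse_impact_ratios({"group_a": (50, 100), "group_b": (30, 100)})
# group_b's ratio is 0.30 / 0.50 = 0.6, below the 0.8 threshold
```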
Regrettably, the lack of surprise in these results reflects the ongoing struggle against bias in the hiring space. While human input is undeniably valuable, it is also a notable source of bias. Therefore, the emphasis on providing recruiters and hiring managers with an objective evaluation of each candidate becomes paramount. A well-vetted AI system can serve as an objective decision support tool, furnishing crucial insights to facilitate better and less biased hiring decisions.
The findings reinforce the case for objective decision support: carefully audited, AI-driven assessments that counteract the first-impression biases documented here. Age, gender, ethnicity, and attractiveness all measurably shift how evaluators score the same six dimensions. Technology has a vital role to play in fostering fair evaluations, and traditional hiring practices are overdue for reexamination.