Overview of Computerized Adaptive Testing (CAT)

Sugandha Srivastava | December 26th, 2023 | 11 min read

Computerized Adaptive Testing (CAT) is a revolutionary approach to examinations conducted on a computer, dynamically tailoring questions to match the abilities of individual test takers. This adaptive process ensures that each examinee faces a unique set of questions aligned with their performance, offering a personalized testing experience. CAT not only enhances efficiency and reduces testing time but also bolsters test security by customizing content for each participant. In this article, we delve into the roots of adaptive testing, explore the distinctive features of CAT, and highlight its historical evolution. Join us in unraveling the advantages and challenges of CAT, aiming to provide you with a comprehensive understanding of this innovative testing methodology.

What Is Computerized Adaptive Testing?

In simple terms, computerized adaptive testing (CAT) is an exam conducted on a computer that adjusts the difficulty of questions based on each individual's performance. This type of test adapts in real-time to the test taker's abilities, providing questions that match their skill level. CAT is a secure exam design that safeguards test content and reduces the likelihood of cheating. By tailoring questions to the test taker, CAT allows for quicker administration, requires fewer items, and enhances overall test security. To gain a deeper understanding of CAT, it's helpful to explore the roots of adaptive testing.

What Is Adaptive Testing?

Adaptive testing, as its name implies, adjusts exam questions in real-time according to each test taker's abilities. This leads to a unique set of questions for each individual. The adaptation is based on the test taker's performance in answering previous questions. When a test taker correctly answers most questions, the system selects more challenging ones. Conversely, if the individual struggles with previous questions, easier ones are presented. After a relatively small number of questions, which varies for each person, the test concludes and provides a score. Unlike traditional tests that assess the number of correct answers, the score in an adaptive test is determined by the difficulty level of the questions the test taker reached.
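
The up-and-down rule described above can be sketched in a few lines of Python. This is only an illustration, not how any particular testing program works: the 1-to-10 difficulty scale, the starting level of 5, and the function names `next_difficulty` and `run_staircase` are all assumptions made for the example.

```python
def next_difficulty(current: int, was_correct: bool,
                    lowest: int = 1, highest: int = 10) -> int:
    """Move one step harder after a correct answer, one step easier after a miss."""
    step = 1 if was_correct else -1
    # Clamp so the level never leaves the (assumed) 1-10 scale.
    return max(lowest, min(highest, current + step))


def run_staircase(answers: list[bool], start: int = 5) -> list[int]:
    """Return the difficulty level presented for each question in turn."""
    levels = []
    level = start
    for was_correct in answers:
        levels.append(level)
        level = next_difficulty(level, was_correct)
    return levels


# Three correct answers, one miss, then another correct answer.
print(run_staircase([True, True, True, False, True]))  # [5, 6, 7, 8, 7]
```

Two test takers with different answer patterns would see different sequences of levels, which is why each person ends up with a unique set of questions.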

Because its starting and stopping points vary from person to person, adaptive testing is highly efficient and requires fewer items than a conventional test. One of the earliest instances of adaptive testing dates back to the Stanford-Binet Intelligence Scale, administered at the beginning of the 20th century. More details about this historical test appear in the history section below.

How is a Computer Adaptive Test Different from an Adaptive Test?

The difference is the delivery medium. A computer adaptive test (CAT) is an adaptive test administered on a computer rather than in a paper-and-pencil format. Today, most tests are administered digitally rather than on paper. The shift to computerized exams has been a substantial advance for the testing industry, bringing quicker scoring, better accessibility, improved fairness, streamlined administration, and heightened security, among other benefits.

How Does Computerized Adaptive Testing Work?

Computerized adaptive tests (CATs) access a carefully curated pool of test items during the exam, spanning a range from easy to complex. The difficulty of these items is determined from collected data. A robust item pool includes items at various difficulty levels. The CAT algorithm selects items from this pool that align with the test taker's estimated ability as questions are answered, continuing until the test concludes. Essentially, after each item response, the computer re-evaluates the test taker's ability and selects a question that the individual has approximately a 50% chance of answering correctly. This iterative process aims to provide a more precise measurement of the test taker's ability on a standardized scale.

Throughout the test, if a test taker's estimated ability is high (indicating proficiency in answering difficult questions), the CAT selects items from the "difficult range" of the item pool. The same logic applies at every estimated ability level, from low to high, whatever the breadth of content the test covers.
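
The select-and-re-estimate loop can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the Rasch (one-parameter logistic) model as the response model, a five-item pool with made-up difficulties, scripted answers, and a deliberately crude ability update (`update_theta`) whose steps shrink as more items are answered. Operational CATs typically use a proper statistical estimator instead.

```python
import math


def p_correct(theta: float, difficulty: float) -> float:
    """Rasch model: probability that ability `theta` answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def select_item(theta: float, pool: dict[str, float], used: set[str]) -> str:
    """Pick the unused item whose success probability is closest to 50%."""
    candidates = [name for name in pool if name not in used]
    return min(candidates, key=lambda name: abs(p_correct(theta, pool[name]) - 0.5))


def update_theta(theta: float, was_correct: bool, items_seen: int) -> float:
    """Crude update (an assumption): step up or down by 1 / items_seen."""
    step = 1.0 / items_seen
    return theta + step if was_correct else theta - step


# Hypothetical item pool: item name -> difficulty on the ability scale.
pool = {"A": -2.0, "B": -1.0, "C": 0.0, "D": 1.0, "E": 2.0}
theta, used = 0.0, set()
for was_correct in [True, True, False]:  # scripted responses for the demo
    item = select_item(theta, pool, used)
    used.add(item)
    theta = update_theta(theta, was_correct, len(used))
    print(item, round(theta, 2))
```

With these scripted answers the loop administers items C, D, and E in turn: each correct answer raises the estimate and pulls a harder item from the pool, and the miss on E brings the estimate back down.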

After a sufficient number of questions (typically fewer than on an equivalent traditional test), the CAT computes a reliable score for the test taker. Importantly, the score is based not on the number of correct answers but on the difficulty level of the items the individual can answer correctly. Scoring in track-and-field events such as the high jump works the same way: the result is the highest bar cleared, not the number of successful jumps.
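
To see why the same number of correct answers can yield different scores, consider this sketch. It assumes the Rasch model and a maximum-likelihood estimate found by Newton-Raphson iteration, which is one common choice rather than the method of any specific testing program. The item difficulties are invented for the example, and the estimator only works when the responses include at least one correct and one incorrect answer.

```python
import math


def estimate_ability(difficulties: list[float], responses: list[int],
                     iterations: int = 25) -> float:
    """Maximum-likelihood ability estimate under the Rasch model.

    `responses` holds 1 for correct and 0 for incorrect, aligned with
    `difficulties`. Requires a mix of correct and incorrect answers.
    """
    theta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information  # Newton-Raphson step
    return theta


responses = [1, 1, 1, 0, 0]                 # 3 of 5 correct in both cases
easy_items = [-2.0, -1.5, -1.0, -0.5, 0.0]  # hypothetical easy set
hard_items = [0.0, 0.5, 1.0, 1.5, 2.0]      # same set shifted up by 2

print(round(estimate_ability(easy_items, responses), 2))
print(round(estimate_ability(hard_items, responses), 2))
```

Both test takers answered three of five items correctly, but the one who faced the harder set receives an ability estimate exactly two units higher, mirroring the two-unit shift in item difficulty.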

The History of Adaptive Testing

The history of adaptive testing dates back to the early 20th century, with its roots in intelligence testing. Here are key milestones in the development of adaptive testing:

Binet-Simon and Stanford-Binet Intelligence Scales (Early 1900s): The original intelligence scale was developed by Alfred Binet and Theodore Simon in France in 1905, and Lewis Terman's 1916 revision at Stanford University became known as the Stanford-Binet Intelligence Scale. It was one of the earliest instances of adaptive testing. This intelligence test was designed to measure a person's cognitive abilities and used a series of questions that adjusted based on the individual's responses.

Raven's Progressive Matrices (1938): John C. Raven published the Progressive Matrices test, which presents items in order of increasing difficulty. Grading items explicitly by difficulty is a building block that adaptive testing relies on, because an adaptive test must know how hard each item is before it can choose the next one based on the test taker's performance on earlier items.

The Tailor-Made Test (1950s): The Tailor-Made Test, created by Benjamin D. Wright, allowed for the adaptation of difficulty levels based on a person's responses. This marked a shift toward more systematic adaptive testing.

Gradual Shift to Computerized Adaptive Testing (1960s-1970s): With the advent of computers, the idea of computerized adaptive testing gained traction. Early computerized adaptive tests were developed to administer exams more efficiently and precisely by adjusting question difficulty based on the test taker's performance.

1979 National Assessment of Educational Progress (NAEP): The NAEP conducted one of the first large-scale computerized adaptive testing programs, signaling the potential for widespread application in educational assessments.

1980s-1990s: Widening Applications: The use of adaptive testing expanded across various domains, including educational assessments, professional certification exams, and military testing. Researchers and practitioners refined the algorithms and methodologies for adaptive testing.

The Rise of Item Response Theory (IRT): Item Response Theory, a statistical framework for modeling individual responses to test items, became integral to adaptive testing. IRT allows for a more accurate estimation of an individual's ability based on their responses to specific items.

Modern Computerized Adaptive Testing (21st Century): Advances in technology and statistical modeling have facilitated the widespread adoption of computerized adaptive testing. Various standardized tests use adaptive testing to tailor assessments to individual test takers, including the Graduate Management Admission Test (GMAT), which adapts question by question, and the Graduate Record Examination (GRE), which adapts between sections rather than between individual questions.

Today, adaptive testing continues to evolve with ongoing research, technological advancements, and a growing understanding of psychometrics—the science of measuring cognitive abilities and traits.

Examples of Adaptive Measurement Outside of Testing

The concept of adaptive measurement is not exclusive to testing but can be observed in various areas, including sports like high jumping. In the world of high jumping, the adaptive approach has been in practice for a much longer time, offering insights into the efficiency and effectiveness of such methodologies.

In high jump competitions, which originated in 19th-century Scotland, the process is inherently adaptive. Competitors begin with a bar set slightly lower than their overall capabilities. More proficient jumpers may skip the initial heights, streamlining the competition. Since participants are eliminated when they fail to clear a height, the high-jump format is inherently efficient. The abilities of each competitor are swiftly determined through a combination of successful and unsuccessful attempts. Ultimately, the winner is decided based on the highest bar height cleared.

Contrast this with a hypothetical scenario where high jump competitions follow the traditional testing model. In such a scenario, a jumper would be required to attempt every bar height in 3-inch increments from 3 feet to 10 feet, for a total of 29 jumps ((120 - 36) / 3 + 1 = 29 bar heights). The score would be based on the number of successful jumps out of 29. While this approach might provide a reasonably accurate measure of high jump ability, it would be an arduous and frustrating experience for the jumper. The excessive number of jumps could lead to fatigue, and the jumper might find the lower heights boring and the higher ones frustrating.

This analogy highlights the contrast between traditional testing experiences and the more adaptive and efficient approach facilitated by tools like computerized adaptive tests. By tailoring questions to the individual's ability level, adaptive testing reduces unnecessary challenges, minimizes fatigue, and provides a more engaging and effective assessment experience.

The Advantages of Computerized Adaptive Testing

Computerized adaptive testing (CAT) offers several compelling advantages that contribute to a positive testing experience:

Time Efficiency: CAT requires fewer items to determine a test taker's score compared to traditional tests. This results in a significantly shorter testing time, potentially reducing it by half. Examinees benefit from a quicker and more streamlined testing process.

Cost Savings: The reduced testing time translates into lower test administration costs. This efficiency is not only beneficial for test takers but also for organizations managing the tests, leading to potential cost savings.

Enhanced Test Security: CAT's adaptive nature means that each test taker receives a unique set of questions. This minimizes the exposure of test items and makes cheating, particularly through pre-knowledge or item harvesting, more challenging and less rewarding. The dynamic nature of CAT forms adds an extra layer of security, making answer copying during in-person test administrations difficult.

Mitigation of Fatigue and Boredom: CAT customizes the difficulty level of questions based on individual performance. Test takers are spared from answering both excessively easy and overly difficult questions, creating a more engaging and pleasant testing experience. This adaptability helps reduce fatigue and boredom.

Positive Candidate Preference: Surveys, such as the one conducted at Novell in 1995, indicate that a significant percentage of candidates prefer computerized adaptive tests. The adaptive format, once explained, often alleviates concerns about the shorter duration of the test. Over time, candidates gain confidence in CAT's ability to accurately measure their knowledge and experience.

Confidence in Measurement: With years of use, candidates and organizations using CAT gain confidence in its ability to effectively distinguish between individuals with varying levels of knowledge and experience. This confidence contributes to the widespread acceptance and preference for computerized adaptive testing.

Why Isn’t CAT More Widely Adopted?

The primary challenge encountered in Computerized Adaptive Testing (CAT) is the calibration of the item pool. To model item characteristics and select the optimal item, all test items must be pre-administered to a substantial sample, a process known as "pilot testing," "pre-testing," or "seeding." New items are integrated into operational items, with responses recorded but not contributing to test-taker scores. This approach introduces logistical, ethical, and security issues. Operational adaptive tests cannot include entirely new, unseen items, requiring pretesting with a large sample, sometimes as large as 1,000 examinees. Programs must determine the reasonable percentage of the test composed of unscored pilot test items.
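
The seeding idea can be illustrated with a small sketch. Everything here is hypothetical: the item names, the helper functions `build_form` and `split_responses`, and the choice of one pilot item among four operational ones (20% of the test unscored). In a real CAT the operational items would be chosen adaptively rather than listed up front. The point is only that pilot responses are recorded for calibration while being kept out of the score.

```python
import random


def build_form(operational, pilot_bank, n_pilot, rng):
    """Return (item_id, is_scored) pairs with n_pilot unscored items mixed in."""
    pilot_items = rng.sample(pilot_bank, n_pilot)
    form = [(item, True) for item in operational]
    form += [(item, False) for item in pilot_items]
    rng.shuffle(form)  # test takers cannot tell which positions are pilot items
    return form


def split_responses(form, responses):
    """Separate responses that feed the score from those kept for calibration."""
    scored = {item: responses[item] for item, is_scored in form if is_scored}
    pilot = {item: responses[item] for item, is_scored in form if not is_scored}
    return scored, pilot


rng = random.Random(7)  # fixed seed so the demo is repeatable
form = build_form(["op1", "op2", "op3", "op4"],
                  ["new1", "new2", "new3"], n_pilot=1, rng=rng)
responses = {item: True for item, _ in form}  # pretend every answer was correct
scored, pilot = split_responses(form, responses)
print(len(scored), "scored responses;", len(pilot), "pilot response")
```

Raising `n_pilot` calibrates new items faster but lengthens the test with questions that do not count, which is the trade-off each program has to settle.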

Despite exposure control algorithms in adaptive tests preventing overuse of specific items, exposure conditioned upon ability is often uncontrolled, posing a security concern. Some items may become overly common among individuals with similar abilities, potentially compromising test security. While a completely randomized exam is the most secure, it is also the least efficient.
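
One simple way to trade efficiency for security, sometimes called a "randomesque" strategy, is to pick at random among the few best-matching items instead of always taking the single best one. The sketch below assumes a hypothetical five-item pool and uses closeness in difficulty as the matching criterion. It is not a description of any specific program's exposure control.

```python
import random
from collections import Counter


def randomesque_pick(theta, pool, k, rng):
    """Choose at random among the k items whose difficulty is closest to theta."""
    ranked = sorted(pool, key=lambda name: abs(pool[name] - theta))
    return rng.choice(ranked[:k])


# Hypothetical pool: item name -> difficulty.
pool = {"A": -0.2, "B": 0.0, "C": 0.1, "D": 1.5, "E": -1.5}
rng = random.Random(1)  # fixed seed so the demo is repeatable

# Simulate 300 test takers who all have the same estimated ability of 0.0.
exposure = Counter(randomesque_pick(0.0, pool, k=3, rng=rng) for _ in range(300))
print(exposure)
```

With `k=1` every test taker of the same ability would see item B, the most efficient but most exposed choice. With `k` equal to the pool size the selection becomes fully random, which is the most secure and least efficient extreme described above.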

Reviewing past items is generally disallowed in adaptive tests. After an incorrect answer, easier items are administered, raising concerns that astute test-takers could use this clue to detect and correct mistakes. Alternatively, test-takers might be coached to intentionally select wrong answers, leading to an artificially easy test. If review were allowed, they could then go back and change those answers to the correct ones, potentially achieving a high score. Test-takers often express frustration about the inability to review.

The development of a CAT necessitates prerequisites, including large sample sizes for Item Response Theory (IRT) calibrations, real-time scorable items for instantaneous selection, and experienced psychometricians for validity documentation. Additionally, a software system capable of true IRT-based CAT must be available.

Summing It Up

The aim of this article is to offer readers a conceptual understanding of computerized adaptive testing without delving into technical explanations, test theories, or formulas. Instead, drawing from my practical experience spanning over 30 years in creating and utilizing CATs for K-12 education and certification exams, I provide straightforward definitions and examples.

In essence, a computerized adaptive test is an exam conducted on a computer that employs algorithms to customize the difficulty levels of test questions based on an individual's previous correct and incorrect answers. This adaptive approach ensures that each test taker encounters a unique set of questions tailored to their responses. I hope this overview sparks your interest in adopting computerized adaptive tests and encourages you to explore how to integrate them into your program. I also hope that, regardless of your program's size or your staff's qualifications, implementing CAT will be as seamless as administering any other test.
