What is Computerized Adaptive Testing?

Computerized Adaptive Testing (CAT) is a computer-based assessment approach in which each individual encounters questions matched to their ability level throughout the test administration. Unlike traditional paper-and-pencil tests, CAT lets test takers answer questions at their own level of competence and ability. A CAT system draws on a large item pool whose item parameters have been calibrated through pilot studies. Based on the individual's response to an item, a provisional estimate of the individual's ability is computed, and the item that would provide the most information at that ability level is then selected from the pool. For instance, if an individual answers an item of medium difficulty correctly, the algorithm presents a more difficult item in the next step; if the individual answers incorrectly, an easier item is administered. In this way, ability is estimated with less measurement error and with fewer items, since test takers encounter only items that match their ability level (Kalender, 2009). This process is repeated until the test ends.

CAT consists of five components (Weiss & Kingsbury, 1984; Thompson & Weiss, 2011): a calibrated item bank, a starting rule, an item selection algorithm, a scoring algorithm, and a termination criterion. These components are briefly described below. An example flowchart of a CAT algorithm, including all five components, is shown in Figure 1.


Figure 1. Example flowchart of CAT Algorithm


1. Calibrated item pool

For the CAT algorithm to work, an item pool with known parameters is needed. For this purpose, the items to be used in CAT applications are administered to a group of test takers in a pilot study beforehand. The parameters of these items are then estimated under one of the Item Response Theory (IRT) models, and an item pool is created. A good item pool should contain a large number of items with varying difficulty levels (Weiss & Kingsbury, 1984). This reduces the likelihood of the same item being used repeatedly across administrations and makes it possible to present appropriate items to both high-achieving and low-achieving students.
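A calibrated pool of this kind can be sketched as a small table of IRT parameters. The snippet below is a minimal illustration, assuming a three-parameter logistic (3PL) model; the parameter values are hypothetical, not drawn from any real calibration.

```python
import numpy as np

# A minimal sketch of a calibrated item bank under a hypothetical 3PL model.
# Each item stores discrimination (a), difficulty (b), and guessing (c)
# parameters as if estimated from pilot data; all values are illustrative.
item_bank = np.array(
    [(1.3, -2.1, 0.18), (0.9, -0.4, 0.22), (1.6, 0.5, 0.15), (1.1, 2.0, 0.25)],
    dtype=[("a", float), ("b", float), ("c", float)],
)

# A well-built pool spans a wide difficulty range so that both low- and
# high-ability test takers can be served with appropriate items:
difficulty_range = item_bank["b"].max() - item_bank["b"].min()
```

In practice a pool holds hundreds of such items, and the difficulty spread (here 4.1 logits) is one simple indicator of whether the bank can serve the full ability range.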

2. Starting rule

There are different approaches to selecting the first item to be presented. The most common method is to start the test with an item of medium difficulty: in the absence of prior information about the individual's ability level, it is best to assume that the individual has medium ability (Mills & Stocking, 1996).

3. Item selection algorithm

At this stage, it is decided how items will be selected from the pool. The most widely used item selection method is Maximum Fisher Information (MFI), in which the computer selects from the pool the item that provides the highest information at the individual's provisional ability estimate.
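MFI selection can be sketched as follows, here assuming a two-parameter logistic (2PL) model, where an item's information at ability θ is a²·P(θ)·(1 − P(θ)). The item bank and the provisional ability value are hypothetical, chosen only to illustrate the selection step.

```python
import numpy as np

# Hypothetical calibrated 2PL item bank: each row is (a, b) =
# (discrimination, difficulty). Values are illustrative only.
item_bank = np.array([
    [1.2, -1.0],
    [0.8,  0.0],
    [1.5,  0.3],
    [1.1,  1.2],
])

def fisher_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta:
    I(theta) = a^2 * P(theta) * (1 - P(theta))."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def select_item_mfi(theta, bank, administered):
    """Return the index of the unadministered item with maximum information."""
    info = np.array([
        fisher_information(theta, a, b) if i not in administered else -np.inf
        for i, (a, b) in enumerate(bank)
    ])
    return int(np.argmax(info))

# At a provisional ability of 0.2 (item 0 already given), MFI picks the
# highly discriminating item whose difficulty is closest to that ability:
next_item = select_item_mfi(0.2, item_bank, administered={0})  # → 2
```

Masking already administered items with −∞ information is one simple way to prevent an item from being presented twice within the same test.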

4. Scoring algorithm

A provisional ability estimate is computed after the individual's response to each item. Two main approaches are used for this: Maximum Likelihood Estimation (MLE) and Bayesian methods (Weiss & Kingsbury, 1984). With MLE, the individual must give at least one correct and one incorrect answer before an ability estimate can be obtained (Thompson & Weiss, 2011). With Bayesian methods, ability estimation can begin immediately after the response to the first item. The most commonly used Bayesian methods in CAT applications are Expected a Posteriori (EAP) and Maximum a Posteriori (MAP).
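The EAP approach can be sketched with simple numerical integration over an ability grid. This is a minimal illustration assuming a 2PL model and a standard-normal prior; the item parameters and grid are hypothetical choices, not a prescribed implementation.

```python
import numpy as np

def eap_estimate(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """Expected a Posteriori ability estimate under a 2PL model with a
    standard-normal prior. `responses` holds 0/1 answers; `a` and `b` are
    the parameters of the administered items."""
    prior = np.exp(-grid**2 / 2)                  # N(0, 1) prior (unnormalized)
    likelihood = np.ones_like(grid)
    for u, ai, bi in zip(responses, a, b):
        p = 1.0 / (1.0 + np.exp(-ai * (grid - bi)))
        likelihood *= p**u * (1 - p)**(1 - u)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    theta_hat = np.sum(grid * posterior)          # posterior mean = EAP estimate
    se = np.sqrt(np.sum((grid - theta_hat)**2 * posterior))  # posterior SD
    return theta_hat, se

# Unlike MLE, EAP yields a finite estimate even after a single correct answer,
# because the prior keeps the posterior proper:
theta, se = eap_estimate([1], a=[1.0], b=[0.0])
```

After one correct response the estimate shifts above zero while the standard error stays close to the prior's, which is why the standard-error termination criterion (below) typically needs several items to be satisfied.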

5. Termination criterion

Different termination criteria are used in CAT applications. The most effective and most common one is to terminate the test once the standard error of the individual's ability estimate falls below a preset value; the item selection and scoring steps (components 3 and 4) are repeated until this criterion is satisfied. Another criterion is to end the test after a fixed number of items, so that every test taker answers the same number of items, although the items themselves differ according to ability level.
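The five components can be combined into one small simulation loop: start at medium ability, select by MFI, score by EAP, and stop when the standard error drops below a cutoff (with a maximum test length as a safeguard). Everything here is a hedged sketch under a 2PL model; the bank, cutoff, and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 2PL item bank (discrimination a, difficulty b); illustrative only.
a = rng.uniform(0.8, 2.0, size=200)
b = rng.uniform(-3.0, 3.0, size=200)

def prob(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, a_used, b_used, grid=np.linspace(-4, 4, 81)):
    """EAP estimate and posterior SD under a standard-normal prior."""
    post = np.exp(-grid**2 / 2)
    for u, ai, bi in zip(responses, a_used, b_used):
        p = prob(grid, ai, bi)
        post *= p**u * (1 - p)**(1 - u)
    post /= post.sum()
    mean = np.sum(grid * post)
    return mean, np.sqrt(np.sum((grid - mean)**2 * post))

def run_cat(true_theta, se_cutoff=0.3, max_items=50):
    administered, responses = [], []
    theta, se = 0.0, np.inf          # starting rule: assume medium ability
    while se > se_cutoff and len(administered) < max_items:
        # MFI selection at the provisional ability estimate
        info = a**2 * prob(theta, a, b) * (1 - prob(theta, a, b))
        info[administered] = -np.inf  # never repeat an item
        item = int(np.argmax(info))
        administered.append(item)
        # Simulate the test taker's (probabilistic) response
        responses.append(int(rng.random() < prob(true_theta, a[item], b[item])))
        theta, se = eap(responses, a[administered], b[administered])
    return theta, se, len(administered)

theta_hat, se, n_items = run_cat(true_theta=1.0)
```

With a reasonably informative bank, such a loop typically reaches the standard-error cutoff well before the maximum test length, which is the practical advantage of the variable-length criterion over a fixed number of items.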

REFERENCES

Kalender, İ. (2009). Başarı ve yetenek kestirimlerinde yeni bir yaklaşım: Bilgisayar ortamında bireyselleştirilmiş testler (computerized adaptive tests-CAT). CITO Egitim Kuram ve Uygulama, 5, 39-48.

Mills, C. N. & Stocking, M. L. (1996). Practical issues in large-scale computerized adaptive testing. Applied Measurement in Education, 9(4), 287–304.

Thompson, N. A. & Weiss, D. J. (2011). A framework for the development of computerized adaptive tests. Practical Assessment, Research, and Evaluation, 16(1), 1–9.

Weiss, D. J. & Kingsbury, G. G. (1984). Application of computer adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361–375.



Banu Karyağdı, Boğaziçi Üniversitesi, MS, 2022

 05.01.2023