What is Multi-Stage Testing (MST)?

In Multi-Stage Testing (MST), individuals are presented with sets of items that share similar characteristics, rather than with individual items. These item sets are preassembled and administered in stages, which is why multi-stage testing is described as a hybrid of linear testing and item-level computerized adaptive testing (CAT) (Yan, Lewis & von Davier, 2014). An MST consists of two or more stages, and each stage contains groups of items, called modules, at different difficulty levels (Yan et al., 2014). In the first stage, each individual is assigned a module of medium-difficulty items, called the routing module. The individual's performance is determined from their correct and incorrect responses to the items in the routing module. Based on this performance, the individual is routed to a module of easier or more difficult items in the second stage. If the test has more than two stages, the individual is routed again to a module in the third stage according to their performance in the second stage. The number of stages, the number of modules in each stage, and the number of items in each module vary across MST designs, but in every design the modules are preassembled (Yan et al., 2014; Zenisky et al., 2010).

An example of the MST model is visualized in Figure 1. Each individual first takes a module of medium-difficulty items in Stage 1, known as the routing module. Based on their performance in this module (Module 1), the individual is then routed to the easy (Module 2), medium (Module 3), or hard (Module 4) module in Stage 2. The same procedure is used to route from Stage 2 to Stage 3, so in this three-stage panel design each individual is routed twice. At the end of Stage 3 the test ends, and the individual's ability is estimated from their performance along the path they took through the panel (Zenisky et al., 2010). Routing is therefore the step at which an MST adapts to the individual.
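The routing in the three-stage panel above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any particular testing program. It assumes routing by number-correct score and uses hypothetical cut scores of 4 and 8 on 10-item modules. The same cut scores are applied at every stage as a simplification, whereas operational designs typically set cut scores per module, often from IRT information. The Stage 3 module numbers (5, 6, 7) are also hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical 1-3-3 panel. Modules 1-4 follow the numbering in the text;
# the Stage 3 module numbers (5, 6, 7) are illustrative assumptions.
PANEL: List[List[int]] = [
    [1],        # Stage 1: routing module (medium difficulty)
    [2, 3, 4],  # Stage 2: easy, medium, hard
    [5, 6, 7],  # Stage 3: easy, medium, hard
]


def route(number_correct: int, cuts: Tuple[int, int]) -> int:
    """Map a number-correct score to 0 (easy), 1 (medium) or 2 (hard).

    cuts = (lower, upper): scores below `lower` go to the easy module,
    scores at or above `upper` go to the hard module, all others to medium.
    """
    lower, upper = cuts
    if number_correct < lower:
        return 0
    if number_correct >= upper:
        return 2
    return 1


def administer(responses_by_stage: Sequence[Sequence[int]],
               cuts: Tuple[int, int]) -> List[int]:
    """Return the modules on the examinee's path through the panel.

    responses_by_stage[s] holds the 0/1 item scores from the module taken
    in stage s; only stages before the last are needed for routing.
    """
    path = [PANEL[0][0]]  # everyone starts with the routing module
    for stage in range(1, len(PANEL)):
        score = sum(responses_by_stage[stage - 1])
        path.append(PANEL[stage][route(score, cuts)])
    return path


# 10-item modules with hypothetical cut scores of 4 and 8.
high = administer([[1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
                   [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]], cuts=(4, 8))
print(high)  # [1, 4, 7]
```

The examinee in this example answers 8 of 10 items correctly in both the routing module and the Stage 2 module. As a result, they are routed to the hard module twice, which matches the two routing decisions in a three-stage panel.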

What are the advantages of multi-stage tests over traditional (linear) tests and CAT?

Traditional (linear) tests require a large number of items for precise ability estimation. They generally measure individuals of intermediate ability with lower standard errors than individuals at the extremes of the ability scale (Yan et al., 2014). CAT, on the other hand, can measure more precisely than linear tests, with measurement error that is small and roughly equal across ability levels. In CAT, items are selected from the item pool one at a time during the test, so it is not known in advance which items an individual will encounter. This can make it difficult to balance content coverage (Yan et al., 2014; Sarı et al., 2016). MST overcomes this limitation of CAT. Because the whole test is preassembled, the possible paths an individual can follow and the possible final tests they can end up completing are known in advance (Yan et al., 2014; Sarı et al., 2016). In addition, in CAT, estimating ability after each response and then selecting the next item from the pool based on this interim estimate places a computational burden on the testing system. MST reduces this burden and allows more efficient testing, because ability is estimated once after each module rather than after every item.
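Because an MST is preassembled, every path and every final test can be listed and checked before anyone sits the test. The sketch below illustrates this under assumptions that are not from the text. It uses a hypothetical 1-3-3 panel with module IDs M1 to M7 and assumes that all nine module combinations are allowed routes. It also assumes a made-up content blueprint of 6 algebra and 4 geometry items per module. The sketch enumerates the paths and verifies the content balance of each resulting final test.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical 1-3-3 panel (module IDs are illustrative).
STAGES: List[List[str]] = [
    ["M1"],              # Stage 1: routing module
    ["M2", "M3", "M4"],  # Stage 2: easy, medium, hard
    ["M5", "M6", "M7"],  # Stage 3: easy, medium, hard
]

# Assumed blueprint: every preassembled module has 6 algebra and 4 geometry items.
MODULE_CONTENT: Dict[str, List[str]] = {
    module: ["algebra"] * 6 + ["geometry"] * 4
    for stage in STAGES
    for module in stage
}


def all_paths(stages: List[List[str]]) -> List[Tuple[str, ...]]:
    """Every route through the panel: one module per stage."""
    return list(product(*stages))


def content_profile(path: Tuple[str, ...]) -> Counter:
    """Content-area counts of the final test an examinee on `path` would take."""
    return Counter(area for module in path for area in MODULE_CONTENT[module])


paths = all_paths(STAGES)
print(len(paths))  # 9 = 1 x 3 x 3
for path in paths:
    # Every path is known before administration, so the blueprint
    # can be checked for every possible final test in advance.
    assert content_profile(path) == Counter({"algebra": 18, "geometry": 12}), path
```

In this panel, each final test has 30 items. Following the text's description of CAT, an item-level CAT of the same length would re-estimate ability and select a new item after each of the 30 responses. By contrast, this panel needs an estimate only after each of its 3 modules.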

CAT requires individuals to respond to each item in order to proceed with the test. Because ability is estimated after each response, individuals cannot leave items blank or return to previous items. In multi-stage tests, however, individuals can go back to the items they have answered within a module, review and change their answers, or leave items blank before moving on to the next stage. These advantages play an important role in the widespread use of multi-stage tests, especially in large-scale and high-stakes exams (Wang, 2017).
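The module-level review described above can be sketched with a small class. This is a minimal illustration, assuming a hypothetical `ModuleSession` class and number-correct scoring; it is not taken from any specific delivery platform. Within a module, answers can be set, changed, or cleared freely. Only when the module is submitted is it scored for routing and locked. An item-level CAT would instead score each response as soon as it is given.

```python
from typing import Dict, List, Optional


class ModuleSession:
    """Hypothetical sketch of one examinee working through one MST module."""

    def __init__(self, key: List[str]) -> None:
        self.key = key  # correct option for each item in the module
        # None means "not answered / left blank".
        self.answers: Dict[int, Optional[str]] = {i: None for i in range(len(key))}
        self.locked = False

    def answer(self, item: int, choice: Optional[str]) -> None:
        """Set, change, or clear (choice=None) the answer to an item."""
        if self.locked:
            raise RuntimeError("module already submitted; answers are locked")
        if item not in self.answers:
            raise IndexError(f"no item {item} in this module")
        self.answers[item] = choice

    def submit(self) -> int:
        """Lock the module and return its number-correct score for routing."""
        self.locked = True
        return sum(self.answers[i] == k for i, k in enumerate(self.key))


session = ModuleSession(key=["A", "C", "B", "D"])
session.answer(0, "A")
session.answer(1, "B")   # first answer...
session.answer(1, "C")   # ...revised before submitting
session.answer(2, "B")
session.answer(2, None)  # cleared: item 2 is left blank
print(session.submit())  # 2 (items 0 and 1 correct; items 2 and 3 blank)
```

After `submit()` is called, any further call to `answer()` raises an error. This mirrors the point that answers can be revised only before moving on to the next stage.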

References

Sarı, H. İ., Yahsi-Sari, H., & Huggins-Manley, A. C. (2016). Computer adaptive multistage testing: Practical issues, challenges and principles. Journal of Measurement and Evaluation in Education and Psychology, 7(2), 388–406.

Wang, K. (2017). A fair comparison of the performance of computerized adaptive testing and multistage adaptive testing. Michigan State University, Michigan.

Yan, D., Lewis, C., & von Davier, A. A. (2014). Overview of computerized multistage tests. In D. Yan, C. Lewis, & A. A. von Davier (Eds.), Computerized multistage testing: Theory and applications (pp. 3–20). New York: Chapman and Hall/CRC.

Zenisky, A., Hambleton, R. K., & Luecht, R. M. (2010). Multistage testing: Issues, designs, and research. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (pp. 355–372). New York, NY: Springer.

Betül Fatma Yıldırım, Boğaziçi Üniversitesi, MS

21.03.2023