Practical Applications of Adaptive Testing

The origins of adaptive testing can be traced back to the work of Binet and Simon in the early 20th century, when they developed an intelligence test using a pool of items (Binet & Simon, 1905; van der Linden & Glas, 2010). Their approach laid the foundation for adaptive testing by introducing key components such as a pre-calibrated item bank, a starting rule, a defined scoring method, and a predetermined termination rule. In their test, the examinee’s performance on a previous set of items determined the difficulty of the subsequent questions. If an examinee answered most or all of the questions correctly at their age-appropriate level, they were given items from the next higher age level. Conversely, if they struggled with the majority of questions at a given level, they were given easier items from the next lower age level. In this way, Binet and Simon established the fundamental principles of adaptive testing.
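The level-switching rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_level`, the age range, and the thresholds (at least 75% correct to move up, under 50% correct to move down) are assumptions made for the example, not values taken from Binet and Simon's procedure.

```python
def next_level(current_level, responses, lowest=3, highest=13,
               pass_share=0.75, fail_share=0.5):
    """Pick the age level to administer next (hypothetical thresholds).

    responses: list of booleans, True for a correct answer at current_level.
    """
    if not responses:
        raise ValueError("at least one response is required")
    share_correct = sum(responses) / len(responses)
    if share_correct >= pass_share:
        # Most or all items correct: move to the next higher age level.
        return min(current_level + 1, highest)
    if share_correct < fail_share:
        # Majority of items missed: move to the next lower age level.
        return max(current_level - 1, lowest)
    # Mixed performance: stay at the same level.
    return current_level


print(next_level(8, [True, True, True, True, False]))     # 9
print(next_level(8, [False, False, True, False, False]))  # 7
```

A full test would also need the starting and termination rules mentioned above; the sketch covers only the step that chooses the next level.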

In terms of theory, Item Response Theory (IRT) has been shown to significantly enhance the quality and efficiency of adaptive testing (Weiss, 1982). IRT enables more precise and effective test adaptation, leading to greater accuracy in estimating a test taker's ability level compared to earlier methods (Kim et al., 2015; van der Linden & Glas, 2010; Weiss, 1982). Despite these theoretical advances, the early adoption of adaptive testing for large-scale assessments was limited by insufficient computing power and algorithmic capabilities. The rise of the Internet and networked computer systems in the 1990s made it feasible to deliver adaptive tests efficiently to large numbers of candidates. As a result, the use of adaptive testing in both assessment and learning has grown significantly over time. With research highlighting its advantages, adaptive testing has been increasingly integrated into diverse fields such as language proficiency assessment and military recruitment.
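To make the role of IRT more concrete, the sketch below assumes a two-parameter logistic (2PL) model and a maximum-information selection rule, one common pairing in the CAT literature. The sources cited above do not prescribe this particular setup, and the item bank, its parameter values, and the helper names are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct answer for ability theta,
    discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def pick_next_item(theta, bank, used):
    """Return the unused item that is most informative at theta."""
    candidates = [name for name in bank if name not in used]
    return max(candidates, key=lambda name: item_information(theta, *bank[name]))


# Hypothetical item bank: name -> (discrimination a, difficulty b).
bank = {"easy": (1.0, -1.5), "medium": (1.0, 0.0), "hard": (1.0, 1.5)}

print(pick_next_item(0.0, bank, used=set()))       # medium
print(pick_next_item(1.2, bank, used={"medium"}))  # hard
```

With equal discriminations, the most informative item is the one whose difficulty lies closest to the current ability estimate, which is why a well-targeted adaptive test can reach a given precision with fewer items.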

Here are some pioneering applications in adaptive testing:

The National Council Licensure Examination (NCLEX)

The National Council Licensure Examination has been a nationwide examination for licensing nurses in the United States since 1982, and it was adopted in Canada in 2015 and Australia in 2020.
Upon completion of their nursing education, students must pass the NCLEX exam to acquire a nursing license. This credential authorizes them to practice nursing within the state where they have fulfilled all necessary criteria.
The National Council of State Boards of Nursing, Inc. (NCSBN) transitioned the NCLEX to a computerized adaptive testing (CAT) format in 1994 (NCSBN Historical Timeline | NCSBN, n.d.).

Armed Services Vocational Aptitude Battery (ASVAB)

The ASVAB adopted an adaptive format, called the CAT-ASVAB, in 1996. This version is used for military enlistment and helps assign recruits to roles that match their abilities.
The CAT-ASVAB is still administered by the U.S. Department of Defense and remains an essential instrument for military recruitment.

Here are some examples of well-known educational assessments employing adaptive testing:

Graduate Record Examination (GRE)

The GRE program offers a variety of assessments designed to assist U.S. graduate programs in selecting students. These include the GRE General Test, which evaluates broad abilities, as well as a series of Subject Tests that assess knowledge in specific fields such as engineering and psychology (Mills & Steffen, 2000).
The GRE General Test began using adaptive testing in 1993, initially adopting a computerized adaptive testing (CAT) format for the entire exam (Mills & Steffen, 2000). In 2011, the exam shifted to a multi-stage adaptive testing (MST) format, in which the difficulty of a test section is adjusted based on performance in the preceding section. As a result, the GRE became section-adaptive: only the two scored Math sections and two scored Verbal sections are adaptive, while the Analytical Writing section and unscored Experimental/Research sections remain non-adaptive (Woodbury-Stewart, 2023).

Graduate Management Admission Test (GMAT)

The GMAT is a crucial assessment test used for admission to business schools, including MBA and other graduate management programs.
It has three main components: the Analytical Writing Assessment (AWA), the Quantitative section, and the Verbal section.
It introduced adaptive testing in 1997 (Rudner, 2009).

Test of English as a Foreign Language (TOEFL)

The TOEFL is a widely recognized English proficiency examination for non-native speakers.
The TOEFL iBT does not follow a fully adaptive format in the traditional sense, but it does incorporate adaptive elements within the reading and listening sections (Alderson, 2009). These sections are divided into subsections, and question difficulty varies according to the test taker’s performance in earlier subsections.

Duolingo English Test (DET)

The Duolingo English Test, an adaptive language proficiency test, was launched in 2016. It adjusts question difficulty in real time based on the test taker’s performance and has gained popularity as an accessible, low-cost option for university admissions and job applications.
The Duolingo mobile application also uses adaptive release, requiring users to complete content successfully before they can “level up” (Teske, 2017).

Programme for International Student Assessment (PISA)

PISA is an international assessment that measures 15-year-olds' ability to use their reading, mathematics, and science knowledge to meet real-life challenges (OECD, 2019).
PISA incorporated multi-stage adaptive testing into its reading assessment in 2018 and further expanded this approach to mathematics in PISA 2022 (OECD, 2023).

Scholastic Assessment Test (SAT)

The SAT, a standardized test widely used for college admissions in the United States, has employed a multi-stage adaptive design since 2023.
Each test section (Reading and Writing, and Math) is divided into two equal-length, separately timed parts called modules. Test takers first answer a set of questions in the initial module, and the questions they receive in the second module depend on their performance in the first (“What Is Digital SAT Adaptive Testing?”, 2023).
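The two-module design described above can be illustrated with a minimal routing sketch. The actual routing rule used by the SAT is not given in the source, so the share-correct cutoff of 0.6 and the module labels below are hypothetical; operational multi-stage tests may route on more refined scores.

```python
def route_second_module(first_module_responses, cutoff=0.6):
    """Choose the second module from first-module performance.

    first_module_responses: list of booleans, True for a correct answer.
    cutoff: hypothetical share of correct answers needed for the harder module.
    """
    if not first_module_responses:
        raise ValueError("the first module must contain at least one response")
    share_correct = sum(first_module_responses) / len(first_module_responses)
    return "harder" if share_correct >= cutoff else "easier"


# Seven of ten correct in module 1 -> harder module 2.
print(route_second_module([True] * 7 + [False] * 3))  # harder
# Four of ten correct in module 1 -> easier module 2.
print(route_second_module([True] * 4 + [False] * 6))  # easier
```

Unlike item-level CAT, the adaptation here happens once per section, at the boundary between modules, so every question inside a module is fixed in advance.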

Additionally, several studies have explored the use of adaptive testing in Turkey (Aybek & Çıkrıkçı, 2018; Bulut & Kan, 2012; Çıkrıkçı et al., 2020; Demir & Atar, 2021; Kalender & Berberoğlu, 2017; Şimşek & Tavşancıl, 2022). Bulut and Kan (2012) illustrated how CAT could be employed for Turkey’s Graduate Entrance Examination, showing that it provides accurate ability estimates with fewer items than traditional paper-and-pencil tests. Likewise, Kalender and Berberoğlu (2017) found that the CAT version of Turkey’s university admission subtests offers a viable alternative to conventional testing approaches. In another study, the CAT version of the Turkish Driver’s License Exam effectively differentiated between candidates in terms of their theoretical driving knowledge, providing a reliable foundation for accurate assessment (Çıkrıkçı et al., 2020). More recently, BounAdaptiveTestLab developed a computerized adaptive test to assess the mathematical abilities of 4th-grade students. A demo of the test can be accessed here: DemoCat

In conclusion, research on adaptive testing methods such as CAT and MST continues to expand their applicability across diverse fields. It is essential for researchers and policymakers to stay informed about these advancements and explore opportunities to integrate adaptive testing into various assessment contexts.

REFERENCES

Alderson, J. C. (2009). Test review: Test of English as a Foreign Language™: Internet-based Test (TOEFL iBT®). Language Testing, 26(4), 621–631. https://doi.org/10.1177/0265532209346371

Aybek, E. C., & Çıkrıkçı, R. N. (2018). Kendini Değerlendirme Envanteri’nin bilgisayar ortamında bireye uyarlanmış test olarak uygulanabilirliği. DergiPark (Istanbul University). https://dergipark.org.tr/tr/pub/tpdrd/issue/40299/481364

Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. In H. H. Goddard (Ed.), Development of intelligence in children (the Binet-Simon Scale). Baltimore: Williams & Wilkins.

Bulut, O., & Kan, A. (2012). Application of computerized adaptive testing to entrance examination for graduate studies in Turkey. Egitim Arastirmalari-Eurasian Journal of Educational Research, 49, 61–80.

Burr, S. A., Gale, T., Kisielewska, J., Millin, P., Pêgo, J. M., Pinter, G., Robinson, I. M., & Zahra, D. (2023). A narrative review of adaptive testing and its application to medical education. MedEdPublish, 13, 221. https://doi.org/10.12688/mep.19844.1

Çıkrıkçı, N., Yalçin, S., Kalender, İ., Gül, E., Ayan, C., Uyumaz, G., Kürşad, M. Ş., & Kamis, O. (2020). Development of a computerized adaptive version of the Turkish Driving Licence Exam. International Journal of Assessment Tools in Education, 7(4), 570–587. https://doi.org/10.21449/ijate.71617

Demir, S., & Atar, B. (2021). Investigation of Classification Accuracy, Test Length, and Measurement Precision at Computerized Adaptive Classification Tests. Journal of Measurement and Evaluation in Education and Psychology, 12(1), 15–27. https://doi.org/10.21031/epod.787865

Kalender, İ., & Berberoğlu, G. (2017). Can computerized adaptive testing work in students’ admission to higher education programs in Turkey? Educational Sciences: Theory & Practice, 17, 573–596. http://dx.doi.org/10.12738/estp.2017.2.0280

Kim, S., Moses, T., & Yoo, H. (2015). A comparison of IRT proficiency estimation methods under adaptive multistage testing. Journal of Educational Measurement, 52(1), 70–79. https://doi.org/10.1111/jedm.12063

Koşan, A. M. A., Koç, N., Elhan, A. H., & Öztuna, D. (2020). Developing an item bank for progress tests and application of computerized adaptive testing by simulation in medical education. International Journal of Assessment Tools in Education, 6(4), 656–669. https://doi.org/10.21449/ijate.635675

Mills, C. N., & Steffen, M. (2000). The GRE Computer Adaptive Test: Operational issues. In Springer eBooks (pp. 75–99). https://doi.org/10.1007/0-306-47531-6_4

NCSBN Historical Timeline | NCSBN. (n.d.). https://web.archive.org/web/20150316220546/https://www.ncsbn.org/70.htm

OECD. (2019). PISA 2018 Assessment and Analytical Framework. OECD Publishing. https://doi.org/10.1787/b25efab8-en

OECD. (2023). Adaptive testing in PISA 2022. In PISA 2022 Results (Volume I): The State of Learning and Equity in Education. OECD Publishing. https://doi.org/10.1787/89c0f253-en

Rudner, L. M. (2009). Implementing the Graduate Management Admission Test Computerized Adaptive Test. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing (Statistics for Social and Behavioral Sciences). Springer. https://doi.org/10.1007/978-0-387-85461-8_8

Şimşek, A. S., & Tavşancıl, E. (2022). Applicability and Efficiency of a Polytomous IRT-Based Computerized Adaptive Test for Measuring Psychological Traits. Journal of Measurement and Evaluation in Education and Psychology, 13(4), 328–344. https://doi.org/10.21031/epod.1148313

Teske, K. (2017). Duolingo. CALICO Journal, 34(3), 393–401. https://www.jstor.org/stable/90014704

van der Linden, W. J., & Glas, C. A. W. (2010). Elements of adaptive testing (Statistics for Social and Behavioral Sciences). Springer. https://doi.org/10.1007/978-0-387-85461-8

Weiss, D. J. (1982). Improving Measurement Quality and Efficiency with Adaptive Testing. Applied Psychological Measurement, 6(4), 473–492. https://doi.org/10.1177/014662168200600408

What is Digital SAT Adaptive Testing? (2023, August). College Board Blog. Retrieved July 30, 2024, from https://blog.collegeboard.org/what-digital-sat-adaptive-testing

Woodbury-Stewart, S. (2023, April 30). Is the GRE Adaptive? | GRE Adaptive Scoring. TTP GRE Blog. https://gre.blog.targettestprep.com/is-the-gre-adaptive/

Belgin Eriz, Boğaziçi University, MSc

26.09.2024