This is the third in a series of posts that explore the rigorous process of building NABP’s examinations. Here, we explain how tests are developed.
A previous blog post provided a primer on test validity; much of that validity evidence is collected throughout the test development process. Although there are several frameworks for principled assessment design, such as evidence-centered design (eg, Mislevy & Haertel, 2007), assessment engineering (eg, Luecht, 2013), and cognitive design systems (eg, Embretson, 1998), all test development systems share the same basic structure.
Figure Out What You Want to Measure
The first step is to define the construct to be measured and provide a clear statement about the knowledge, skills, and abilities (KSAs) required of examinees. For example, the NAPLEX is designed to evaluate general practice knowledge and is taken by college of pharmacy graduates shortly after they receive their degree. This statement describes the domain being measured (ie, general practice knowledge) as well as the target population (ie, college of pharmacy graduates).
Once the construct has been defined, the next step is to develop the content specifications. A practice analysis is typically conducted to determine the specific tasks, or competency statements, and the test blueprint weighting that support the interpretation and use claims of the exam. The competency statements describe the specific KSAs being measured, and the blueprint weights describe the proportion of the test devoted to each KSA. There are numerous methods for conducting a practice analysis, but the most common involves asking subject matter experts (SMEs) to draft a list of competency statements and then conducting a survey to rate how important, how frequent, and how critical each task is to safe and effective practice. You can view our NAPLEX competency statements and test blueprint as an example.
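For readers curious about the mechanics, here is a rough sketch of how survey ratings might be combined into blueprint weights: average SME ratings of importance, frequency, and criticality are merged into a single index per competency area and then normalized into percentages. The competency labels, rating scales, and combination rule below are illustrative assumptions, not NAPLEX values; operational practice analyses use more sophisticated methods.

```python
# Hypothetical sketch: turning practice-analysis survey ratings into blueprint weights.
# The competency areas, 1-5 rating scales, and combination rule are assumptions for illustration.

# Mean SME ratings per competency area: (importance, frequency, criticality)
ratings = {
    "Competency Area 1": (4.6, 4.8, 4.5),
    "Competency Area 2": (4.2, 4.1, 4.4),
    "Competency Area 3": (4.7, 4.5, 4.8),
}

# Combine the three ratings into a single index (here, a simple product).
indexes = {area: imp * freq * crit for area, (imp, freq, crit) in ratings.items()}

# Normalize the indexes so the blueprint weights sum to 100%.
total = sum(indexes.values())
weights = {area: 100 * idx / total for area, idx in indexes.items()}

for area, weight in weights.items():
    print(f"{weight:5.1f}%  {area}")
```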
Now Figure Out How to Measure It
The competency statements and blueprint weights guide the item development process. The types of items to be developed depend on the construct being measured and the interpretation and use claims of the exam. While multiple-choice questions are the most common, constructed-response essays or computer simulations may be more appropriate for some constructs. SMEs are then recruited to write test items that map to the competency statements and associated blueprint weights so that test forms can be constructed to match the content specifications.
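To make the idea of matching a form to the content specifications concrete, the sketch below checks whether a drafted form's distribution of items across competency areas stays within a tolerance of the blueprint weights. The item IDs, area tags, target weights, and tolerance are hypothetical and used only to illustrate the bookkeeping involved.

```python
from collections import Counter

# Hypothetical item metadata: each item is tagged with the competency area it measures.
form_items = [
    {"id": "ITM001", "area": "Area 1"},
    {"id": "ITM002", "area": "Area 2"},
    {"id": "ITM003", "area": "Area 1"},
    {"id": "ITM004", "area": "Area 3"},
    # ... remaining items on the form
]

# Assumed blueprint weights (percent of the form devoted to each area).
blueprint = {"Area 1": 50.0, "Area 2": 25.0, "Area 3": 25.0}

def check_form(items, blueprint, tolerance=2.0):
    """Flag competency areas whose share of the form drifts from the blueprint."""
    counts = Counter(item["area"] for item in items)
    n = len(items)
    for area, target in blueprint.items():
        actual = 100 * counts.get(area, 0) / n
        if abs(actual - target) > tolerance:
            print(f"{area}: {actual:.1f}% on form vs {target:.1f}% on blueprint")

check_form(form_items, blueprint)
```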
Test form construction often depends on technical considerations in addition to the content specifications. Issues such as equating to maintain score comparability over time, creating pretest item blocks to evaluate the quality of new items, and the distribution of item difficulties across the score scale are typically determined well in advance of test form construction. For example, the NAPLEX is a pass/fail exam, so the distribution of item difficulties is centered around the cut score. Item writers know this in advance and avoid writing items that are extremely easy or extremely hard.
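As a simplified illustration of targeting difficulty at the cut score, the sketch below filters a hypothetical item pool to items whose estimated difficulty falls within a band around an assumed cut score on a Rasch-type scale; in item response theory terms, an item provides the most information about examinees whose ability is near its difficulty, which is why items near the cut sharpen the pass/fail decision. The pool, scale values, and band width are assumptions, not NAPLEX figures.

```python
# Hypothetical item pool with difficulty estimates on a Rasch-type (logit) scale.
item_pool = {
    "ITM101": -2.1,  # very easy
    "ITM102": -0.4,
    "ITM103":  0.1,
    "ITM104":  0.6,
    "ITM105":  2.3,  # very hard
}

CUT_SCORE = 0.0   # assumed cut score on the same scale
BAND = 1.0        # keep items within +/- 1 logit of the cut

# Items near the cut score contribute the most measurement precision
# to the pass/fail decision, so they are preferred during form assembly.
targeted = {
    item_id: difficulty for item_id, difficulty in item_pool.items()
    if abs(difficulty - CUT_SCORE) <= BAND
}

print(sorted(targeted))  # ['ITM102', 'ITM103', 'ITM104']
```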
Once forms are constructed, they are administered under conditions that depend on the purpose of the exam. High-stakes exams are often administered in secure test centers, while lower-stakes exams may be administered with remote proctors or no proctor at all. The exams are scored according to previously determined rules, and results are reported to the appropriate entities, such as candidates, state boards, and schools. Candidate responses to the pretest items are analyzed, and those items that perform well are fed back into the cycle to replenish items that have been retired due to outdated content, poor statistical qualities, or overexposure.
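For a sense of how pretest items might be screened statistically, here is a small sketch that computes two classical item statistics, the proportion correct (item difficulty) and the corrected point-biserial correlation with the rest of the test (item discrimination), and flags items falling outside assumed acceptable ranges. The response data and thresholds are hypothetical; operational analyses typically also draw on item response theory models.

```python
import numpy as np

# Hypothetical scored responses: rows are examinees, columns are pretest items (1 = correct).
responses = np.array([
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])

total_scores = responses.sum(axis=1)

for item in range(responses.shape[1]):
    item_scores = responses[:, item]
    p_value = item_scores.mean()          # proportion correct (classical difficulty)
    # Corrected point-biserial: correlation between the item score and the
    # total score with this item removed, so the item does not correlate with itself.
    rest_scores = total_scores - item_scores
    point_biserial = np.corrcoef(item_scores, rest_scores)[0, 1]
    # Assumed screening thresholds; operational criteria differ by program.
    flag = "" if 0.2 <= p_value <= 0.9 and point_biserial >= 0.15 else "  <- review"
    print(f"Item {item + 1}: p = {p_value:.2f}, r_pb = {point_biserial:.2f}{flag}")
```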
On Its Face, Constructing a Test Is Deceptively Simple
Developing a test might seem easy, yet complexities abound. All stages of the test development cycle are integral to collecting validity evidence that supports the interpretation and use claims of the exam. Furthermore, both conceptual and technical issues must be considered throughout the process to ensure that the final exam is the best possible measurement instrument for the construct.