Developing the content
Test content is developed with input from technical experts. These panels were traditionally called “focus groups”; they are now called “Subject Matter Experts” (SMEs). Developing any national test requires a minimum of five and a maximum of nine technical experts drawn from at least three states. Participation from additional states is desirable and is actively sought.
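
The panel-size rule above can be expressed as a simple check. This is a minimal sketch, assuming the panel is a list of (name, state) pairs and reading “from three states” as “from at least three different states”; the function name `check_sme_panel` and the data layout are hypothetical.

```python
# Hypothetical SME panel check. The limits come from the text; the
# (name, state) data layout and the "at least three states" reading are assumptions.
MIN_SMES, MAX_SMES, MIN_STATES = 5, 9, 3


def check_sme_panel(panel):
    """Return a list of problems with a panel given as (name, state) pairs."""
    problems = []
    if not MIN_SMES <= len(panel) <= MAX_SMES:
        problems.append(f"panel has {len(panel)} SMEs; expected {MIN_SMES} to {MAX_SMES}")
    states = {state for _, state in panel}
    if len(states) < MIN_STATES:
        problems.append(f"panel covers {len(states)} states; expected at least {MIN_STATES}")
    return problems


# Invented example panel: five SMEs from three states.
panel = [("A", "OH"), ("B", "OH"), ("C", "TX"), ("D", "CA"), ("E", "TX")]
print(check_sme_panel(panel))  # → []
```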

At the development stage, an SME work session generally consists of a “job and task” analysis of the work performed in the occupation the SMEs represent. The analysis first breaks the occupation down into individual jobs, then identifies the tasks that must be completed within each job. An occupation may include many jobs, and each job may involve many tasks that must be done before the job is complete. This analysis is the first step in establishing “Content Validity.”

Depending on the scope of development, SMEs may also help identify other elements of the occupation, such as tools, equipment, the work environment, and working conditions that relate to ADA (Americans with Disabilities Act) requirements. Sometimes this information is already sufficiently documented and does not require additional study.

Developing the structure
After the job and task analysis has been completed, the same or another group of SMEs is retained to develop the structure of the test. The structure of a test is sometimes referred to as the “table of test specifications” and is used to determine the content and emphasis of the test. This is the part of the test-building process that establishes “Construct Validity.”
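
One possible way to turn the results of a job and task analysis into the item counts of a table of test specifications is to allocate items in proportion to weights the SMEs assign to each job or category. This is a sketch of that idea, not a method the text prescribes; the category names, the weights, and the largest-remainder rounding rule are all assumptions.

```python
import math


def allocate_items(weights, total_items):
    """Split total_items across categories in proportion to their weights.

    Assumption: weights are relative importance values agreed on by the SMEs.
    Largest-remainder rounding keeps the counts summing to total_items.
    """
    weight_sum = sum(weights.values())
    quotas = {cat: total_items * w / weight_sum for cat, w in weights.items()}
    counts = {cat: math.floor(q) for cat, q in quotas.items()}
    shortfall = total_items - sum(counts.values())
    # Hand the leftover items to the categories with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda cat: quotas[cat] - counts[cat], reverse=True)
    for cat in by_remainder[:shortfall]:
        counts[cat] += 1
    return counts


# Hypothetical weights from an SME session.
weights = {"Safety": 20, "Tools and equipment": 30, "Installation": 50}
print(allocate_items(weights, 60))
# → {'Safety': 12, 'Tools and equipment': 18, 'Installation': 30}
```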

A group of questions (items) is generally understood to constitute a test of knowledge in a given occupational field. Developing the table of test specifications maintains a degree of control over the content of the test. The process also satisfies the condition of “Semantic Validity,” whereby the labels used in the test relate to the occupation being evaluated. This control of content is the second step in establishing “Content Validity.”

The table of test specifications identifies the number of questions (items) assigned to each section/category of the test. Fixing these counts preserves the content balance of the test, even when the individual items within a section/category are changed or randomly selected from a test item bank.
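
Drawing a test form from an item bank while honoring the table of test specifications can be sketched as follows. This is a minimal illustration, assuming each bank item is a dictionary with hypothetical `id` and `category` keys and that items are sampled uniformly at random within each section/category; `assemble_form` is an illustrative name, not part of any particular testing system.

```python
import random
from collections import Counter


def assemble_form(item_bank, specs, seed=None):
    """Draw a test form from an item bank according to a table of test specifications.

    item_bank: list of dicts, each with at least "id" and "category" keys (assumed layout).
    specs: mapping of section/category -> number of items required.
    """
    rng = random.Random(seed)
    form = []
    for category, count in specs.items():
        pool = [item for item in item_bank if item["category"] == category]
        if len(pool) < count:
            raise ValueError(f"bank holds {len(pool)} items for {category!r}; specs require {count}")
        # Sample without replacement so no item appears twice on one form.
        form.extend(rng.sample(pool, count))
    return form


# Hypothetical bank: 6 safety items and 10 procedures items.
bank = [{"id": f"S{i}", "category": "Safety"} for i in range(6)] + [
    {"id": f"P{i}", "category": "Procedures"} for i in range(10)
]
specs = {"Safety": 2, "Procedures": 4}
form = assemble_form(bank, specs, seed=1)
print(Counter(item["category"] for item in form))  # → Counter({'Procedures': 4, 'Safety': 2})
```

Because the counts come from the specifications and only the choice of items is random, every form drawn this way keeps the same content balance.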

Developing the items
Questions on a test are typically referred to as “test items.” Each item takes the form of a multiple-choice question made up of three parts: 1) the question, called the “stem”; 2) a single correct answer; and 3) a set of plausible but incorrect answers called “distracters.” The number of distracters can raise the difficulty or level of the test. Typically there are four choices, one correct answer and three plausible distracters, although an item can have up to seven choices.

The same or a similar occupational group of SMEs develops the test items. Items can be generated in many different ways, but all are guided by the “table of test specifications.” More items are generated than the test requires, because items are often discarded for a variety of reasons and replacements are needed to maintain the length of the test.
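
The anatomy of an item described above maps naturally onto a small data structure. This is a minimal sketch; the class name `MultipleChoiceItem`, the requirement of at least one distracter, and the distinct-choices check are assumptions, while the seven-choice limit comes from the text.

```python
import random
from dataclasses import dataclass, field


@dataclass
class MultipleChoiceItem:
    """A multiple-choice item: a stem, one correct answer, and distracters."""

    stem: str
    correct: str
    distracters: list[str] = field(default_factory=list)
    max_choices: int = 7  # the text allows up to seven choices per item

    def __post_init__(self):
        choices = [self.correct, *self.distracters]
        if len(self.distracters) < 1:  # assumption: at least one distracter
            raise ValueError("an item needs at least one distracter")
        if len(choices) > self.max_choices:
            raise ValueError(f"{len(choices)} choices exceeds the limit of {self.max_choices}")
        if len(set(choices)) != len(choices):  # assumption: no duplicate choices
            raise ValueError("choices must be distinct")

    def shuffled_choices(self, rng=None):
        """Return the choices in random order and the index of the correct answer."""
        if rng is None:
            rng = random.Random()
        choices = [self.correct, *self.distracters]
        rng.shuffle(choices)
        return choices, choices.index(self.correct)


# Invented example: the typical four choices, one correct and three distracters.
item = MultipleChoiceItem(
    stem="Which tool is used to tighten a fastener to a specified torque?",
    correct="Torque wrench",
    distracters=["Pipe wrench", "Adjustable wrench", "Impact driver"],
)
choices, key = item.shuffled_choices(random.Random(42))
print(choices[key])  # → Torque wrench
```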

Persons other than the SMEs review the items for “Bias” toward protected groups. The test developer selects reviewers who are highly sensitive to biased language, with the goal of eliminating all, or nearly all, language that might offend members of a protected group or bias an item against them.

Pilot Testing
Pilot testing is an important step in the development of a test. It consists of identifying individuals within the occupation who are at approximately the target level of the test. For example, piloting a technician-level test may require selecting individuals who already have some experience in the occupation. Pilot participants are chosen by knowledgeable people within the occupation, who are asked to select participants they believe are at the specific level for which the test is designed.

This selection process generates some aspect of “Criterion-Related Validity.” Typically, the person who selects the participants has already evaluated them to some degree, so some relationship exists between each participant’s level and the level of the test. Results from other criterion-related tests might also be used to validate this process for a pilot group.

The pilot test group is sized to generate sufficient data. Its actual size depends on the number of people in the occupation, the number within a given location, and the number of individuals who can be found to volunteer. “Face Validity” is established at this point in the test development process. Face Validity refers to the test title, test categories, and test items being recognized as part of the occupational field. Pilot participants are asked to respond to or comment on each of these parts of the test.

As a second attempt to eliminate bias, the pilot test also asks participants to mark any items, words, or phrases that might have an adverse impact on protected groups.

Item Analysis
Item analysis is the process of technically reviewing the structure, responses, and fit of a given item and the relationship of that item to the rest of the test. Through item analysis, some level of “Reliability” can be established. Item analysis can use any or all of the following statistical analyses:
- Kuder-Richardson Formulas (KR-20 and KR-21)
- Cronbach’s Coefficient Alpha
- Split-Half Reliability Coefficient
- Level of Difficulty
- Easiness Scale
- Coefficient of Equivalence
- Spearman-Brown Prophecy Formula
- Standard Error of Measurement
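
Most of the statistics in the list above can be computed directly from a matrix of scored responses. This is a minimal sketch, assuming dichotomous scoring (1 = correct, 0 = incorrect), rows as examinees and columns as items, population variances throughout, and an odd-even split for the split-half coefficient; the function names are illustrative. The proportion answering correctly (the p-value) is reported here as the item statistic, so a higher value means an easier item; some programs report difficulty as 1 - p instead.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

# Assumed layout: scores[i][j] is 1 if examinee i answered item j correctly, else 0.


def total_scores(scores):
    return [sum(row) for row in scores]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_p_values(scores):
    """Proportion of examinees answering each item correctly (higher = easier)."""
    n = len(scores)
    return [sum(row[j] for row in scores) / n for j in range(len(scores[0]))]


def kr20(scores):
    """Kuder-Richardson Formula 20 for dichotomous items."""
    k = len(scores[0])
    pq = sum(p * (1 - p) for p in item_p_values(scores))
    return k / (k - 1) * (1 - pq / pvariance(total_scores(scores)))


def kr21(scores):
    """KR-21: a simplification of KR-20 that assumes equally difficult items."""
    k = len(scores[0])
    totals = total_scores(scores)
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


def cronbach_alpha(scores):
    """Coefficient alpha; equals KR-20 when every item is scored 0/1."""
    k = len(scores[0])
    item_vars = sum(pvariance([row[j] for row in scores]) for j in range(k))
    return k / (k - 1) * (1 - item_vars / pvariance(total_scores(scores)))


def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)


def split_half(scores):
    """Odd-even split-half correlation, stepped up to full length with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(odd, even))


def coefficient_of_equivalence(form_a_totals, form_b_totals):
    """Correlation between the same examinees' totals on two parallel forms."""
    return pearson(form_a_totals, form_b_totals)


def standard_error_of_measurement(scores, reliability):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    return pstdev(total_scores(scores)) * sqrt(1 - reliability)


# Tiny invented data set: 5 examinees x 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_p_values(scores))         # → [0.8, 0.6, 0.4, 0.2]
print(round(kr20(scores), 3))        # → 0.8
print(round(kr21(scores), 3))        # → 0.667
print(round(split_half(scores), 3))  # → 0.88
```

Items whose statistics stand out (for example, a p-value near 0 or 1) are the ones flagged for elimination or repair in the next step.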

Final Formatting
Item analysis reveals items that are not working as expected. Each such item is either eliminated or repaired. The test is then formatted to the correct total number of questions, and each section/category is checked against the table of test specifications for the correct number of questions. Any remaining spelling and formatting problems are corrected.
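
The section/category check described above can be sketched as a comparison between the final form and the table of test specifications. This is a minimal illustration, assuming the final form is available as a list of each item’s section/category; `spec_discrepancies` and the category names are hypothetical.

```python
from collections import Counter


def spec_discrepancies(form_categories, specs):
    """Compare a formatted test against its table of test specifications.

    form_categories: the section/category of each item on the final form (assumed input).
    specs: mapping of section/category -> required number of items.
    Returns {category: (actual, required)} for every category that does not match.
    """
    actual = Counter(form_categories)
    problems = {
        cat: (actual.get(cat, 0), required)
        for cat, required in specs.items()
        if actual.get(cat, 0) != required
    }
    # Items in categories the specifications do not mention are also flagged.
    for cat in actual.keys() - specs.keys():
        problems[cat] = (actual[cat], 0)
    return problems


specs = {"Safety": 2, "Procedures": 4}
form = ["Safety", "Procedures", "Procedures", "Safety", "Procedures"]
print(spec_discrepancies(form, specs))  # → {'Procedures': (3, 4)}
```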

Test Delivery
Test delivery follows the procedures required for all national tests, including their security requirements. The delivery process consists of identifying proctors who have a high degree of personal integrity and who agree to the requirements for handling and proctoring a national test.

Continuing Analysis
Test results are monitored on an ongoing, periodic basis. When score anomalies occur, whole tests or individual items are scrutinized for problems. A test that shows a significant level of problems is slated for review ahead of its scheduled review date. All tests are reviewed on a three-year cycle.
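
The monitoring and review schedule can be sketched in code. This is a minimal illustration, assuming that a “score anomaly” is an item whose proportion correct drifts from its baseline by more than a tolerance, and that a “significant level of problems” means more than a set share of drifting items; both thresholds (`tolerance=0.15`, `max_share=0.10`) and all function names are hypothetical. Only the three-year cycle comes from the text.

```python
from datetime import date

REVIEW_CYCLE_YEARS = 3  # from the text: every test is reviewed on a three-year cycle


def next_scheduled_review(last_review: date) -> date:
    """Date three years after the last review (Feb 29 falls back to Feb 28)."""
    try:
        return last_review.replace(year=last_review.year + REVIEW_CYCLE_YEARS)
    except ValueError:
        return last_review.replace(year=last_review.year + REVIEW_CYCLE_YEARS, day=28)


def drifting_items(baseline_p, current_p, tolerance=0.15):
    """Items whose proportion correct moved more than `tolerance` (assumed) from baseline."""
    return sorted(
        item for item, p0 in baseline_p.items()
        if item in current_p and abs(current_p[item] - p0) > tolerance
    )


def needs_early_review(baseline_p, current_p, tolerance=0.15, max_share=0.10):
    """Flag the whole test when too large a share (assumed threshold) of items drift."""
    flagged = drifting_items(baseline_p, current_p, tolerance)
    return len(flagged) / len(baseline_p) > max_share


# Invented item statistics for one test.
baseline = {"Q1": 0.82, "Q2": 0.55, "Q3": 0.40, "Q4": 0.71}
current = {"Q1": 0.80, "Q2": 0.31, "Q3": 0.43, "Q4": 0.69}
print(drifting_items(baseline, current))      # → ['Q2']
print(needs_early_review(baseline, current))  # → True
print(next_scheduled_review(date(2024, 2, 29)))  # → 2027-02-28
```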