It’s a common mistake to assume that reliability and validity, as they relate to pre-employment tests, are essentially the same thing.
They aren’t. And if you’re shopping around for a hiring assessment, it’s important to understand what both concepts mean, why they’re so important, and how they differ.
What is Reliability?
Of the two terms, assessment reliability is the simpler concept to explain and understand.
Here’s a good definition of reliability in a research context: if an assessment is reliable, the results will be very similar no matter when someone takes the test. If the results are inconsistent, the test is not considered reliable.
So, if you’re focusing on the reliability of a test, the question to ask is: are the results of the test consistent? If someone takes the test today, a week from now, and a month from now, will their results be the same?
To determine the reliability of their tests, assessment companies pay close attention to two aspects of reliability in particular: test-retest reliability and internal consistency.
Test-Retest Reliability
To confirm a test’s reliability, assessment companies determine consistency over time with test-retest reliability. With this type, the same group of people is given the test twice (a few days or weeks apart) in order to spot differences in results.
Researchers then measure the correlation coefficient, a statistical measure ranging from 0 (no correlation) to 1 (perfect correlation), to assess the reliability of the test. Since no test is going to be completely error-free, the correlation generally needs to be 0.7 or higher for the test to be considered reliable.
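In practice, that correlation coefficient is usually a plain Pearson correlation between the two sittings. Here’s a minimal sketch of the calculation, using entirely hypothetical scores for five candidates:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same five candidates, a few weeks apart
first_sitting  = [82, 75, 91, 68, 88]
second_sitting = [80, 77, 89, 70, 90]

r = pearson(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f}")  # well above the 0.7 rule of thumb
```

Because each candidate’s two scores track each other closely, the correlation comes out near 1; widely diverging retest scores would pull it down toward 0 and flag the test as unreliable.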
Internal Consistency
Internal consistency looks within a single sitting of the test to confirm that test items intended to be related are truly related.
Assessment companies typically measure internal consistency by correlating scores on the first half of the test with those on the second half. Since both halves should be measuring the same thing, the correlation should be 0.7 or higher. For example, if part of a pre-employment assessment is designed to measure math skills, test-takers should score equally well on the first and second halves of that part of the test.
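The split-half calculation is the same correlation machinery applied within one sitting. A common refinement (not mentioned above, but standard in psychometrics) is the Spearman-Brown correction, which estimates full-test reliability from the half-test correlation. A sketch with hypothetical item scores:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# Hypothetical per-candidate totals on a 10-item math section,
# split into odd-numbered and even-numbered items
odd_items  = [4, 3, 5, 2, 4, 5]
even_items = [5, 3, 4, 2, 4, 5]

half_r = pearson(odd_items, even_items)
# Spearman-Brown correction: estimate of the full test's reliability
full_test = 2 * half_r / (1 + half_r)
print(f"split-half r = {half_r:.2f}, corrected = {full_test:.2f}")
```

Splitting by odd/even items rather than first half/second half is a common variant, since it avoids fatigue effects that can depress scores late in a test.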
What is Validity?
Validity is a bit more complex to define, because it’s more difficult to assess than reliability.
Validity in research refers to how accurate a test is, or, put another way, how well it fulfills the function for which it’s being used. In pre-employment assessments, this means predicting the performance of employees or identifying top talent.
There are several ways for assessment companies to measure types of validity within tests, including content, criterion-related, and construct validity.
Content Validity
An assessment is said to have content validity when the criteria it measures align with and adequately cover the content of the job. The extent to which that content corresponds with success on the job is also part of determining how well the assessment demonstrates content validity.
Here’s an example: a fast typing speed would likely be considered a key part of the job for an executive secretary, but not for an executive. While the executive is probably required to type sometimes, this skill is not nearly as important to performing that job as it would be for the executive secretary. Ensuring that an assessment demonstrates content validity means judging the degree to which test items and job content match each other.
Criterion-Related Validity
An assessment demonstrates criterion-related validity if its results predict an outcome that’s related to job performance.
So how can we tell if an assessment predicts performance? Assessment scores must be statistically evaluated against a measure of employee performance. For example, an employer interested in understanding how well a personality test identifies individuals who are likely to engage in counterproductive work behaviors might compare applicants’ personality test scores against how many accidents or injuries those individuals have on the job, whether they engage in on-the-job drug use, or how often they ignore company policies.
The degree to which the assessment results are related to a measure of performance—like counterproductive work behaviors—is the extent to which it exhibits criterion-related validity.
Construct Validity
An assessment demonstrates construct validity if it is related to other assessments measuring the same psychological construct (a construct being a concept used to explain behavior). For example, cognitive ability is a construct used to explain a person’s capacity to understand and solve problems.
To measure construct validity, an assessment company statistically compares an assessment to similar tests that, in theory, it should be related to, since they measure the same thing. There shouldn’t be a significant relationship between a test measuring personality and one measuring cognitive ability, because they measure two different constructs. However, a test measuring personality should be strongly correlated with other tests measuring personality.
Can a Test Be Valid but Not Reliable?
As you’d expect, a test cannot be valid unless it’s reliable. However, a test can be reliable without being valid.
Let’s unpack this, as it’s common to mix these ideas up.
If you administer a personality test and get the same results from potential hires after testing them twice, you’ve got yourself a reliable test. However, if the personality test isn’t actually measuring the personality traits it claims to, and instead corresponds with an unrelated construct such as on-the-job skills, the assessment probably isn’t valid.
Tips to Ensure Your Test is Reliable and Valid
To make sure the pre-employment assessment you choose is both reliable and valid, check that the vendor focused on creating a valid test in the earliest phases of development.
Your use of the assessment should always be tied back to a tangible job outcome, objective problem, or measurable personality trait. The industrial-organizational scientists involved with the product should have conducted thorough research and consulted subject matter experts in your field to review test questions and ensure they’re designed for what they’re intended to measure. Additionally, the sample population used for test development should be appropriately representative of the population that may ultimately take the assessment. (For example, you wouldn’t want development based on a homogeneous population or a small sample.)
Next, you can check the guardrails for reliability that your potential vendors have put in place by asking:
- “Does the assessment use clear, easy-to-understand language with a variety of questions to measure each category?”
- “Did the industrial-organizational researchers review the test items for bias?”
- “What was the sample population used for developing and validating the test?”
The assessment you choose should also come with detailed instructions that decrease any variations in testing conditions as much as possible, from time given for test-taking to noise levels in the testing environment.
In understanding the nuances of reliability vs validity, you’ll see that both of these distinct concepts are necessary for the success of every test you use. From cognitive ability tests to personality tests to emotional intelligence tests, any pre-employment assessment needs to measure what it intends to measure and produce consistent results over time to be useful to you and your company.
Developed with insights from I-O Psychology, Artificial Intelligence, and machine learning, Wonderlic’s one-of-a-kind WonScore assessment provides reliable and valid testing for your organization, so you can be confident that each hire is a perfect fit for the role. Schedule your free demo today to learn how Wonderlic helps industry leaders like you hire the best talent.