Most research texts indicate that reliability coefficients of .70 are sufficient for research. According to Kaplan and Saccuzzo (2018), Cronbach’s alphas greater than .80 are very good, and Creswell (2003) noted that anything above .90 is very high. Only two scales in the online SCALE® fell below .70, and all of the composite scales were in the very good range, indicating adequate internal reliability for the practical uses for which the instrument was designed.
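For readers unfamiliar with the statistic, Cronbach’s α can be computed directly from item-level scores. The following is an illustrative sketch only; the respondent data below are fabricated for demonstration and are not ESAP® or SCALE® responses.

```python
# Illustrative computation of Cronbach's alpha from item-level scores.
# The data below are fabricated (5 respondents, 3 items on a 1-5 scale),
# not actual ESAP or SCALE responses.
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, each across the same respondents."""
    k = len(items)                                   # number of items
    totals = [sum(person) for person in zip(*items)] # total score per respondent
    item_var = sum(pvariance(item) for item in items)
    total_var = pvariance(totals)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - item_var / total_var)

items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 4, 2, 4, 3],
]
print(round(cronbach_alpha(items), 2))  # -> 0.86
```

By the thresholds cited above, this fabricated example would fall in the “very good” range (above .80).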
You may have noticed that we also included, in the last column of Table 2, additional Cronbach’s α reliability statistics from a much larger sample (Nelson et al., 2004). While the larger sample yielded higher alphas on four scales, the more recent and much smaller sample yielded higher alphas on nine scales. One reason for this surprising performance with a much smaller sample may be that the automatic computer scoring of the online ESAP® reduced the testing error introduced by human scoring of the paper-and-pencil instrument in the larger sample.
Evidence of instrument validity is provided by researchers to describe the construct(s) measured by the instrument. Creswell (2009) described the three primary kinds of instrument validity as (a) content validity (the extent to which the items capture the content of a particular construct), (b) concurrent validity (the degree to which scores correlate with those of another measure with established content), and (c) construct validity (the extent to which the survey measures hypothetical constructs). Kaplan and Saccuzzo (2018), however, reminded us that the most recent edition of the Standards for Educational and Psychological Testing (2014) no longer recognizes distinct types of validity but instead frames them as sources of evidence for validity. Still, it can be instructive to discuss these long-held validity categories when presenting evidence.
Instrument validity and reliability are related constructs. As shared by Kaplan and Saccuzzo (2018), attempting to provide evidence of test validity without reliability would be pointless because an unreliable test cannot logically be valid. Accordingly, for a test to be reliable, it should correlate more highly with itself than with any other test (Kaplan & Saccuzzo, 2018). Also, with an eye toward validity, when constructs from two different tests correlate very strongly, the tests are essentially measuring the same thing (Epstein, 2012). Before we can understand what the SCALE® measures in terms of its relationship to the ESAP®, what the ESAP® measures must first be established.
The ESAP® was first published in 1998 and normed with first-year college students (N = 1,398) who participated in a Title V intervention program (Nelson & Low, 2004). Reliability and validity statistics were reported initially by Nelson et al. (2004) based on several studies with different populations. Using those PERL skills data, participation in the EI intervention program that used the ESAP® was found to be significantly correlated with student achievement (Lu, 2008; Vela, 2003) and had statistically significant and qualitatively positive impacts for the students who participated (Potter, 2005). Since then, ESAP® validity has also been reported by others including Cox and Nelson (2008), Dockrat (2012), Farnia (2012), Hammett et al. (2012), Justice et al. (2012), and Tang et al. (2010). Following is a summary of those findings.