
Utility of the Classroom Performance Survey as an Outcomes Measure

In May 2023, my students Jen Donelan and Kaitlynn Carter delivered a poster presentation at the annual convention of the Association for Psychological Science in Washington, DC. The archived data used for that presentation came from a study of the Challenging Horizons Program funded by research grants to Ohio University and Cincinnati Children’s Medical Center from the National Institute of Mental Health (R01MH082864, R01MH082865). The analyses were partly supported by an early career research grant to Dr. Brandon Schultz by the Society for the Study of School Psychology, and the results were originally described in an unpublished manuscript.


Academic enablers (AE)—homework completion, organization of school materials, on-task time, prosocial classroom behavior—underpin academic performance and are commonly targeted for intervention among children with ADHD (Volpe et al., 2006). But AE progress measures are typically idiographic, making it difficult to compare a student’s performance to normative expectations. Indeed, few normative measures of classroom performance are psychometrically established for use as progress measures (Beidas et al., 2015).

We examined the Classroom Performance Survey (CPS), a free teacher rating scale of AE, as a treatment outcome measure. The CPS has two subscales: (1) academic competence and (2) interpersonal competence. Psychometrics of the CPS have been examined in two previous studies (Brady et al., 2012; Caldarella et al., 2017), but its utility as an outcome measure is unknown.


Teachers (n = 368) completed the CPS for 326 middle school students with ADHD as part of a trial of the Challenging Horizons Program (CHP). Participants were randomized to: (1) an after-school program (n = 112), (2) teacher mentoring (n = 110), or (3) a treatment-as-usual (TAU) control (n = 104). The CPS was collected at four measurement occasions over a single school year for all groups.

Test-retest reliability was estimated using two-way mixed-effects model intraclass correlations (ICC) (Shrout & Fleiss, 1979), focusing on single measures and absolute agreement over time (Koo & Li, 2016) in the TAU group. The CPS’s accuracy relative to three academic performance criteria was assessed via receiver operating characteristic (ROC) curves. Youden’s (1950) J scores were averaged to estimate criterion-based cutoff scores for each subscale. To estimate the amount of change needed to be “reliable” (Jacobson & Truax, 1991), we applied the RCI formula using the test-retest reliability estimates and observed pre-treatment standard deviation.
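The single-measure, absolute-agreement ICC described above can be computed directly from a two-way ANOVA decomposition. The sketch below is illustrative only; the function name and data are hypothetical, not the study's actual analysis code:

```python
import numpy as np

def icc_a1(scores):
    """Single-measure, absolute-agreement ICC from a two-way model
    (Shrout & Fleiss, 1979; the ICC(A,1) formulation).
    `scores` is an (n_students, k_occasions) array of ratings."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    # Two-way ANOVA mean squares: rows = students, columns = occasions
    ms_rows = k * np.sum((scores.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((scores.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = (np.sum((scores - grand) ** 2)
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical ratings at two occasions; a constant one-point drift
# between occasions pulls the absolute-agreement ICC below 1.
print(icc_a1([[1, 2], [2, 3], [3, 4]]))  # ≈ 0.667
```

Absolute agreement (rather than consistency) is the stricter choice here because a systematic shift in teacher ratings between occasions should count against stability.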


First, we calculated ICCs for all contiguous combinations of two, three, and four measurement occasions for the control group (free from treatment effects). All ICC combinations that included the first measurement occasion of the year (Oct/Nov) resulted in poor to moderate reliability on both subscales. But reliability estimates improved beginning with the second measurement occasion (Jan/Feb) for academic (ICCs ≥ .79) and interpersonal competence (ICCs ≥ .71) subscales, suggesting acceptable test-retest reliability from that point forward.

Next, we examined the criterion-related accuracy of the instrument using data from all CHP participants. Results of the ROC analyses suggest that the CPS discriminates students with below-average achievement from students with average or better achievement to a fair degree, although the interpersonal competence subscale showed poor discriminatory power when predicting reading achievement. The CPS strongly discriminated high- and low-functioning students on end-of-year grades, especially the academic competence subscale, which significantly outperformed the interpersonal competence subscale (p = .001).
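The cutoff-estimation step pairs naturally with these ROC analyses: for each candidate cutoff, compute sensitivity and specificity against the achievement criterion and keep the cutoff maximizing Youden's J. The data, the scoring direction (higher score = more impairment), and the function name below are illustrative assumptions, not the study's actual procedure:

```python
def youden_cutoff(scores, at_risk):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.
    `scores`: rating-scale totals (assumed: higher = more impairment);
    `at_risk`: 1 if the student met the below-average criterion, else 0.
    Assumes both classes are present in `at_risk`."""
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        # A score at or above the cutoff counts as a positive prediction.
        tp = sum(1 for s, y in zip(scores, at_risk) if y == 1 and s >= cut)
        fn = sum(1 for s, y in zip(scores, at_risk) if y == 1 and s < cut)
        tn = sum(1 for s, y in zip(scores, at_risk) if y == 0 and s < cut)
        fp = sum(1 for s, y in zip(scores, at_risk) if y == 0 and s >= cut)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j

# Hypothetical, perfectly separated data: the cutoff lands at 28 with J = 1.
print(youden_cutoff([35, 30, 28, 20, 15, 10], [1, 1, 1, 0, 0, 0]))
```

In real data J falls well below 1, and averaging the J-maximizing cutoffs across criteria (as the study did) smooths out criterion-specific noise.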

Finally, we examined reliable change and found that difference scores of 10 or more points on the academic competence subscale and 6 or more points on the interpersonal competence subscale signified reliable change (RCI ≥ 1.96, p < .05). Evaluation of criterion-referenced cutoff score estimates suggested that the risk of poor grades is significantly reduced when the academic competence subscale score is below 28 and when the interpersonal competence subscale score is below 10.
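The Jacobson and Truax (1991) arithmetic behind those thresholds is compact: the standard error of measurement follows from the pre-treatment SD and the test-retest reliability, and 1.96 standard errors of the difference score marks the reliable-change boundary. The values below are illustrative, not the study's observed statistics:

```python
import math

def reliable_change_threshold(sd_pre, r_xx, z=1.96):
    """Smallest raw-score change exceeding measurement noise at p < .05
    (Jacobson & Truax, 1991).
    sd_pre: pre-treatment standard deviation; r_xx: test-retest reliability."""
    se_measure = sd_pre * math.sqrt(1 - r_xx)   # standard error of measurement
    s_diff = math.sqrt(2) * se_measure          # SE of a difference score
    return z * s_diff

# Hypothetical inputs: SD = 12, reliability = .80
print(reliable_change_threshold(12.0, 0.80))  # ≈ 14.9 points
```

Note the dependence on reliability: the same raw change clears the threshold more easily on a more reliable subscale, which is why the two CPS subscales have different reliable-change cutoffs.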


The CPS provides a stable progress measure of AE that can accurately and meaningfully track how well students respond to classroom intervention.

After the first grading period, the CPS is acceptably stable (ICCs > .72) across realistic intervention intervals (8-16 weeks). The CPS can meaningfully distinguish students at risk of poor grades from those who are not. A change of 10 or more points on the academic competence subscale and a change of 6 or more points on the interpersonal competence subscale signify reliable change. The present findings offer an initial step toward establishing interpretation guidelines for the CPS when monitoring student progress.


Beidas, R.S., Stewart, R.E., Walsh, L., Lucas, S., Downey, M.M., Jackson, K., Fernandez, T., & Mandell, D.S. (2015). Free, brief, and validated: Standardized instruments for low-resource mental health settings. Cognitive and Behavioral Practice, 22, 5-19.

Brady, C.E., Evans, S.W., Berlin, K.S., Bunford, N., & Kern, L. (2012). Evaluating school impairment with adolescents using the classroom performance survey. School Psychology Review, 41, 429-446.

Caldarella, P., Larsen, R.A., Williams, L., Wehby, J.H., Wills, H., & Kamps, D. (2017). Monitoring academic and social skills in elementary school: A psychometric evaluation of the classroom performance survey. Journal of Positive Behavior Interventions, 19, 78-89.

Jacobson, N.S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.

Koo, T.K., & Li, M.Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15, 155-163.

Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

Volpe, R.J., DuPaul, G.J., DiPerna, J.C., Jitendra, A.K., Lutz, J.G., & Tresco, K. (2006). Attention deficit hyperactivity disorder and scholastic achievement: A model of mediation via academic enablers. School Psychology Review, 35, 47-61.

Youden, W.J. (1950). Index for rating diagnostic tests. Cancer, 3, 32-35.
