Applying Principles of Evidence-Based Medicine to Behavior Progress Monitoring

In February 2018, my doctoral student Rachel L. Kininger presented the results of an instrument development project at the National Association of School Psychologists conference in Chicago, IL. In this project, we developed and then assessed the feasibility and accuracy of a tablet-based progress monitoring tool for behavior interventions. Funding for this project was provided by an Early Career Research Award from the Society for the Study of School Psychology that I was awarded in 2015. Work on the iPad application was completed by the Game Research and Immersive Design Lab at Ohio University, and the project was mentored by my friend and colleague Steven W. Evans, Ph.D.


Student behavior concerns vary widely, making intervention response difficult to define. Practitioners commonly rely on ad hoc measures (e.g., count of classroom rule violations), but it is often unclear how to use these data to inform intervention decisions (Saeki et al., 2011). Moreover, recommended progress metrics, such as effect sizes or reliable change indices, can lead to vastly different conclusions (Cheney, Flower, & Templeton, 2008).

Evidence-based medicine (EBM) is an approach to clinical decision-making that combines data to produce a probability estimate of an event of interest (e.g., need for treatment). With each new datum, the probability estimate is updated using Bayes’ Theorem, based on: (a) the prior probability of the event; (b) the sensitivity of the measure; and (c) the specificity of the measure. Intervention changes are then triggered at predetermined thresholds (Youngstrom, 2013). Research suggests that even brief training in EBM can improve psychologists’ decisions when compared to clinical judgment alone (Jenkins et al., 2011).
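In odds form, this update reduces to multiplying the prior odds by a likelihood ratio computed from the measure’s sensitivity and specificity. A minimal sketch in Python (the base rate and item accuracies below are illustrative, not values from our study):

```python
def update_probability(prior, sensitivity, specificity, positive):
    """Update the probability of an event (e.g., intervention nonresponse)
    after one dichotomous test result, using Bayes' Theorem in odds form."""
    # A positive result scales the odds by LR+ = sens / (1 - spec);
    # a negative result scales them by LR- = (1 - sens) / spec.
    if positive:
        lr = sensitivity / (1 - specificity)
    else:
        lr = (1 - sensitivity) / specificity
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

# Start from a hypothetical 30% base rate of nonresponse, then fold in
# two item results in sequence (accuracies are made up for illustration):
p = 0.30
p = update_probability(p, sensitivity=0.80, specificity=0.90, positive=True)
p = update_probability(p, sensitivity=0.70, specificity=0.85, positive=False)
```

An intervention change would then be triggered once the running estimate crosses a predetermined threshold, as Youngstrom (2013) describes.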

The purpose of this study was to develop and conduct an initial evaluation of a tool that equips school psychologists with this data-analytic technique during behavior consultation with teachers.


Items. Candidate items for the progress monitoring tool were drawn from three open-access teacher rating scales: (a) the Classroom Performance Survey (CPS; Brady et al., 2012); (b) the Disruptive Behavior Disorders scale (DBD; Pelham et al., 1992); and (c) the Impairment Rating Scale (IRS; Fabiano et al., 2006). We assessed the sensitivity and specificity of all 57 items for detecting nonresponse among students (n = 110) who received the Challenging Horizons Program (CHP; Evans et al., 2016), an after-school program for ADHD. We defined nonresponse as an end-of-year GPA below 1.5, despite the intensive CHP treatment.
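Each candidate item’s sensitivity, specificity, and positive likelihood ratio can be derived from a simple 2×2 cross-tabulation of item endorsement against the nonresponse criterion. A sketch with toy data (not the CHP sample):

```python
def item_accuracy(endorsed, nonresponder):
    """Sensitivity, specificity, and positive likelihood ratio (LR+)
    for one dichotomized item. endorsed[i] is True if the teacher
    endorsed the item for student i; nonresponder[i] is True if that
    student met the nonresponse criterion (e.g., end-of-year GPA < 1.5)."""
    pairs = list(zip(endorsed, nonresponder))
    tp = sum(e and n for e, n in pairs)          # flagged, true nonresponder
    fn = sum(not e and n for e, n in pairs)      # missed nonresponder
    tn = sum(not e and not n for e, n in pairs)  # correctly not flagged
    fp = sum(e and not n for e, n in pairs)      # false alarm
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ assumes spec < 1; a continuity correction is needed otherwise.
    return sens, spec, sens / (1 - spec)

# Toy ratings for six students (illustrative only):
endorsed     = [True, True, False, True, False, False]
nonresponder = [True, True, True, False, False, False]
sens, spec, lr_pos = item_accuracy(endorsed, nonresponder)
```

Sorting items by LR+ (as in Table 1) surfaces the items whose endorsement most sharply raises the estimated probability of nonresponse.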

Procedures. We recruited 12 middle school students with ADHD (33% African-American; 33% Hispanic; mean family income = $26,900). Each participant was matched with a teacher consultee for a yearlong classroom intervention. We administered the instrument four times over the course of the spring semester (roughly 1-month intervals). We assessed test-retest reliability, criterion validity, and the accuracy of responder/nonresponder classification. For the latter two analyses, we compared our results to teacher ratings on the Behavior Assessment System for Children, Third Edition (BASC-3; Reynolds & Kamphaus, 2015).


Item Properties. We calculated sensitivity and specificity estimates for all items using the CHP data and sorted the items by positive likelihood ratio (see Table 1).

Feasibility. We ran into several unanticipated challenges during the initial launch (e.g., unreliable Wi-Fi connections in the targeted public schools) and developed workarounds (e.g., local caching). In the interim, we used a paper-pencil version of the scale and scored it with an Excel spreadsheet. Feasibility was difficult to assess because app development was delayed, but the return rate of 95.8% suggests that teachers found the paper-pencil version acceptable.

Validity/Reliability. Test-retest reliability (Kendall’s tau-b) was .89, .85, and .90 from occasions one-to-two, two-to-three, and three-to-four, respectively. Criterion-related validity was assessed against the School Problems subscale of the BASC-3 at end-of-treatment; those estimates were .58, .73, .68, and .76 (Kendall’s tau-b) across the four measurement occasions, respectively. All reliability and validity estimates were statistically significant (ps < .05).

Classification. The scale accurately classified 75%, 82%, 92%, and 100% of the sample across the four measurement occasions, respectively, according to the at-risk designation (T ≥ 60) on the BASC-3 School Problems subscale at end-of-study. Because the first assessment occurred roughly three months before end-of-treatment, the scale may provide an early warning of intervention nonresponse, leaving ample time to make adjustments.
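Kendall’s tau-b, the rank correlation used for the reliability and validity estimates above, can be computed directly from concordant and discordant pairs, with a correction for ties. A self-contained sketch with hypothetical scores (scipy.stats.kendalltau returns the same statistic):

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for paired scores,
    correcting for ties in either variable."""
    c = d = tx = ty = 0  # concordant, discordant, ties in x, ties in y
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0:
            tx += 1
        if dy == 0:
            ty += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                c += 1
            else:
                d += 1
    n0 = len(x) * (len(x) - 1) // 2  # total number of pairs
    return (c - d) / math.sqrt((n0 - tx) * (n0 - ty))

# Hypothetical total scores for the same students rated one month apart:
occasion_1 = [12, 18, 7, 22, 15, 9, 20, 11]
occasion_2 = [13, 17, 8, 21, 9, 14, 19, 12]
tau = kendall_tau_b(occasion_1, occasion_2)
```

With no ties, tau-b reduces to the simple Kendall tau, so the correction only matters when teachers assign identical scores to multiple students.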


Perhaps not surprisingly, items pertaining to oppositional defiant disorder (e.g., student often loses temper) generally proved most predictive of intervention nonresponse. Teachers who describe students in this manner clearly perceive them as challenging. The advantage of the scale, however, is to quantify what these reports mean in terms of likely response or nonresponse to intervention.

In short, the scale appears to provide a measure of distal outcomes, whereas intervention-specific measures (e.g., count of classroom rule violations) signal proximal outcomes. The distal data provided by the scale can provide consistent, comparable measures over time, even when consultees fail to collect proximal measures (as often occurs). More research is needed to establish the accuracy of the scale, but our pilot study suggests that the use of EBM principles can improve intervention decision making.


• Our findings suggest that the EBM-based scale holds promise as a progress monitoring tool that can correctly classify students as intervention “responders” and “nonresponders” months in advance.

• The Bayesian algorithm provides an interpretable estimate of confidence that could readily inform response-to-intervention decisions.

• Tablet-based applications can prove problematic in schools due to poor Wi-Fi connections, requiring local caching and other workarounds.


Brady, C. E., Evans, S. W., Berlin, K. S., Bunford, N., & Kern, L. (2012). Evaluating school impairment with adolescents using the Classroom Performance Survey. School Psychology Review, 41(4), 429-446.

Cheney, D., Flower, A., & Templeton, T. (2008). Applying response to intervention metrics in the social domain for students at risk of developing emotional or behavioral disorders. Journal of Special Education, 42, 108-126. doi:10.1177/0022466907313349

Evans, S. W., Langberg, J. M., Schultz, B. K., Vaughn, A., Altaye, M., Marshall, S. A., & Zoromski, A. K. (2016). Evaluation of a school-based treatment program for young adolescents with ADHD. Journal of Consulting and Clinical Psychology, 84, 15-30.

Fabiano, G. A., Pelham, W. E., Waschbusch, D. A., Gnagy, E. M., Lahey, B., Chronis, A. M., et al. (2006). A practical measure of impairment: Psychometric properties of the Impairment Rating Scale in samples of children with attention deficit hyperactivity disorder and two school-based samples. Journal of Clinical Child and Adolescent Psychology, 35, 369-385.

Jenkins, M. M., Youngstrom, E. A., Washburn, J. J., & Youngstrom, J. K. (2011). Evidence-based strategies improve assessment of pediatric bipolar disorder by community practitioners. Professional Psychology: Research and Practice, 42, 121–129. doi:10.1037/a0022506

Pelham, W. E., Gnagy, E. M., Greenslade, K. E., & Milich, R. (1992). Teacher ratings of DSM-III-R symptoms for the disruptive behavior disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 210-218.

Reynolds, C. R., & Kamphaus, R. W. (2015). Behavior Assessment System for Children Manual (3rd ed.). Bloomington, MN: Pearson.

Saeki, E., Jimerson, S. R., Earhart, J., Hart, S. R., Renshaw, T., Singh, R. D., & Stewart, K. (2011). Response to intervention (RtI) in the social, emotional, and behavior domains: Current challenges and emerging possibilities. Contemporary School Psychology, 15, 43-52.

Youngstrom, E. A. (2013). Future directions in psychological assessment: Combining evidence-based medicine innovations with psychology’s historical strengths to enhance utility. Journal of Clinical Child and Adolescent Psychology, 42, 139-159. doi:10.1080/15374416.2012.736358
