Friday, 22 April 2016

Is it time to call time on Hattie and the significance of the 0.4 effect size?

If you are an evidence-based practitioner, school research lead or headteacher interested in interpreting effect sizes, then this post is for you.  In this post I will: first, briefly show how effect sizes are calculated; second, identify some of the most common interpretations of the size of effect sizes; third, summarise Robert Slavin's recent post on what we mean by a large effect size; and finally, consider the implications of the discussion for those interested in supporting evidence-based practice within schools, and for the work of John Hattie.

What do we mean by effect size?

Put quite simply, effect size is a way of quantifying the difference between two groups, and is calculated as follows.

Effect Size = (Mean of experimental group - Mean of control group) / Standard Deviation
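The calculation can be sketched in a few lines of Python (the function name is mine, and note one detail the formula above leaves open: which standard deviation to divide by. Some studies use the control group's SD, others a pooled SD; a pooled SD is assumed here).

```python
import math
import statistics

def effect_size(experimental, control):
    """Mean difference between the two groups divided by the standard
    deviation. A pooled SD is assumed here; some studies instead use
    the control group's SD, which changes the result."""
    n1, n2 = len(experimental), len(control)
    s1, s2 = statistics.stdev(experimental), statistics.stdev(control)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (statistics.mean(experimental) - statistics.mean(control)) / pooled_sd

# Two small groups of test scores: means of 12 and 10, pooled SD of 2,
# giving an effect size of 1.0
print(effect_size([10, 12, 14], [8, 10, 12]))
```

The choice of denominator matters when comparing effect sizes across studies, which is one reason interpreting them is less simple than the formula suggests.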

How can we interpret effect sizes?

The best-known interpretation has been put forward by John Hattie in Visible Learning who, having reviewed over 800 meta-analyses, argues that the average effect size across a range of educational strategies is 0.4.  On this basis, Hattie argues that teachers should select those strategies that have an above-average effect on pupil and student outcomes.

Second, we could turn to the EEF's DIY Evaluation Guide, written by Rob Coe and Stuart Kime, where on pages 17 and 18 they provide guidance on the interpretation of effect sizes (-0.01 to 0.18 low; 0.19 to 0.44 moderate; 0.45 to 0.69 high; 0.70+ very high), with effect sizes also being converted into months of progress.
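Those bands amount to a simple lookup, which can be made explicit in code (the thresholds are the ones quoted above; the function itself is my own illustration, not something from the guide):

```python
def eef_band(es):
    # Bands quoted from pages 17-18 of the EEF DIY Evaluation Guide:
    # -0.01 to 0.18 low; 0.19 to 0.44 moderate; 0.45 to 0.69 high; 0.70+ very high
    if es <= 0.18:
        return "low"
    elif es <= 0.44:
        return "moderate"
    elif es <= 0.69:
        return "high"
    return "very high"

# Note that Hattie's 0.4 hinge point falls inside the EEF's "moderate" band
print(eef_band(0.40))
```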

Alternatively, if you are interested in the relationship between effect sizes and GCSE grades, you could turn to Coe (2002), who notes that the distributions of GCSE grades in compulsory subjects (i.e. Maths and English) have standard deviations of between 1.5 and 1.8 grades.  As such, an improvement of one GCSE grade represents an effect size of 0.5 to 0.7, so a teaching intervention which led to an effect size of 0.6 would lead to each pupil improving by approximately one GCSE grade.
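The conversion works by running the effect-size formula in reverse: since effect size is the mean difference divided by the SD, the mean difference measured in grades is the effect size multiplied by the SD expressed in grades. A quick sketch (the function name is mine):

```python
def grades_gained(effect_size, sd_in_grades):
    # Effect size = mean difference / SD, so the mean difference in
    # grade units is the effect size multiplied by the SD in grades.
    return effect_size * sd_in_grades

# An effect size of 0.6 against Coe's reported SD range of 1.5-1.8 grades
# works out at roughly 0.9 to 1.1 grades, i.e. about one GCSE grade
print(grades_gained(0.6, 1.5), grades_gained(0.6, 1.8))
```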

How large is an effect size? - a recent analysis

Slavin (2016) recently published an analysis of effect sizes which challenges these interpretations of what counts as a large effect size. Slavin argues that what counts as a large effect size depends on two factors: the sample size, and how students were assigned to treatment or control groups (randomly or through a matching process). This conclusion is based on a review of twelve meta-analyses and the 611 studies which met the rigorous standards necessary for inclusion in the Johns Hopkins University School of Education's Best Evidence Encyclopedia. The results of this analysis are as follows:

Average effect size, by sample size and method of assignment (number of studies in parentheses):

                 Small (< 250)      Large (250+)
Matched          +0.32 (215)        +0.17 (209)
Random           +0.22 (100)        +0.11 (87)
One way of interpreting the above table is to say that, if we take matched samples (424 studies in total), the average effect size for studies with fewer than 250 participants (+0.32) is nearly twice that of studies with 250 or more participants (+0.17).  Similarly, small studies using random assignment are likely to generate an effect size (+0.22) which is twice that of larger studies (+0.11).

So what are the implications of Slavin's analysis for the evidence-based practitioner?

First, Slavin argues that Hattie's Visible Learning draws on a large number of studies which do not meet the requirements of the Best Evidence Encyclopedia, and which should not be included in any calculation of the effectiveness or otherwise of educational interventions.

Second, having removed insufficiently rigorous studies from the calculation of Hattie's league table of effect sizes, the league table should be sub-divided into four separate tables, depending upon the size of the sample (large or small) and the nature of the assignment (random or matched).

Third, the 0.4 hinge point which Hattie suggests teachers and headteachers use to identify strategies with proven effectiveness is in all likelihood incorrect, and should not be used as a screening mechanism for identifying strategies to introduce into a school.  Indeed, Slavin's work suggests the need for multiple hinge points.

Fourth, the EEF table used for the interpretation of effect sizes needs to be re-calibrated to reflect the impact of sample size and random sampling/matching on average effect size.  In other words, what counts as a large effect size would now appear to be smaller, as effect sizes are unlikely to be as large as anticipated, particularly in large multi-school studies involving more than 250 pupils.  This is particularly urgent, as a number of schools are likely to be using the EEF DIY Evaluation Guide to inform their practice, and the current guidelines for interpreting effect sizes may lead to some interventions being mis-classified as having relatively small effect sizes.

And finally ...

Where does this leave us, particularly with regard to John Hattie and Visible Learning?  For me, it would be difficult to justify the use within a school of Hattie's league table of effective strategies to determine either changes in teaching strategies or the introduction of school-wide interventions.  What I think you can do is use Visible Learning to demonstrate the challenges and limitations associated with research-based teaching.  In other words, the benefits of critically using Hattie's work within school lie in building professional capital, rather than in providing a tool for prioritising interventions.  If anything, the difficulties arising from Hattie's work suggest an even greater need for teachers to become effective evidence-based practitioners, who are able to combine different sources of evidence - research, school data, stakeholder views, practitioner expertise - to make decisions which will hopefully lead to improved pupil outcomes and staff well-being.

Note
I have not deconstructed Hattie's use of effect sizes - this has been more than ably done by Ollie Orange.





4 comments:

  1. John Hattie has made a laudable contribution to the cause of evidence informed education policy and practice. Not least, he's popularised the use of Effect Size and the more general idea of testing the impact of an intervention. But there's some collateral damage and the concept of a 0.4 ES being a threshold is definitely one of them. Cohen himself (author of the famous Cohen's D effect size) used a similar scale, calling 0.2 'small' and 0.5 'medium'. In fact we know from an accumulating body of work on both sides of the Atlantic (and you quote some of it) that large scale, properly randomised classroom based evaluations rarely produce ESs above 0.2.

    It's taken a decade or so to get evidence on the agenda of thinking school leaders, but a lot of them have bought the 0.4 threshold. It will probably take another decade to re-educate them, to quote a more contemporary guru, Ben Goldacre, that "I think you'll find it's a bit more complicated than that".

    Paul Crisp
    www.curee.co.uk
