The negative consequences for Scottish Education of Inspectors’ continuing misuse of examination statistics

I am moved to make this post, and I promise it will be my last on the subject (!!!), as it still surprises me that our Scottish school Inspection system does such a poor job, relative to its potential to improve Scottish education. This criticism may seem surprising, given the comparative beneficence of Scottish HMIe (Her Majesty’s Inspectors of Education) when compared with their English counterparts in OFSTED, particularly today when OFSTED ramps up the pressure on schools in England to ‘perform’. Should we not just be grateful up here that we don’t have the OFSTED model, with its inbuilt design that some schools must be failing, to worry about? The answer to that question has to be ‘no’. HMIe in Scotland may now be less brutalist in their approach, but their gloved hand can still strike, as the mailed fist of OFSTED does, in the wrong place. Like their counterparts south of the border, by continuing to conceptualise educational failure as the sole responsibility of schools, HMIe participate in public misinformation and thus contribute to that very failure. A short letter of mine on this subject recently appeared in the Times Educational Supplement Scotland.

I copy below extracts from the much longer, closely argued academic article which I wrote in 2008 and circulated privately. I would be happy to send the full article to anyone who requests it. I did not publish at the time as I believed that there were some Inspectors who would not ‘forget’ such strong public criticism and that this might lead to later problems for me and/or for the school. It argues that Inspectors should be more cautious in their overall judgements about schools, that the ‘market’ choice approach to schooling which demanded such judgement is flawed and that by focusing most of their work on school comparison, Inspectors failed, and continue to fail, in their wider duty to address the major weaknesses of Scottish education. These are not just to do with the varying quality of our schools, but the fact that a significant minority of our population, often disadvantaged at birth and in their social context, do not realise their potential in school, and in many cases, in later life. This is not a school problem alone, but a social issue, solutions to which require our wider society to work together. There is no doubt that individual schools and individual teachers can and do make a difference to individual young people each and every day, but the Inspectorate model for the system as a whole – ‘if only all schools were excellent, every child would achieve his or her potential’ – hides the contribution that all society needs to make to the formation of our future citizens and does us all a disservice. Moreover, their focus on schools alone hides the complex interconnected web of responsibility and accountability, stretching across national and local government agencies, public service professions, parents, churches, civil servants, colleges, employers and the young people themselves, for the quality of education in a particular area or school.
This is particularly the case in Scotland, where local authorities continue to play a much more significant role in policy and support than their counterparts in England.

There has been change since 2008.  Inspections are now even more responsive to the early judgements about the quality of schooling made through the initial review of school statistics and early observations.  The new shorter Inspection reports give much less information about, and prominence to, comparative examination performance.  The context of Scottish secondary schooling, with the challenges of Curriculum for Excellence and the imminent arrival of a new set of examinations at age 16, has led to more variety in curriculum approaches.  However, although briefer and less statistically detailed, Inspectors still make use, in their report on a school, of the same comparator schools data as if it was definitive evidence of the relative performance of different secondary schools. The statements made about these comparator schools in 2012 are the same as those made in 2007/8: The comparator schools are defined as ‘the 20 schools statistically closest to the school being inspected in terms of the key characteristics of the school population’.  This is a very strong statement about the validity of the comparator school data.  The implication is that other characteristics may not be measured but that does not matter as they are not ‘the key characteristics’.  Then, as now, that is neither a valid, nor a wholly reliable statement.  Its continued use undermines the intellectual credibility of the Inspection process.

Extracts from the previous article (2008): “The Use of Examination Statistics in Scottish Secondary School Accountability: the case of ‘Comparator Schools’”.

Extract 1:

This article examines the use of examination attainment information in the Inspection of secondary schools in Scotland by Her Majesty’s Inspectors of Education (HMIE).    It is written for an audience within the education community – Inspectors, Headteachers and others with a professional knowledge and interest in the Inspection process and its role in improving the quality of Scottish education.  There are five sections:

  1. The first section introduces briefly the complexity and potential confusions associated with school level attainment information.   The limitations of this data as a source of information about school effectiveness are widely recognised.  There is no direct statistical relationship between pupil attainment and any one clearly identifiable contextual factor.  Socio-economic factors are at best ‘proxy’ indicators, with an approximate relationship to attainment, while reliable figures for prior attainment (the best indicator of future attainment, according to the research community) are not available in Scotland.
  2. The second section outlines the role of HMIE in interpreting that data and providing more reliable judgements of quality in schooling.  They make use of attainment data alongside other sources of information in order to judge the overall quality of education in a particular school and believe that their ‘triangulation’ methods provide a reliable basis for sound judgement.  However their use of statistics in their reports on individual schools appears naïve at best and is often misleading.
  3. The third section outlines HMIE use of ‘comparator’ or ‘similar’ schools to provide a benchmark against which the examination performance of pupils in a particular school can be judged.  Judgements made against this benchmark feature heavily in the published report.
  4. The fourth section provides a comprehensive critique of how HMIE use this benchmarking data to generate apparently ‘definitive’ judgements about school attainment.  These judgements are far from definitive and do not stand up to a number of tests of validity or reliability.
  5. A key role of HMIE is to provide schools, education authorities and Ministers with reliable advice on how the Scottish system can improve.  The overly simplistic accounts which HMIE give at present, with their naïve explanations of success or failure in particular school communities and contexts, will never provide an appropriate basis for addressing the continuing failure of the Scottish education system to engage positively with 100% of our young people.  This final section therefore makes some general recommendations, for further discussion and consideration within the educational community, on how the quality of information used to support school improvement can itself be improved.

Extract 2:

4.         Analysis of HMIE use of comparator school data:

Schools in the comparator group, as outlined above, play a very important role in the Inspection process, featuring strongly in the published report.  The text in which reference is made to the comparator group is misleading:

  1. the comparator schools are described in the text of a report as ‘similar schools’.  However these schools are not ‘similar schools’, but ‘schools which share some similar socio-economic characteristics’.  A large number of characteristics, including important socio-economic characteristics, are not considered (see below).
  2. in section 3 of the report, under the heading of ‘attainment’, statements are made about the overall attainment of pupils at different stages relative to the comparator group.  These statements are made according to where the school appears in a rank order, best to worst, in the appropriate measure.  If in the top 4, the school does ‘much better than similar schools’; if 5th to 8th, ‘better than similar schools’; if 9th to 12th, ‘in line with similar schools’; if 13th to 16th, ‘not as good as..’ and if 17th to 20th, ‘notably weaker than..’.  The judgements (‘better’, ‘weaker’ etc) are generated by a norm-referenced formula.  The difference between a school which is performing ‘better’ and one whose performance is ‘notably weaker’ in a particular area may be as little as 5 or 6 percentage points – too small a variation to justify such a significantly different qualitative judgement.  A variation of 6 percentage points either way might be well within the confidence limits expressed for these statistics, if indeed there were such confidence limits – another critical point which is followed up below.  Moreover these crude judgements made about schools ignore the wealth of data we now have about subjects within schools.  One subject might be ‘notably weaker’, while others are ‘better than’, yet overall the school is ‘not as good as’.
  3. In Appendix 3 of a report, attainment in the school in 14 areas/stages (such as the number of pupils who have obtained 1 Level 6[i] qualification by the end of S5) is compared to the average of the 20 comparator schools and to the national average.  The comparator schools are defined as ‘the 20 schools statistically closest to the school being inspected in terms of the key characteristics of the school population’.  This is a very strong statement about the validity of the comparator school data.  The implication is that other characteristics may not be measured but that does not matter as they are not ‘the key characteristics’.
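
The banding rule described in item 2 above can be sketched in a few lines of Python. This is only an illustration of the rule as quoted in the text, not HMIE's actual calculation: the function names and the percentages for the other schools are invented for the example, and it assumes a rank order of 20 positions (the inspected school plus 19 others), which the text does not spell out.

```python
# Rank bands exactly as quoted in the text (labels kept verbatim, including
# the truncated ones). Upper rank bound of each band -> judgement wording.
BANDS = [
    (4, "much better than similar schools"),
    (8, "better than similar schools"),
    (12, "in line with similar schools"),
    (16, "not as good as.."),
    (20, "notably weaker than.."),
]


def judgement_for_rank(rank: int) -> str:
    """Return the report wording for a position in the rank order of 20."""
    if not 1 <= rank <= 20:
        raise ValueError("rank must be between 1 and 20")
    for upper, label in BANDS:
        if rank <= upper:
            return label
    raise AssertionError("unreachable")


def rank_of(school_pct: float, other_pcts: list[float]) -> int:
    """Rank 1 = best: one more than the number of schools scoring higher."""
    return 1 + sum(1 for pct in other_pcts if pct > school_pct)


# HYPOTHETICAL figures: 19 other schools spaced 0.3 points apart (30.0 to 35.4).
others = [30.0 + 0.3 * i for i in range(19)]

for pct in (34.0, 30.1):
    rank = rank_of(pct, others)
    print(f"{pct:.1f}% -> rank {rank}: {judgement_for_rank(rank)}")
```

With these invented figures, two schools less than four percentage points apart land in ‘better than similar schools’ and ‘notably weaker than..’ respectively. The wording depends only on rank position, not on the size of the gap between schools.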

These statements made in Inspection reports assume a degree of authority which is not justified by the statistical methodology used.  The errors associated with this undermine the professional credibility of Inspection in Scottish secondary schools as statements are made in reports about statistics which cannot be justified by the statistical data. This may lead to misjudgements about the overall performance of a school, given the importance which attaches to attainment information within the final report.

What degree of caution, then, should Inspectors exercise in the professional judgements they make on the basis of ‘comparator school’ statistics?  Is it indeed possible to make judgements that have any value at all?  The second of these questions will be answered in the final section of this paper.  The first question is answered in the seven critical points which follow, each of which on its own would be sufficient to suggest a cautious provisional judgement rather than the decisive certainty by which performance is allocated to one of six levels.  Together they provide a damning critique of the overly simplistic use of such statistics by such a powerful national agency:

  1. Other socio-economic factors:   schools, as we have seen, are compared on the basis of some socio-economic factors which correlate strongly, but not absolutely, with examination performance.  The information available in relation to these factors is described by Inspectors as ‘robust’ (i.e. valid information from reliable sources).  This is not the same as saying that their use to judge performance is ‘robust’, i.e. accurate and precise.  Even accepting that only socio-economic factors are worth comparing (despite substantial evidence that prior academic attainment correlates as or more strongly with later attainment[ii]), the socio-economic factors of relevance are heavily imbalanced towards the poorer end of the socio-economic spectrum, but judgements are made about performance across the entire school:  only one factor (percentage of pupils’ mothers with a degree level qualification (2001 Census)) addresses an aspect of the wider social profile of the school catchment area.   Take for example two schools, similar in profile at the poorer end of the socio-economic spectrum but significantly different at the richer end:
  • School A:   25% of pupils eligible for Free School Meals and 15% living in areas within the 15% most deprived (SIMD);  35% of pupils coming from homes where the mother has a degree level qualification.
  • School B:   25% of pupils eligible for Free School Meals and 15% living in areas within the 15% most deprived (SIMD);  5% of pupils coming from homes where the mother has a degree level qualification.

Whereas a degree of comparison of the S4 performance of pupils in these schools in relation to pupils attaining at Level 3 might be appropriate, comparison of attainment at Level 5 or Level 6 might not, since the second school has a much lower proportion of mothers educated to degree level.

2.   Sample size:     There is also a question of statistical scale.  Statistical differences in factor effects which are not significant at the level of national populations may, at the level of an individual school, make a very significant difference.  In an individual school with an S4 roll of 200, 10 pupils one way or the other makes a difference of 5% in the figures and might represent the difference between being in the top 8 schools (and therefore ‘better than similar schools’) or the bottom 8 of the comparator group (and therefore ‘performing less well than similar schools’).  These individual effects increase as the sample size decreases, a fact recognised by Inspectors in excluding very small schools from this statistical analysis, but there may also be marked individual effects in larger schools.
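
The arithmetic behind this point is easy to check. The short Python sketch below reproduces the figure given above (10 pupils in an S4 roll of 200) and shows how the same 10 pupils weigh more heavily as the roll shrinks. The other roll sizes are chosen purely for illustration, and the function name is my own.

```python
def percentage_point_shift(pupils_changed: int, roll: int) -> float:
    """Change in a school's percentage figure when `pupils_changed` pupils
    move into (or out of) an attainment category, for a year group of `roll`."""
    if roll <= 0:
        raise ValueError("roll must be positive")
    return 100.0 * pupils_changed / roll


# 200 is the roll used in the text; the other rolls are illustrative only.
for roll in (50, 100, 200, 300):
    shift = percentage_point_shift(10, roll)
    print(f"S4 roll {roll}: 10 pupils = {shift:.1f} percentage points")
```

For the roll of 200 this gives the 5 percentage points quoted above. For a roll of 100, the same 10 pupils move the figure by 10 points.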

3.   Data collection differences:  There are also bureaucratic differences in how data are acquired, differences which suggest that some of the data might be less ‘robust’ than Inspectors would wish.  Differences in how local authorities complete the annual school Census can have significant effects.  Factors used in judgements made during inspections in the session 2006-7 included the % of children with Additional Support (Special Educational) Needs.  However there is no nationally moderated assessment system for determining such needs, so schools in different parts of the country may have the same numbers of pupils with similar needs but quite different statistics for this recorded in the national census. Two schools with the same S4 roll (say 200 in S4) and the same socio-economic characteristics as defined by PCA methodology may therefore have significantly different numbers of pupils entered on the national census as having additional needs.  This factor was removed from the PCA methodology in session 2007-8 because of this potentially erratic effect on the validity and reliability of the comparison; however Inspectors’ judgements made during session 2006-7, based on this data, still stand!  There is no apology or retraction (yet) for judgements made using this misleading comparison.  In another example, in one local authority, pupils with emotional and behavioural needs which require them to be educated outside the mainstream school (at a centre for that purpose) will not appear on the school roll.  In a neighbouring authority, schools are required to include these pupils on their roll, even where they may never have attended the school and may be presented for national examinations outwith the school.  In other schools, pupils at risk of not achieving in S4 may ‘repeat S3’ before they leave.  Such differences in how the S4 roll is determined in particular authorities are not factored into the data analysis.  
As illustrated in point 2 above, these kinds of differences in census recording might be enough to make the difference between a positive and a negative judgement.  Bureaucratic differences between authorities may therefore account for more of the apparent difference between schools than any aspect of school performance.

4.   Annual changes in the comparator group:   Small changes in statistical data from one year to another might lead to significant changes in the comparator school group.  I know of a school which has only three schools in the 2007-8 comparator group that were also in its 2006-7 comparator group.  17 of its comparators were different.  This can lead to bizarre effects.  In the majority of the 14 areas of attainment reported on in their recent Inspection report, this school appeared to be doing less well than comparator schools, both in that year and also in the three year trend (2004-6).  However, with only slight improvements in actual attainments, but with 17 different comparator schools, the school now comfortably outperforms its new comparator schools in all but two areas of attainment as well as in the new three year trends (2005-7), since trends are calculated relative to the current year comparator schools[iii].  If the Inspection had taken place within the 2007-8 session, this would have led to quite different statements on overall attainment, even although there were virtually no differences in actual attainment.

5.   Placing requests:       No allowance is made for the impact of placing requests out of and into catchment area:  the first two factors, based on national Census data, relate to school catchment area, not to the actual pupil population.  In some areas, there is a large number of placing requests across catchment area boundaries.  These may be disproportionately related to other factors in the analysis in individual school cases.  Statistics on placing requests in urban areas, where pockets of high deprivation are close to pockets of wealth (Bearsden and Drumchapel, Muirhouse and Barnton), suggest that one hypothesis worth investigating, for example, might be whether pupils whose mothers have a degree level qualification may be more likely to be placed out of a school catchment area where the catchment area school has a very small number of such mothers, or a very high proportion of educationally disadvantaged pupils.  Much work would need to be done to account for this factor, which might affect both sending and receiving schools, in the overall analysis.  Its existence puts further doubt on the definitive relationships which inform HMIE judgements.

6.   Trends not snapshots:        Reliability might be improved by use of trend statistics rather than the one year snapshot which HMIe currently use.  The comparator school data already sets this up, since data for two of the factors involved is derived from as long ago as the 2001 census.  If comparator schools are to be established for each school, there should be a degree of stability about the comparison.  However such stability over time in the comparator group would necessarily render any judgements about performance in particular years even more provisional, as factors which fluctuate year on year (such as free school meal registration) would also have to be controlled in the statistical calculation.  Inspectors might find this uncomfortable, however, as it might make it difficult for them to make judgements with any degree of certainty at all.  However the argument of this analysis is that this might be no bad thing, as too definitive an approach is almost inevitably misleading.  It is the need and desire to make comparative judgements to inform the school ‘marketplace’ which drives the entire Inspection system at present.   This ‘need’ needs to be removed!

7.       Validity:      It is not clear that the measures of attainment chosen by Inspectors for their reports are as valid as other possible measures of overall school attainment.  Certainly, these are measures which have been around for a long time and which are well known within the Scottish system: for example the percentage of pupils in a given year obtaining 5 or more awards at Level 3, Level 4 and Level 5.  However focussing on these ‘boxes’ of attainment has encouraged many schools to focus on those 10 or so pupils who might be shifted from one box to another, without necessarily reflecting a general change in the overall attainment of pupils in the school.  This is a ‘perverse incentive’ of this particular measurement.  A more absolute measure of attainment is the tariff score (see page 2 above) but this is not referred to in reports at all.  However this might produce its own ‘perverse incentives’ (e.g. presenting every pupil for as many units / courses as possible to bump up their tariff score).  The ‘perverse incentives’ for Department Heads differ from those of school managers.  For Heads of Department looking to improve relative ratings the incentive is not to present any pupil who might fail.  In among a series of useful analytical critiques of wider policy aspects of the present accountability framework, Cowie et al provide an extended account of these ‘perverse incentives’ in the current system[iv].

 In addition, the national system for accumulating credit does not recognise many worthwhile attainments – ASDAN certification for example is equivalent to SCQF levels but not yet recognised in the national counting system, while problems remain with attributing the correct credit for courses run in partnership with FE Colleges.  Such educationally worthwhile programmes, encouraged in national curricular advice, are not yet reliably credited in the statistics used by Inspectors. There is an even bigger argument that examination results are easy to measure but do not measure what is most worth measuring.  Curriculum for Excellence sets new challenges for all schools in Scotland.   Examination results may be an important proxy indicator as to whether or not a school produces ‘successful learners’, but the relationship between the cognitive skills required to perform well in public examinations and the kinds of learning appropriate in a job context, for example, is only rough and ready.  This question goes right to the heart of the Inspection process.  Are we measuring what we should be measuring in the performance of schools or are we measuring what we can easily measure?

Where judgements of such importance are being made, and where so many factors might render invalid and unreliable too accurate and precise a use of the statistics involved, it would be appropriate to set confidence limits on the statistical basis of the judgement.  ‘Confidence limits’ are a statistical representation of the uncertainties underlying a particular statistical expression or relationship.  HMIe use confidence limits in relation to the responses to their stakeholder questionnaires, establishing an appropriate degree of uncertainty in any judgements made on the basis of these statistics.  OFSTED (the English equivalent of HMIE) use confidence limits in their context based assessments of schools.  The certainty of the judgements made in Inspection reports in relation to examination attainment is not warranted by the weak statistical foundations outlined in points 1 to 7 above.  The fact that these judgements are then used to underpin other judgements across the school erodes confidence in the overall judgements made.
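
To give a sense of scale, the sketch below computes an approximate 95% confidence interval for a single school's percentage figure using the Wilson score interval, a standard textbook method for proportions. It is not a description of how HMIe or OFSTED calculate their confidence limits, and it rests on a loud assumption: that one year group can be treated as if it were a random sample from the kind of pupils the school serves. The figure of 35% is invented; the roll of 200 is the one used in point 2 above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    ASSUMPTION: the n pupils are treated as a random sample, which a real
    year group is not; the result only illustrates the scale of uncertainty.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# HYPOTHETICAL school: 70 of 200 S4 pupils (35%) reach a given measure.
low, high = wilson_interval(70, 200)
print(f"35% of 200 pupils: roughly {100 * low:.1f}% to {100 * high:.1f}%")
```

Under these assumptions the interval spans roughly six and a half percentage points either side of the observed figure, which is the same order of magnitude as the gap that can separate ‘better’ from ‘notably weaker’ in the banding described earlier.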

Extract 3:

The persistence of problems in our education system (for example the ‘hard to reach’ 20% at the bottom end of the attainment spectrum) suggests a need for new questions and new answers.  The answer most often offered by the Inspectors and their reports is that a school, its headteacher and / or its staff are just not doing well enough.  This overly simple judgement masks the complexity of the challenge for schools which are trying to get the ‘most hard to reach 20%’ to a point where they can move into a positive post-school placement, in work, training or continuing education.  Too simple an explanation (organisational or professional failure or incompetence) will never lead to the kinds of insights required if the system as a whole is to improve.  Complex, multiple, competing values are at work in our communities and the overly simplistic Inspection model does not get at that complexity.  This oversimplicity lies at the heart of the problems Inspectors face as they fail to give sufficiently accurate advice and information to politicians and the civic community about a highly complex system, or sometimes about individually complex schools.   The case studies of good practice analysed in the recently published The Journey to Excellence[i] are individually valuable, but the narrow focus on professional competence as a solution to all schooling difficulties remains.  In moving from description to prescription, there is a clamant need for a deeper level of explanation than ‘organisational failure’ or ‘professional incompetence’.

In schools in difficulty, the quality of the work of staff may require attention and can always be improved, but my professional experience suggests it can only ever be part of the answer.   I have now been inspected as a Headteacher in three very different schools – in the first two, my leadership could not, apparently, have been bettered.  However, my most recent performance is judged to be the worst it has ever been (a grudging ‘good’), although I have never worked harder.   Moreover the overall report on the school I am now privileged to lead was the poorest report I have ever been involved in, and yet the teaching and support staff in the school work every bit as hard, if not harder, than staff in any of my previous schools, while the circumstances in which they are working are much less favourable, requiring considerable emotional resilience as well as technical competence.  The context of the school must account for some of the very real difficulties we as a team face in providing a better educational experience for its pupils.  Comparator school data do not get near the kinds of sophisticated analysis, involving issues such as the social and educational capital of the school community (see above), that would be required to provide appropriate advice.  There is then a significant knock-on effect in political thinking about education.  If the solution to our educational problems is seen to be professional, then the solution for politicians is seen to be better professional accountability (hence the massive rise in Inspection regimes at local authority as well as national level).  This takes the focus away from wider social concerns about how our young people learn to be part of and contribute to their community.   Education becomes the job of the professionals, not the work of the whole community.

[i]    for Scottish qualification levels see

[ii]    See earlier references on the importance of prior attainment as a factor

[iii]    See Appendix 1 above

[iv]    Cowie M, Taylor D and Croxford L (2007), Tough Intelligent Accountability in Scottish secondary schools and the role of Standard Tables and Charts (STACS):  a critical appraisal, Scottish Educational Review, 39/1, pp29-50.   See also the briefing paper at .


6 thoughts on “The negative consequences for Scottish Education of Inspectors’ continuing misuse of examination statistics

  1. Hi Danny

    Eloquent and accurate as ever. Just looked at all the SCOTXED data today and discovered that Eastwood HS has been added to my list of comparative schools with Bo’ness taken away. Looking at PCA data, I’d always reckoned that Bo’ness was not too different from Musselburgh. Roll on the senior phase benchmarking changes, please.

    Ronnie Summers

    • Thanks for comment Ronnie… I had a go through private channels, as you know, in 2008 but here we are four years later and they (albeit a more cuddly and less arrogant ‘they’) are still making the same obvious mistakes with stats. That’s why a proper appeals process is long overdue. Maybe senior phase benchmarking will get it right??

      • Hi Danny. By one of those uncanny coincidences, I was in Victoria Quay today talking to two of the people leading on this! I’m very hopeful that the new approach will be more transparent and actually help schools to work together on raising attainment.
