The assessment community
Every year, around 70,000 individuals are involved in external examining, moderating and marking Key Stage tests, GCSEs and A Levels.
Assessment reliability
What is reliability?
In accordance with the QCA Code of Practice, all awarding bodies are required to ensure that their assessments are fit for purpose, valid and reliable. QCA has defined reliability in assessment as 'the extent to which assessment results are an accurate measurement of the candidate's demonstration of the abilities specified by the assessment criteria'. In other words, an assessment can be considered reliable if it consistently measures whatever it is supposed to measure. Reliability is very closely related to validity, but where validity is concerned with the appropriateness of the assessment, reliability puts the emphasis on the accuracy and consistency with which the assessment tool can be applied. Reliability requires not only that an individual mark, score or grade is accurate, but also that it bears the appropriate relationship to every other mark, score or grade in the same set.
Awarding bodies are concerned with a number of different aspects of reliability, including the following:
Inter-marker reliability
It is very important that all markers of a particular assessment mark the work of all the candidates in a reliable and consistent manner. The awarding body must seek to ensure that a candidate scores the same mark, irrespective of who marks their work, in order to make the assessment fair and reliable. The marks awarded to all candidates by all the markers or examiners involved should form one internally consistent set of marks, as if they had all been marked by one highly reliable marker.
Ensuring inter-marker reliability
A prerequisite for inter-marker reliability is that all examiners mark accurately, i.e. that their marking is free of errors. All examiners are required to be scrupulous in their marking and to have their marking checked for clerical errors as well as for technical accuracy.
Awarding bodies seek to ensure inter-marker reliability in a number of ways, including the use of straightforward, unambiguous mark schemes that all examiners can interpret consistently, the standardisation of examiners at the outset of the marking period, and the monitoring of examiners throughout the time that they are marking. All these measures are intended to ensure that every examiner marks in the same way as the principal examiner responsible for the assessment of that component.
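As a rough illustration of what monitoring an examiner against the principal examiner might involve, the sketch compares one examiner's marks with the principal examiner's marks on the same sample of scripts. It is a hypothetical example, not a description of any awarding body's actual procedure: the function name `compare_marks`, the script IDs, the marks and the two-mark tolerance are all invented for illustration.

```python
# Hypothetical sketch: compare one examiner's marks with the principal
# examiner's marks on the same sample of scripts. The tolerance below is an
# invented figure for illustration, not a QCA or awarding-body rule.

TOLERANCE = 2  # hypothetical: maximum permitted difference in marks per script


def compare_marks(principal: dict[str, int], examiner: dict[str, int],
                  tolerance: int = TOLERANCE) -> dict:
    """Summarise how closely an examiner agrees with the principal examiner.

    Only scripts marked by both are compared.
    """
    shared = [s for s in principal if s in examiner]
    if not shared:
        raise ValueError("no scripts marked by both examiners")
    # Positive difference = examiner more generous than the principal examiner.
    diffs = {s: examiner[s] - principal[s] for s in shared}
    return {
        # Average signed difference: a consistent lean towards leniency or severity.
        "mean_difference": sum(diffs.values()) / len(diffs),
        # Average size of the disagreement, regardless of direction.
        "mean_abs_difference": sum(abs(d) for d in diffs.values()) / len(diffs),
        # Individual scripts whose mark falls outside the tolerance.
        "out_of_tolerance": sorted(s for s, d in diffs.items() if abs(d) > tolerance),
    }


principal = {"S1": 34, "S2": 27, "S3": 41, "S4": 18}
examiner = {"S1": 35, "S2": 23, "S3": 41, "S4": 21}
print(compare_marks(principal, examiner))
# {'mean_difference': 0.0, 'mean_abs_difference': 2.0, 'out_of_tolerance': ['S2', 'S4']}
```

In this invented sample the examiner is neither lenient nor severe on average (a mean difference of zero), yet two scripts are marked well away from the principal examiner's marks. This is why a check on individual scripts is needed alongside any overall average if the marks are to form one internally consistent set.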
The internal and external moderation of coursework is the means by which awarding bodies seek to ensure inter-marker reliability in internally marked assessments.
Reliability over time
The QCA Code of Practice states that the public interest in examinations 'extends to the proper maintenance of consistent standards...over time'. For awards to have credibility and currency, it is important that standards are consistent from one year to the next, so that it does not matter in which year candidates gain their grades: a Grade A in 2004 must represent the same standard as a Grade A in any other year.
Ensuring reliability over time
To ensure reliability over time, awarding bodies use a number of strategies. When setting assessments, principal examiners must follow the specification requirements and use the same specification grids each time, so that the same assessment objectives are targeted with the same weightings. The format of all question papers is closely specified, so that the framework changes little from one session to the next, even though the content will vary. Mark schemes, too, follow a set format for each assessment that does not change significantly, so that examiners mark in very much the same way from one session to the next.
If marking can be assumed to be reliable over time, awarding can build on that assumption. The awarding committee for each specification uses a range of qualitative and quantitative information to ensure that standards are carried forward from one session to the next.
Reliability when specifications change
Ensuring reliability over time is particularly challenging when there are significant changes to specifications, because some or all of the strategies outlined above cannot be used in this situation. At such times, examiners and awarders have to draw on different information and evidence, eg statistical data, to ensure that the results remain reliable.
Reliability across specifications
An awarding body is required to ensure that standards across qualifications in a subject (eg GCSE Mathematics) are reliable, and furthermore that standards are consistent as far as possible across a qualification as a whole (eg GCSE). The chair of examiners plays a key role in establishing reliability within a subject area. The awarding body's accountable officer has an overview of all qualifications and is therefore responsible for ensuring reliability at the most senior level.
Ensuring reliability
There are a variety of ways in which the awarding bodies and the regulatory authorities seek to ensure assessment reliability. These include:
- The accreditation of specifications by QCA, DELLS and CCEA
- Adherence to the QCA Code of Practice
- Component-based assessment
- The use of specification grids by principal examiners drafting question papers and mark schemes
- Checks on draft question papers by revisers, question paper evaluation committees, scrutineers and awarding body officers
- Keeping question and option choice to a minimum
- Training and monitoring of examiners and markers
- Reviewing the performance of question papers by principal examiners
- Consideration of a wide range of qualitative and quantitative information by awarding committees and chairs of examiners
- Review of all recommended grade boundaries and grading outcomes by the awarding body's accountable officer
- QCA scrutiny and monitoring procedures.
Linking validity and reliability
Validity and reliability in assessment are very closely linked and in many cases interdependent. It is possible to think of cases where a valid assessment could not be conducted reliably, for example certain practical activities which produce purely ephemeral evidence. It is also possible to think of assessments which would be highly reliable but not particularly valid, for example certain multiple-choice tests, or the use of a spelling test to assess linguistic ability. In most cases, however, validity and reliability are intertwined, and examining personnel involved in devising assessments (developing specifications, drafting assessments and mark schemes) seek to maximise both, striking an appropriate balance between the demands of the two. The aim is an assessment which is truly fit for purpose.