Data quality assessment
NIPN uses existing data. These data have probably already been validated by a national institution after going through a data quality control process. For example, Demographic and Health Surveys are usually validated by the National Statistics Office and by international organisations (https://data.unicef.org/resources/jme).
So, why should NIPN review the data quality of these datasets?
1) First, when conducting secondary data analysis, it is essential to look critically at the quality of the data before using them. It is important to know whether the data are fit for purpose for the intended analysis:
- What was the study design?
- How were the data collected?
- What data quality process was actually followed?
- What are the conclusions of the data quality report attached to the dataset?
2) Second, the data were probably collected for a specific analysis objective that may differ from the NIPN analysis objective. Therefore, a different level of data quality may be needed to achieve the NIPN objective.
“Data quality” is neither “good” nor “bad”: it should be “adequate” for the intended analysis.
To produce data of acceptable quality, it is necessary to ensure that the key steps of the data collection process follow a well-designed protocol. These key steps are:
- design of the data collection instruments;
- training of surveyors;
- data collection;
- data entry;
- data cleaning;
- data quality tests.
In the context of NIPN, the data have already been collected, entered and cleaned.
When referring to “the assessment of data quality” in these guidance notes, we specifically refer to the data quality tests that can be performed on the cleaned datasets.
When conducting secondary data analysis, it is important to know the protocol and the process that were actually implemented to ensure the quality of the data collected. In particular, the data quality tests that were performed are normally compiled in a separate report. In these guidance notes, we describe the main data quality tests.
There is no threshold for “data quality”. Some methods propose a “global score” for data quality, which provides an overall indication, but this score should not be used as a standard threshold. It is ultimately the responsibility of the data team to decide whether the data quality is good enough for the planned analysis. The tests described in these guidance notes will be key to informing this decision.
Given the role of the information platforms in influencing policy decisions, a conservative approach towards data quality is highly recommended, in order to avoid criticism that could damage the reputation of the platform.
The main sources of information for NIPN are:
- “Population-based survey data” are collected in cross-sectional surveys designed to be representative of the studied population (e.g. national or district population).
- “Routine data” refer to systematically and regularly collected information, typically from health centres (e.g. disease, birth and death registers).
Population-based survey data and routine data have very different purposes, protocols and structures. Therefore, the way data quality is assessed can differ considerably depending on the data source.
This guidance note provides details for both population-based surveys and routinely collected data.
Population-based surveys are prone to two main types of errors:
This curriculum includes all the materials needed to facilitate the training: presentations, examples, a facilitator’s guide, practical exercises and their solutions, and parts of real datasets. The curriculum is fully aligned with the recent UNICEF/WHO recommendations (1).
The three-day training is organised as follows:
- DAY 1: Survey methods, anthropometric measurements
- DAY 2: Performing data quality tests; how to read a data quality assessment report
- DAY 3: Making a decision on the data quality level required for the intended analysis
The curriculum is intended for the Technical Assistants who support the NIPN data teams, to facilitate the training of the NIPN team and, where relevant, its partners.
Note that the curriculum focuses especially on anthropometric measurements, though certain tests can also be applied to other continuous variables.
Trainers are encouraged to:
- Download the three-day curriculum materials provided on the next page
- Download the Facilitator’s Guide Word file
- Watch the webinar introducing the training curriculum
- Discuss with NIPN data teams to decide whether the training is relevant
- Adapt the training, as appropriate, and facilitate
To support the data quality tests discussed in the training curriculum, a package of statistical tests has been developed in R: nipnTK.r
This package includes built-in functions to generate z-scores and perform other data quality tests. It is recommended for users who already have a basic knowledge of R.
An R user guide is also available to help users perform the tests mentioned in the training curriculum.
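nipnTK itself is an R package, and its functions are not reproduced here. As an illustration outside R of one common anthropometric check, the Python sketch below flags z-scores that fall outside the WHO flagging limits for the 2006 Child Growth Standards (HAZ below -6 or above +6, WHZ below -5 or above +5, WAZ below -6 or above +5) and reports the share of flagged records together with the standard deviation of the remaining z-scores. The helper name `summarise_zscores` and the sample values are hypothetical; the output informs, but does not replace, the team's judgement on whether the data are adequate.

```python
import statistics

# WHO flagging limits for z-scores (2006 WHO Child Growth Standards).
# Values outside these bounds are treated as biologically implausible.
WHO_FLAG_LIMITS = {
    "HAZ": (-6.0, 6.0),  # height-for-age
    "WHZ": (-5.0, 5.0),  # weight-for-height
    "WAZ": (-6.0, 5.0),  # weight-for-age
}


def summarise_zscores(zscores, index):
    """Hypothetical helper (not part of nipnTK): share of flagged values and
    SD of the plausible z-scores for one anthropometric index.

    Missing values (None) are ignored.
    """
    low, high = WHO_FLAG_LIMITS[index]
    valid = [z for z in zscores if z is not None]
    flagged = [z for z in valid if z < low or z > high]
    kept = [z for z in valid if low <= z <= high]
    return {
        "n": len(valid),
        "pct_flagged": 100.0 * len(flagged) / len(valid) if valid else 0.0,
        # Sample standard deviation of the non-flagged values (needs >= 2 values).
        "sd": statistics.stdev(kept) if len(kept) >= 2 else None,
    }


# Hypothetical weight-for-height z-scores, including one missing value.
whz = [-1.2, 0.3, -0.8, 6.1, None, -2.0, 0.5, -5.4]
print(summarise_zscores(whz, "WHZ"))
```

Flagged values are excluded before the standard deviation is computed, so that a few implausible records do not dominate the spread.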
(1) Recommendations for data collection, analysis and reporting on anthropometric indicators in children under 5 years old. Geneva: World Health Organization and the United Nations Children’s Fund (UNICEF), 2019. Licence: CC BY-NC-SA 3.0 IGO.
- Session 1: Introduction to day 1 (download PPT)
- Session 3: Nutrition overview (download PPT)
- Session 4: Nutrition surveys (download PPT)
- Session 5: Survey procedures (download PPT)
- Session 6: Wrap-up day 1 (download PPT)
- Download additional material for day 1 package (ZIP file)
- Download exercises for day 1 package (ZIP file)
- Session 1: Introduction to day 2 (download PPT EN) / (download PPT FR)
- Session 2: Data quality checks (download PPT EN) / (download PPT FR)
- Session 3: Survey data quality (download PPT EN) / (download PPT FR)
- Session 4: Anthropometry data quality - basics (download PPT EN) / (download PPT FR)
- Session 5: Anthropometry data quality - advanced (download PPT EN) / (download PPT FR)
- Session 6: Wrap-up day 2 (download PPT EN) / (download PPT FR)
Additional material
- Download additional material for day 2 in English (ZIP file)
- Download additional material for day 2 in French (ZIP file)
- Download exercises for day 2 in English (ZIP file)
- Download exercises for day 2 in French (ZIP file)
- Session 1: Introduction to day 3 (download PPT EN) / (download PPT FR)
- Session 2: Data checks debates (download PPT EN) / (download PPT FR)
- Session 3: Interpreting a report (download PPT EN) / (download PPT FR)
- Session 4: Making a decision tree (download PPT EN) / (download PPT FR)
- Session 5: Take-home exercise (download PPT EN) / (download PPT FR)
- Download exercises for day 3 in English (ZIP file)
- Download exercises for day 3 in French (ZIP file)
2. Why assess routine data quality?
Although routine data are administrative data, their quality needs to be assessed, depending on the intended use, in order to identify weaknesses or areas for improvement.
Whether the data are good enough to use will depend on the type of analysis (for instance, a specific analysis or continuous quality improvement).
Most countries already have routine data quality assessment processes, which usually focus on core indicators that do not necessarily include nutrition data.
It may be useful, even crucial, to investigate whether data already validated at country level are good enough for the intended analysis, to ensure that data quality is consistent with the needs of the NIPN analysis.
Most of the existing tools, such as the WHO Data Quality Review toolkit (DQR), focus on core indicators of programmes or projects (e.g. immunisation, antenatal care, tuberculosis, HIV and malaria indicators). Nutrition indicators are often not included in the regular quality assessments carried out in countries.
The National Information Platforms can add value by advocating for the integration of nutrition indicators into the data quality assessment process within the HMIS.
Finally, guidance on choosing a quality assessment tool and on how to use it can facilitate cross-country learning and comparison, thus providing a common quality standard.
7. Additional methods for assessment of routine data quality
a) Other methods for routine data quality assessment
In addition to the tools presented, other methods exist to assess routine data quality.
Some of them are presented below. Each of them applies to one of the data quality dimensions; in that sense, these methods are not fundamentally different from the tool presented, but complement the tests discussed above.
The table below summarises the additional methods, including key questions and sub-questions that the data analyst can ask to better understand the quality of the data at hand.
In practice, a combination of these data quality assessment methods needs to be used, as none of them is sufficient on its own.
Overview of additional methods
5. Additional considerations important to NIPN
Good data quality is not always sufficient to obtain strong evidence. It is also important to ensure that overlapping data are as comparable as possible. This can be done by:
- Harmonising geographical area
- Harmonising time frames
- Harmonising the way questions are asked
- Harmonising definitions of nutrition indicators (be mindful that these can differ slightly from one country to another, for instance in the definition of the age groups used)
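To illustrate harmonising age-group definitions, the sketch below maps child ages in completed months from two sources onto one common set of age bands before counts are compared. The bands (0-5, 6-23 and 24-59 months), the source names and the helper functions are hypothetical choices for this example, not an NIPN standard.

```python
# Hypothetical common age bands in completed months (inclusive bounds).
COMMON_BANDS = [(0, 5, "0-5 m"), (6, 23, "6-23 m"), (24, 59, "24-59 m")]


def to_common_band(age_months):
    """Map an age in completed months to a common band label, or None if out of range."""
    for low, high, label in COMMON_BANDS:
        if low <= age_months <= high:
            return label
    return None


def harmonise(ages_by_source):
    """Count children per common band for each source so that sources can be compared."""
    result = {}
    for source, ages in ages_by_source.items():
        counts = {label: 0 for _, _, label in COMMON_BANDS}
        for age in ages:
            label = to_common_band(age)
            if label is not None:  # ages outside all bands are dropped
                counts[label] += 1
        result[source] = counts
    return result


# Hypothetical ages (months) from a survey and from routine (HMIS) records.
print(harmonise({"survey": [3, 10, 30, 59, 60], "hmis": [0, 6, 23, 24]}))
```

Ages outside every band (here, 60 months) are dropped rather than forced into the nearest band, so that each source is compared on the same population.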
3. What influences routine data quality?
a) General remarks
It is important to assess the coverage of the HMIS (in our case, for nutrition indicators). If coverage is low, nutrition data from the HMIS will probably be biased and therefore not useful.
Routine data quality issues are similar for most indicators. As shown in the figure below, the main issues relate to the accuracy and completeness of the numerator (1 and 2), as well as to the estimation method and the accuracy of the denominator (3).
WHO Routine Data Quality dimensions framework
Source: Countdown to 2030 for Women’s, Children’s and Adolescents’ Health - Presentation in analysis workshop of health facility data for key health system performance indicators, May 2019
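The numerator and denominator issues described above can be made concrete with a small worked example. The sketch assumes a coverage-type indicator computed as reported events divided by an estimated target population. The completeness adjustment (scaling the reported count up for missing reports, weighted by a factor `k` describing how much service non-reporting facilities are assumed to deliver) is an illustrative assumption, not a method prescribed by this note, and all figures are hypothetical.

```python
def coverage(numerator, denominator):
    """Indicator value in percent: reported events / estimated target population."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def adjust_for_completeness(reported, completeness, k=1.0):
    """Scale a reported count up for missing reports (illustrative assumption).

    completeness: share of expected reports actually received, in (0, 1].
    k: assumed service level of non-reporting facilities relative to
       reporting ones (1.0 = same level, 0.0 = no services delivered).
    """
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return reported * (1 + (1 / completeness - 1) * k)


# Hypothetical figures for one district and one year.
reported_cases = 1800          # numerator from facility reports
reporting_completeness = 0.9   # 90% of expected reports received
target_population = 2500       # estimated denominator

raw = coverage(reported_cases, target_population)
adjusted_count = adjust_for_completeness(reported_cases, reporting_completeness)
adjusted = coverage(adjusted_count, target_population)
# Same adjusted numerator, but the denominator is overestimated by 10%.
adjusted_high_denominator = coverage(adjusted_count, target_population * 1.1)

print(f"raw {raw:.1f}%, adjusted {adjusted:.1f}%, "
      f"adjusted with inflated denominator {adjusted_high_denominator:.1f}%")
```

With these hypothetical figures, correcting only for missing reports raises coverage from about 72% to about 80%, while a 10% overestimate of the target population pulls it back to about 73%. Both the numerator and the denominator therefore need to be checked before a value or trend is interpreted.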
Data quality is influenced both during and after data collection. According to Measure Evaluation, three basic factors affect the quality of program-level results when they are compared over time:
- Instrumentation – Instrumentation refers to the way in which data are collected. The methods used to collect and compile results during one reporting period may not be the same as those used during the next reporting period. As a result of this “measurement bias”, the two sets of results may not be directly comparable (for example, when weighing scales are not of the same type or are not correctly calibrated).
- Programmatic – The results from one reporting period could appear inconsistent with the equivalent results from another reporting period because of real changes in program implementation, such as increased or decreased program activity (for instance, breastfeeding promotion sessions at the health centre being expanded to include home visits in the next reporting period, or the facility being closed because the nurse is away for training or collecting drugs from the district pharmacy).
- Measurement – Changes in indicator definitions could result in program-level results being measured in different ways across time periods. In this case, the results from one reporting period would not necessarily be directly comparable with the results from another reporting period. This might also be linked to the quality of the enumerators or of their training.
b) Quick introduction to WHO 2017 Data Quality Review (DQR)
The DQR is an Excel-based analytical tool that calculates standard data quality metrics for selected indicators. It examines the quality of data generated by a health facility-based information system for up to six tracer indicators from across program areas. In addition to the program areas used for illustration in the tool, the DQR can also be used to assess indicators from other program areas, such as nutrition, by entering the appropriate information on the indicator to be assessed.
Through analysis of the six tracer indicators, the tool quantifies problems of data completeness, accuracy and external consistency and thus provides valuable information on the extent to which data are “fit-for-purpose” to support planning and annual monitoring (DQR website).
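Two of these metrics can be sketched as simple ratios. Below, consistency over time is the ratio of the current year's value to the mean of the three preceding years, and external consistency is the ratio of a routine (HMIS) estimate to a survey estimate of the same indicator. This follows the general logic of the DQR metrics, but the function names, the data and the ±33% band used to flag a district are assumptions for illustration; the threshold should be adapted to the intended analysis.

```python
from statistics import mean


def consistency_over_time(history, current):
    """Ratio of the current year's value to the mean of the three preceding years."""
    if len(history) < 3:
        raise ValueError("need at least three preceding years")
    return current / mean(history[-3:])


def external_consistency(routine_pct, survey_pct):
    """Ratio of a routine (HMIS) estimate to a survey estimate of the same indicator."""
    return routine_pct / survey_pct


def is_flagged(ratio, band=0.33):
    """True when the ratio lies outside 1 +/- band (the band is an assumed threshold)."""
    return abs(ratio - 1) > band


# Hypothetical annual counts: (three preceding years, current year).
districts = {
    "District A": ([410, 395, 430], 420),
    "District B": ([200, 210, 190], 320),
}
for name, (past, now) in districts.items():
    ratio = consistency_over_time(past, now)
    print(f"{name}: ratio={ratio:.2f}, flagged={is_flagged(ratio)}")

# Hypothetical coverage estimates (%) from HMIS and from a survey.
print(f"External consistency: {external_consistency(68.0, 55.0):.2f}")
```

With these hypothetical counts, District B (ratio 1.60) is flagged and District A (ratio 1.02) is not. A flag only signals that the value needs checking; it may reflect a real programmatic change rather than a data problem.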
All examples provided below are from the WHO Data Quality Review Toolkit.
4. How to measure good quality data and what to recommend for use by NIPN?
a) General remarks
Most routine data quality assessment tools distinguish between types of data quality, which are called domains, dimensions or attributes:
- The WHO DQR tool uses domains and metrics to assess routine data quality
- The Measure Evaluation tool uses “dimensions” for what the WHO DQR tool calls “metrics”
- Chen et al., 2014 (1) named them attributes.
All these terms designate more or less the same thing. In their review, Hong Chen et al. concluded that completeness, accuracy and timeliness were the three most frequently assessed dimensions of data quality; however, they identified more than 30 dimensions (see the list below).
Data quality dimensions
(1) Hong Chen, David Hailey, Ning Wang, Ping Yu. A review of data quality assessment methods for public health information systems. International Journal of Environmental Research and Public Health, 2014;11(5):5170-5207. doi:10.3390/ijerph110505170.
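Completeness and timeliness, two of the most frequently assessed dimensions, are usually expressed as proportions of the reports expected. The sketch below assumes a hypothetical list of monthly facility reports, each with a submission date (or none if never received) and a deadline. Definitions vary between tools: some compute timeliness over reports received rather than reports expected, so the denominator used here is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Report:
    facility: str
    period: str                 # reporting month, e.g. "2023-01"
    submitted: Optional[date]   # None if the report was never received
    deadline: date


def completeness(reports: List[Report], expected: int) -> float:
    """Percentage of expected reports that were received."""
    received = sum(1 for r in reports if r.submitted is not None)
    return 100.0 * received / expected


def timeliness(reports: List[Report], expected: int) -> float:
    """Percentage of expected reports received on or before their deadline."""
    on_time = sum(
        1 for r in reports if r.submitted is not None and r.submitted <= r.deadline
    )
    return 100.0 * on_time / expected


# Hypothetical January reports from four facilities.
deadline = date(2023, 2, 15)
reports = [
    Report("HC A", "2023-01", date(2023, 2, 10), deadline),
    Report("HC B", "2023-01", date(2023, 2, 20), deadline),  # late
    Report("HC C", "2023-01", None, deadline),               # never received
    Report("HC D", "2023-01", date(2023, 2, 15), deadline),  # on the deadline
]
print(completeness(reports, expected=4), timeliness(reports, expected=4))
```

The expected number of reports is passed in separately because, in practice, a missing report may not appear as a record at all.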
b) Choice of tool
A number of routine data quality assessment tools are used in countries, either for routine assessment of data quality or during field supervision activities. The strengths and weaknesses of the main tools are summarised in the link below.
A questionnaire sent to the NIPN countries revealed that almost all of them use the WHO DQR to assess routine data quality, either as part of DHIS 2 or separately as an Excel-based app, using tracer indicators.
To sustain routine data quality assessment by the NIPN country teams, this note recommends using the WHO DQR (as the DHIS 2-integrated app or in Excel format) to assess nutrition indicators from routine data systems.
Data quality assessment tools
b) Routine data quality framework and existing routine data systems
Project/program activities are carried out and monitored at the delivery sites to quantify progress or efforts using specific indicators.
According to the HMIS Facilitators Guide for Training of Trainers (Measure Evaluation) “an indicator is a variable that describes a given situation and thus permits measurement of changes over time. It transforms crude information into a form that is more suited for decision-making”.
The figure below provides a schematic framework for evaluating routine data quality using six dimensions.
Schematic Framework of Data Quality
Source: Measure Evaluation
Among existing routine data systems (see the link at the bottom of the page for more details), some seem better suited than others to the purpose of the National Information Platforms for Nutrition: the Health Management Information System (HMIS), which is the best suited; Integrated Disease Surveillance and Response (IDSR); sentinel surveillance; and other sector-based data systems, which are the least organised of all.
Sentinel surveillance data are also part of routine data. A sentinel surveillance system can be used to collect nutrition data. Unlike population-based surveillance, sentinel surveillance offers greater design flexibility, accommodating the participation requirements of various network partners.
A line ministry (such as health, agriculture or education) that implements nutrition-sensitive interventions will have its own data collection process and system. In many cases, the HMIS is better organised and systematised than the information systems of other sectors.
However, these routine data systems do not yet sufficiently integrate nutrition indicators. NIPN and other nutrition partners should advocate for the integration of more nutrition indicators, both in short-term information systems, such as sentinel surveillance systems during emergencies, and in permanent, continuous information systems such as the HMIS.
Existing routine data systems
The National Information Platform for Nutrition (NIPN) values existing data from diverse sources to address nutrition policy relevant questions and provide evidence to decision-makers.
Why use routine data in this process? Surveys are expensive while routine data are not; surveys are conducted every 3 to 5 years, while routine data are collected more regularly and therefore provide more recent information. Moreover, at present, many countries have well-functioning routine data collection systems and are increasingly implementing electronic platforms.
Furthermore, routine data permit standardised analyses across geographical levels, such as the district level (1). In each NIPN country, the quality of routine data is being assessed using one or more tools (e.g. DQR, RDQA).
Using data from all sectors that contribute to nutrition would be ideal. However, data from the routine health information system (HIS) are among the most organised and accessible, which is not the case for other sectors. It therefore appears crucial for NIPN to give special importance to Health Management Information System (HMIS) data and their quality.
The main goal of this guide is to ensure that a standardised approach, consistent with NIPN objectives and values, is used to check routine data quality. Furthermore, using a published tool to assess routine data in the NIPN operational cycle could in itself add value to the sustainability of the NIPN approach.
(1) Cesar G Victora, Robert Black, J Ties Boerma, Jennifer Bryce. Measuring impact in the Millennium Development Goal era and beyond: a new approach to large-scale effectiveness evaluations. Lancet, 2011;377:85-95.
b) Data system assessment
This guidance note addresses “data verification” to assess the quality of selected nutrition data.
It does not cover the overall “system assessment” of the management and reporting system. For readers interested in system assessment, here are some links and references:
- Health Information System (HIS) Assessment Tools Database
- A Guide to Monitoring and Evaluation of Capacity-Building Interventions in the Health Sector in Developing Countries
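Data verification, as addressed in this note, is commonly summarised by a verification factor: the value recounted from source documents (such as facility registers) divided by the value reported in the information system. A factor above 1 suggests under-reporting and a factor below 1 suggests over-reporting. The sketch below uses hypothetical facility figures; the ±10% tolerance band and the overall factor computed from summed values are assumptions for illustration.

```python
def verification_factor(recounted: float, reported: float) -> float:
    """Recounted value (source documents) / reported value (information system)."""
    if reported == 0:
        raise ValueError("reported value must be non-zero")
    return recounted / reported


def classify(vf: float, tolerance: float = 0.10) -> str:
    """Label a verification factor using an assumed +/- tolerance band."""
    if vf > 1 + tolerance:
        return "under-reporting"
    if vf < 1 - tolerance:
        return "over-reporting"
    return "within tolerance"


# Hypothetical recount: (recounted from registers, reported in the system).
facilities = {"HC North": (120, 118), "HC South": (75, 95), "HC East": (60, 48)}
for name, (recounted, reported) in facilities.items():
    vf = verification_factor(recounted, reported)
    print(f"{name}: VF={vf:.2f} ({classify(vf)})")

# Overall factor from summed values (an assumed aggregation choice).
total_recounted = sum(r for r, _ in facilities.values())
total_reported = sum(p for _, p in facilities.values())
print(f"Overall VF: {verification_factor(total_recounted, total_reported):.2f}")
```

The overall factor (about 0.98 here) hides opposite errors at individual facilities, which is why facility-level factors are worth inspecting as well.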