Data quality assessment

  • Why is data quality assessment needed in NIPN?

    NIPN uses existing data, so the data has probably already been validated by a national institution through a data quality control process. For example, Demographic and Health Surveys are usually validated by the National Statistics Office and international organisations (https://data.unicef.org/resources/jme).

    So, why should NIPN review the data quality of these datasets?

    1) First, secondary data analysis requires a critical look at the quality of the data before it is used. It is important to know whether the data is suited to the intended analysis:

    • What is the study design?
    • How has the data been collected?
    • What data quality process was actually followed?
    • What are the conclusions of the data quality report attached to the dataset?

    2) Second, the data was probably collected for a specific analysis objective that may differ from the NIPN analysis objective. Achieving the NIPN objective may therefore require a different level of data quality.

    “Data quality” is neither “good” nor “bad”: it should be “good enough” for the intended analysis.

  • Scope of this data quality guidance note

    Producing data of acceptable quality requires that the key steps in the data collection process follow a well-designed protocol. Those key steps are:

    • the design of the data collection instruments;
    • the training of surveyors;
    • the sampling;
    • the data collection;
    • the data entry;
    • the data cleaning;
    • the data quality tests.

    In the context of NIPN, the data have already been collected, entered and cleaned.

    In these guidance notes, “assessing the data quality” refers specifically to the data quality tests that can be performed on the cleaned datasets.

    When conducting secondary data analysis, it is important to know which protocol and process were actually implemented to ensure the quality of the data collected. In particular, the data quality tests that were performed are normally compiled in a separate report. These guidance notes describe the main data quality tests.

    There is no threshold for “data quality”. Some methods propose a “global score” for data quality, which gives an overall indication, but this score should not be used as a standard threshold. It is ultimately the responsibility of the NIPN data team to decide whether the data quality is good enough for the planned analysis. The tests described in these guidance notes are key to informing this decision.

    Given NIPN’s role in influencing policy decisions, a conservative approach to data quality is highly recommended, to avoid criticism that could damage the reputation of the NIPN.

    The main sources of information for NIPN are:

    • “Population-based surveys” refers to cross-sectional surveys that are designed to be representative of the studied population (e.g. national population; sub-national population).
    • “Routine data” refers to systematic information collected on a regular basis, typically from health centres (e.g. disease registries; births and deaths).

    Population-based survey data and routine data have very different purposes, protocols and structures. The way data quality is assessed therefore differs considerably depending on the data source.

    This guidance note provides details for both population-based surveys (pages 3 to 5) and routinely collected data (pages 6 onwards).

  • How to assess the data quality of population-based surveys?

    Population-based surveys are prone to two main types of errors:

    [Figure: the two main types of errors in population-based surveys]
  • Training curriculum on survey data quality (1/2)

    This curriculum includes all the materials needed to facilitate the training: presentations, examples, a facilitator’s guide, practical exercises and solutions, and parts of real datasets. The curriculum is fully aligned with the recent UNICEF/WHO recommendations (1).

    The three-day training is organised as follows:

    • DAY 1: Survey methods, anthropometric measurements
    • DAY 2: Performing data quality tests, how to read a data quality assessment report
    • DAY 3: Deciding on the data quality level required for your analysis

    The curriculum is meant to be used by national technical assistants to facilitate the training of the NIPN team and its partners when relevant.

    Note that the curriculum is especially focused on anthropometric measurements, though certain tests can be applied to other continuous variables as well.
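
    As an illustration of a test that applies to any continuous variable recorded to one decimal place, the sketch below checks for final-digit preference (for example, weights rounded to .0 or .5). It is a minimal Python sketch and not part of the curriculum materials: the function names and the sample weights are hypothetical, and the comparison uses the standard 5% critical value of the chi-square distribution with 9 degrees of freedom.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (10 possible final digits minus 1).
CHI2_CRITICAL_DF9_P05 = 16.919


def final_digit_counts(values):
    """Count the final decimal digit of values recorded to one decimal place."""
    counts = Counter({digit: 0 for digit in range(10)})
    for value in values:
        # Work in tenths so that 12.3 becomes 123 and the final digit is 3.
        counts[round(value * 10) % 10] += 1
    return counts


def digit_preference_chi2(values):
    """Chi-square statistic against a uniform distribution of final digits."""
    counts = final_digit_counts(values)
    expected = len(values) / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


# Hypothetical weights in kg, all rounded to .0 or .5 by the measurer.
weights = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5] * 5
statistic = digit_preference_chi2(weights)
print(f"chi-square = {statistic:.1f}")
print("digit preference suspected:", statistic > CHI2_CRITICAL_DF9_P05)
```

    A large statistic only signals that final digits are unevenly distributed; whether that matters depends on the precision the planned analysis requires.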

    Trainers are encouraged to:

    1. Download the three-day curriculum materials provided on the next page
    2. Download the facilitator’s guide Word file
    3. Watch the webinar introducing the training curriculum
    4. Discuss with NIPN teams and decide if the training is relevant
    5. Adapt and facilitate the training when relevant

    To support the data quality tests discussed in the training curriculum, a statistical test package has been developed for R: nipnTK.r
    This package includes built-in functions to generate z-scores and perform data quality tests. It is recommended for users who already have a basic knowledge of R.
    An ‘R’ user guide is also available to help perform the tests mentioned in the training curriculum.
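
    To give a sense of what a z-score based quality check involves, the sketch below flags implausible anthropometric z-scores using the WHO flagging ranges. It is a Python illustration only and does not reproduce the nipnTK functions: it assumes the z-scores have already been computed, and the field names (haz, whz, waz), the function names and the sample records are hypothetical.

```python
# WHO flagging ranges for anthropometric z-scores: values outside these
# bounds are treated as biologically implausible.
WHO_FLAG_RANGES = {
    "haz": (-6.0, 6.0),  # height-for-age
    "whz": (-5.0, 5.0),  # weight-for-height
    "waz": (-6.0, 5.0),  # weight-for-age
}


def flag_record(record):
    """Return the names of the indices whose z-score is outside the WHO range."""
    flagged = []
    for index, (low, high) in WHO_FLAG_RANGES.items():
        z = record.get(index)
        # Missing measurements (None) are skipped rather than flagged.
        if z is not None and not (low <= z <= high):
            flagged.append(index)
    return flagged


def flagged_share(records):
    """Proportion of records with at least one flagged z-score."""
    if not records:
        return 0.0
    return sum(1 for r in records if flag_record(r)) / len(records)


# Hypothetical records with precomputed z-scores (None = not measured).
children = [
    {"haz": -1.8, "whz": -0.4, "waz": -1.2},
    {"haz": -6.7, "whz": 0.3, "waz": -3.9},
    {"haz": 0.2, "whz": 5.4, "waz": 5.1},
    {"haz": None, "whz": -2.1, "waz": -1.0},
]
for child in children:
    print(child, "->", flag_record(child))
print("share flagged:", flagged_share(children))
```

    The share of flagged records is one indication among others; it does not replace the judgement of the NIPN data team on whether the data are good enough for the planned analysis.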

    *****

    Reference:

    1. Recommendations for data collection, analysis and reporting on anthropometric indicators in children under 5 years old. Geneva: World Health Organization and the United Nations Children’s Fund (UNICEF), 2019. Licence: CC BY-NC-SA 3.0 IGO.
  • Training curriculum on survey data quality (2/2)

    Day 1

    Presentations

    Additional material

    • Download additional material for day 1 package (ZIP file)

    Exercises

    • Download exercises for day 1 package (ZIP file)

    Day 2

    Presentations

    Additional material

    • Download additional material for day 2 package in English (ZIP file)
    • Download additional material for day 2 package in French (ZIP file)

    Exercises

    • Download exercises for day 2 package in English (ZIP file)
    • Download exercises for day 2 package in French (ZIP file)

    Day 3

    Presentations

    Exercises

    • Download exercises for day 3 package in English (ZIP file)
    • Download exercises for day 3 package in French (ZIP file)

    Annexes

    • Download Annexes package (ZIP file)
    • Download package for ’R’ software (ZIP file)