Data landscape

  • Objectives of a data landscape exercise

    The main objective when conducting a data landscape exercise is to provide an overview of the availability, accessibility and quality of indicators that are of interest to the NIPN.
    The exact nature of the exercise will vary between countries, depending upon the range of indicators and data sources available, time constraints, and the priorities of the NIPN country team.
    The exercise typically answers (fully or partially) the following questions:

    • What are the existing information systems of interest to the NIPN? Which datasets and indicators are included? Are there important datasets or indicators that are not available? (see definitions below)
    • What sorts of information are contained in the datasets? Is the information representative at the sub-national level? How regularly is the data collected? Which data quality control mechanism is applied? How has the data been collected?
    • Where are the datasets? Which institutions have the mandate to collect and manage the datasets? How can the NIPN obtain legal access to the datasets?
      The main output of this exercise is a Data Landscape Report (this section, pages 3 to 6).
    *****
    Resources to consider before starting a data landscape exercise
    There are two resources that are important for all NIPN teams to be familiar with before starting the data landscape exercise:
    1. NIPN countries typically have a National Statistical Plan that describes the statistical environment and national priorities.
    2. The Scaling Up Nutrition (SUN) Movement has produced Nutrition Information Mapping country reports and case studies, which are a good starting point for the data landscape exercise but are not sufficient on their own.
    *****
    Definitions
  • The use of a data landscape exercise

    The data landscape exercise can be used for a number of purposes:

    1. To establish which data is available, accessible and of sufficient quality to respond to a relevant nutrition policy question

    The data landscape exercise is conducted as one of the first activities when an NIPN is being set up in a country and prior to the identification of policy-relevant questions (section 2.2).
    The data landscape provides an initial picture of the available and accessible data and its quality. It helps the NIPN data experts to quickly ascertain which survey instruments have been used to collect nutrition-relevant indicators and which institutions or individuals to contact to access the data. The landscape exercise will also contribute to building and maintaining close connections with the multi-sectoral data providers and facilitate access to data in the future.
    The initial data landscape exercise will be used during the process of formulating policy questions to provide first-hand basic information for the NIPN data experts to decide whether a formulated policy question can be answered or not (section 2.4).
    However, further investigation will probably be needed before a final decision can be made on whether a policy question can be answered, because the data landscape exercise can never cover all available data:

    • Policy questions can be very diverse and may require investigation of indicators that have not been included in the exercise.
    • To effectively assess the data quality, the dataset needs to be manipulated, which cannot be done during the data landscape exercise.
    *****

    2. To initiate a process to progressively update the data landscape

    The data landscape exercise should be a dynamic process. The initial exercise provides an initial picture of the data landscape. As new surveys are conducted, the data landscape can be updated. While the NIPN approach is developing and maturing within a country and as more and more policy questions are being formulated and answered, the data landscape will expand and provide a more complete picture.

    *****

    3. To provide actionable recommendations to improve the nutrition information system

    The data landscape may offer new insights into the structure and functioning of the nutrition information systems. In particular, it may:

    • Identify and highlight gaps in data availability and lack of capacity to collect nutrition-relevant indicators, which could lead to recommendations on how to fill those gaps.
    • Highlight lack of harmonisation of indicators collected by the various instruments or systems. For example, the sampling method or the geographical unit may vary between information systems. The data landscape exercise can identify these differences and advocate for harmonisation or clarify (in-)comparability of indicators.
    *****

    4. To provide input into the NIPN data management strategy

    One of the aims of NIPN is to build a central repository of multi-sectoral datasets or to support an existing repository. The data landscape exercise will provide key information regarding the existing information systems, how they communicate with each other, and where to locate required indicators. Such information is important for the design of an appropriate NIPN central repository.

  • The preparation of a data landscape report (1/3)

    A data landscape report will need to address the context-specific needs of a country. This guidance sets out six general sections that may typically be included in the report.
    It is important to note that NIPN should not reinvent the wheel, but rather build on what already exists. Some sections of the report may already be fully or partially completed for another purpose and therefore do not need to be replicated.
    The data landscape report may typically contain the following sections:

    1. General description of the national statistical system
    2. Data providers mapping
    3. Description of the main information systems and datasets available
    4. Indicators matrix
    5. Operational recommendations
    6. The way forward
    *****

    1. General description of the national statistical system

    The national statistical system (NSS) is the combination of statistical organisations and units within a country that jointly collect, process and disseminate official statistics on behalf of a national government (see the PARIS21 website). This section of the report addresses the following questions:

    • Which key institutions are involved in nutrition-related data collection, management and dissemination?
    • What is their mandate?
    • How are they organised?
    • Is data sharing taking place? How easy and efficient is data sharing?
    • What are the legal and policy frameworks for data sharing?
    • What are the priorities of the national strategy for the development of statistics?
    • What are the main bottlenecks?
    *****

    2. Data providers mapping

    This is a description of the main organisations that manage the information systems (data providers), the main datasets they manage, their objectives and their capacity (see below for an example of a data provider mapping).
    Typical data providers are:

    • M&E or Statistics Divisions of line ministries
    • National Statistical Offices
    • Organisations that manage early warning systems
      Note that the data providers can be at either central level or decentralized level.
    *****
    Example of data provider mapping
  • The preparation of a data landscape report (2/3)

    3. Description of the main information systems and datasets available

    This is a description of the main information systems with nutrition-relevant information. It addresses questions such as:

    • How is data collection organised?
    • What systematic data quality control mechanisms are applied?
    • Which indicators are collected?
    • What are the sampling methods, exact dates of data collection, and population groups covered by the data?
    • What is the procedure to access the datasets?
    • Which outputs are produced on the basis of this data? For whom and how are they used?
      One data provider can manage one or more information systems (e.g. a National Statistical Office manages the Demographic and Health Survey, as well as SMART Surveys).

    Typical information systems include: Education Management Information System, Health Information Routine Data (DHIS2), Water Management Information System, Household Income and Consumption Surveys, Health and Nutrition Expenditure Surveys, Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), SMART Nutrition Surveys, and Integrated Food Security Phase Classification Database (IPC)…

    *****

    4. Indicator matrix

    Ideally, an Excel file is populated with information for all the indicators included in the data landscape (see below for an example of an indicator matrix).
    The typical information collected on each indicator includes:

    • Name of the dataset(s) to which the indicator is attached
    • Definition of the indicator (variables used to generate the indicator)
    • Period of data collection
    • Year when data was collected
    • Data collection method and sampling approach (e.g. survey, routine)
    • Geographic coverage
    • Organisation(s) that collect and manage the data

    Note that indicators from different datasets may not be directly comparable (Challenge n°3, this section, page 8).
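    The fields listed above can be sketched as a simple record structure. This is only an illustration, assuming one row per indicator and dataset pair; the class name, field names and example rows are hypothetical and are not taken from the Niger matrix. The output is CSV text that can be opened in Excel.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class IndicatorEntry:
    """One row of a hypothetical indicator matrix (field names are illustrative)."""
    indicator: str
    dataset: str
    definition: str
    collection_period: str
    collection_year: int
    collection_method: str      # e.g. "survey" or "routine"
    geographic_coverage: str
    managing_organisation: str


def to_csv(entries):
    """Serialise the matrix to CSV text, with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(IndicatorEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def find_datasets(entries, indicator):
    """Return the sorted names of the datasets that contain a given indicator."""
    wanted = indicator.lower()
    return sorted({e.dataset for e in entries if e.indicator.lower() == wanted})


# Placeholder rows for illustration only; they are not real survey metadata.
matrix = [
    IndicatorEntry("Stunting prevalence", "Example household survey",
                   "Children aged 0-59 months", "placeholder period", 2000,
                   "survey", "national", "Example statistics office"),
    IndicatorEntry("Stunting prevalence", "Example routine system",
                   "Children attending health centres", "placeholder period", 2000,
                   "routine", "district", "Example health ministry"),
]

print(find_datasets(matrix, "stunting prevalence"))
print(to_csv(matrix))
```

    Keeping one row per indicator and dataset pair means that the same indicator can appear several times with different definitions, which makes differences between sources visible at a glance.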

    *****
    Example of indicator matrix
    The indicator matrix in an Excel file (as in the example provided below, based on Niger’s experience) is an ideal output of a data landscape exercise. It shows in which datasets to find the indicator needed to answer the policy-relevant question identified. However, as detailed in Challenge n°1 (this section, page 6), describing all the indicators available in all the datasets of interest to the NIPN can be a very large undertaking. Ivory Coast, for example, decided to describe the information systems and the datasets of interest to the NIPN without describing precisely the indicators available in these datasets, and did not develop an indicator matrix.
    Download an example of indicator matrix in Excel format (based on the experience of Niger).
  • The preparation of a data landscape report (3/3)

    5. Operational recommendations

    The recommendations will pertain to questions such as the following:

    • Which improvements are needed to make the nutrition information system more functional?
    • Which inputs are required to achieve the above?
    • How will NIPN use the results of this exercise compiled in the data landscape report?
    • In what way can the data landscape exercise contribute to the design of the NIPN central repository?
    *****

    6. The way forward

    The final section of the report depicts the way forward and typically addresses the following questions:

    • How does NIPN plan to regularly and continuously complete and update the data landscape?
    • Which sectors and indicators need further investigation?
    • Which activities and resources are planned to achieve this?
  • The challenges and how to overcome them (1/4)

    Engaging in a data landscape exercise is not without challenges. This section highlights four key challenges that NIPN country teams face when designing and implementing data landscape exercises, and proposes pragmatic solutions.

    • Challenge n°1: The scope of the data landscape exercise
    • Challenge n°2: Access to datasets for the data landscape exercise
    • Challenge n°3: Harmonisation of indicators for the data landscape exercise
    • Challenge n°4: Cost, time and resources for the data landscape exercise
    *****

    Challenge n°1: The scope of the data landscape exercise

    Although the data landscape exercise could be executed at central or decentralized level, the exercise is vast enough at central level. Therefore we recommend focusing on the central level. Countries with a highly decentralized system or with a specific focus on one district may want to expand the exercise to the decentralized level (topic not covered here). The domains covered by NIPN for which data is required are the following:

    • Nutrition outcomes
    • Basic, underlying and immediate determinants of nutrition
    • Nutrition-specific and nutrition-sensitive interventions / programmes
    • Finance for nutrition

    ‘Finance for nutrition’ data is included here because analysing which investments or budget are allocated to which activities is a crucial element in policy decision-making (refer to the SUN Budget Analysis for Nutrition).

    The complete list of relevant datasets and indicators (this section, page 4) is potentially very long and probably too large for a short-term exercise. It is therefore essential to narrow the scope of the exercise to keep it feasible. Each country will need to decide on the scope of its own data landscape exercise, however, as there is no “one-size-fits-all” solution.
    Two options are described here:

    • Option 1: Limit the exercise to the level of the datasets (exclude the indicator matrix)
    • Option 2: Include datasets and indicators matrix in the data landscape exercise
      (see below for more information on options 1 and 2).

    Creating the indicators matrix is indeed time-consuming and resource-demanding, but can be of particular interest when:

    • NIPN teams want to identify a list of key nutrition indicators to follow (case of Niger);
    • A list of key indicators for nutrition has been set by nutrition policy documents (case of Guatemala);
    • Time and resources are available: the indicators matrix can be a very practical tool to quickly ascertain where to find specific indicators to answer a specific nutrition policy-relevant question.
    *****
    How to overcome challenge n°1
  • The challenges and how to overcome them (2/4)

    Challenge n°2: Access to raw datasets for the data landscape exercise

    Data is becoming more public and transparent. Web-based platforms, such as DHS STATcompiler, UN data, DEVINFO, DHIS2, and the NADA repository, are now more commonly used. Summary statistics are also frequently published in statistical books, in survey reports and on web platforms. However, obtaining access to the raw datasets can still be problematic and time-consuming because:

    • Data must be anonymised to be shared due to ethical considerations.
    • Some institutions can understandably still be reluctant to share sensitive data in the absence of a legal framework for data sharing.

    Although it is not necessary to have access to the raw datasets to complete the data landscape exercise, the exercise will need to assess the practical accessibility of the datasets by:

    • Inquiring about the formal procedural steps to obtain official permission and access the data.
    • Interviewing external users on their experience of accessing the data.
    • Making an actual attempt to access the database by:
      • Downloading and opening databases that are formally available on the web to establish whether: an access code is required, all indicators are available and the data is anonymised.
      • Requesting access from the data providers. Even if it is not possible to access a particular dataset, it is helpful to know where datasets are stored, and the process and information required for access. It is also important to establish whether reports are available that describe the sampling method, data quality controls and results.
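    The accessibility checks above could be recorded per dataset in a small structure such as the following. This is a sketch under stated assumptions: the class name, field names and example values are hypothetical, and `None` is used here to mean "not yet established".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessCheck:
    """Hypothetical record of what is known about access to one dataset."""
    dataset: str
    formal_procedure_known: Optional[bool] = None
    downloadable_from_web: Optional[bool] = None
    access_code_required: Optional[bool] = None
    all_indicators_available: Optional[bool] = None
    anonymised: Optional[bool] = None
    methodology_report_available: Optional[bool] = None


def open_questions(check):
    """Return the names of the checks that have not yet been established."""
    return [name for name, value in vars(check).items() if value is None]


# Illustrative values only; they do not describe a real dataset.
check = AccessCheck(
    dataset="Example survey",
    formal_procedure_known=True,
    downloadable_from_web=True,
    access_code_required=False,
    anonymised=True,
)

print(open_questions(check))
```

    Listing the unanswered checks gives a simple agenda for the next visit to the data provider.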

    Efforts to establish the accessibility of datasets can be time-consuming. In an ideal situation, a legal framework for data sharing exists. In the absence of such a framework, the NIPN country team should advocate for it but will at the same time need to find a pragmatic solution for accessing data. Based on past experience, the following factors may contribute to facilitating access to multi-sectoral datasets:

    • Develop relationships with data providers: Access to datasets can depend on individual relationships based on trust. The data landscape exercise is a good way of identifying key data providers, building relationships and raising their awareness of the NIPN. This includes an explanation of what the NIPN intends to do with the data, and understanding their concerns and the information and authorisation needed to access data.
    • Involve data providers in the NIPN: Experience from the NEP project showed that having key data providers as members of the technical committee was an efficient means of accessing datasets.
    • Coordinate with data providers: The NIPN data experts should not work in a silo but rather involve data providers in the interpretation and communication of the data. The NIPN outputs should be beneficial to data providers. For example, in Ivory Coast the Prime Minister provided the NIPN team with an official letter addressed to data providers to grant the NIPN systematic access to the datasets.
    • To update a central repository (based on the NADA software solution), the National Statistical Office of Burkina Faso organised a one-week workshop in 2012 with focal points from the different ministries. Each focal point brought their datasets to be uploaded. Discussion on harmonisation of indicators formed the basis for an action plan. The workshop created a dynamic forum that facilitated the sharing of data, removing the need to request datasets from each stakeholder individually.
  • The challenges and how to overcome them (3/4)

    Challenge n°3: Harmonisation of indicators for the data landscape exercise

    Different survey instruments can collect data on the same indicator. For example, data on stunting is typically collected by DHS, MICS, national nutrition surveys (NNS), local SMART surveys, and routine systems in health centres. Note that the indicators are not necessarily directly comparable because:

    • The definition of the indicator can be different. For example, stunting prevalence is measured in children aged 0-59 months in DHS and MICS while it is measured in children aged 6-59 months in SMART surveys.
    • For survey data, the sampling frame is important. Sub-national data from a survey that is designed to be representative at the national level may not be directly comparable to sub-national data from a survey that is designed to be representative at the sub-national level.
    • The geographic level may vary if administrative demarcation has changed over time.
    • Routine data and survey data, even when using the same indicators, cannot be directly combined. Population-based survey data is designed to be representative of a population group while routine data is representative of the individuals using a service or programme.
      The question of harmonisation of indicators is addressed at the data analysis stage. However, when conducting a data landscape exercise, it is important to include information such as indicator definitions, routine vs survey data, and geographic scope in the indicators matrix to identify challenges ahead.
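    The comparability issues described above can be flagged early with a simple check on the information recorded for each source. The sketch below is illustrative only: the class, function and field names are hypothetical, the age ranges follow the stunting example above, and the representativeness values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndicatorSource:
    """Hypothetical description of one source for an indicator."""
    source: str
    age_range_months: Tuple[int, int]   # (minimum, maximum) age covered
    collection_method: str              # "survey" or "routine"
    representative_level: str           # e.g. "national" or "sub-national"


def comparability_warnings(a, b):
    """List the reasons why two sources may not be directly comparable."""
    warnings = []
    if a.age_range_months != b.age_range_months:
        warnings.append("indicator definitions differ (age range)")
    if a.collection_method != b.collection_method:
        warnings.append("routine and survey data cannot be directly combined")
    if a.representative_level != b.representative_level:
        warnings.append("designed to be representative at different levels")
    return warnings


# Age ranges follow the text above; representativeness levels are assumed.
dhs = IndicatorSource("DHS", (0, 59), "survey", "national")
smart = IndicatorSource("SMART", (6, 59), "survey", "sub-national")

for warning in comparability_warnings(dhs, smart):
    print(warning)
```

    A check like this does not harmonise anything; it only records the challenges that will need to be resolved at the data analysis stage.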
  • The challenges and how to overcome them (4/4)

    Challenge n°4: Cost, time and resources for the data landscape exercise

    The cost, time and resources required to conduct a data landscape exercise will vary from country to country depending on the context, objectives and scope of the exercise.
    For example, in Uganda the Bureau of Statistics already had a national statistical metadata dictionary with a list of all the indicators collected and sources of information that could be used. In Niger, however, there was no such official multi-sectoral M&E plan that would help to select the indicators to include in the data landscape.
    One of the time-consuming tasks is meeting with every organisation, identifying the right person and obtaining the relevant information. Two or three visits to each institution are typically needed to obtain all the information required.
    The NIPN country team may conduct the exercise themselves or bring in a consultant to do it. For example, in Burkina Faso the NIPN country team conducted the exercise over a 2-month period (2 part-time staff). In Ivory Coast and Niger, however, short-term consultants were contracted for 40 days and 30 days respectively.
    If choosing the option of a consultant:

    • It is recommended to contract a consultant who is very familiar with the information systems in country and who knows how to navigate the government system. Statistical skills are not essential here.
    • It is also recommended to spread the days of work over a longer period (e.g. 30 days over 3 months) to account for the time needed to receive information.

    Having the NIPN team conduct the data landscape exercise itself has important added value: the connections made with data providers during the exercise stay within the team, which matters for the success of the NIPN project. The data landscape exercise is an excellent opportunity to start building these relationships.

    In all cases, Terms of Reference must detail:

    • the objectives (this section, page 1)
    • the expected outputs (this section, page 2)
    • the human and financial resources needed
    *****

    Read the interview with the Niger team below to understand how they have overcome some of the challenges faced: