Data landscape

  • Objectives of a data landscape exercise

    The main objective when conducting a data landscape exercise is to provide an overview of the availability, accessibility and quality of indicators that are of interest to the National Information Platform for Nutrition (NIPN).
    The exact nature of the exercise will vary between countries, depending upon the range of indicators and data sources available, time constraints, and the priorities of the NIPN country team.
    The exercise typically answers (fully or partially) the following questions:

    • What are the existing information systems of interest to the NIPN? Which datasets and indicators are included? Are there important indicators that are not available? (see definitions below)
    • What type of information is contained in the datasets? Is the information representative at the sub-national level? How frequently is the data collected? Which data quality control mechanisms are applied? How has the data been collected?
    • Where are the datasets? Which institutions have the mandate to collect and manage the datasets? How can the NIPN obtain legal access to the datasets?
      The main output of this exercise is a Data Landscape Report (this section, pages 3 to 6).
    *****
    Resources to consider before starting a data landscape exercise
    There are two resources that are important for all NIPN teams to be familiar with before starting the data landscape exercise:
    1. Typically, all countries have a National Statistical Plan that describes the statistical environment and national priorities.
    2. The Scaling Up Nutrition (SUN) Movement has produced Nutrition Information Mapping country reports and case studies which are a good starting point for the data landscape exercise but may not be sufficient.
  • *****
    Definitions
  • The use of a data landscape exercise

    The data landscape exercise can be used for a number of purposes:

    1. To establish which data is available, accessible and of sufficient quality to respond to a nutrition policy-relevant question

    The data landscape exercise is conducted as one of the first activities when an information platform is being set up in a country and prior to the identification of policy-relevant questions (section 2.2).
    The data landscape provides an initial picture of the available and accessible data and their quality. It helps the data experts to quickly ascertain which survey instruments have been used to collect nutrition-relevant indicators and which institutions or individuals to contact to access the data. The landscape exercise will also contribute to building and maintaining close connections with the data providers of the various sectors and facilitate access to data in the future.

    The initial data landscape exercise will be used during the process of formulating policy questions to provide first-hand basic information to the data experts to decide whether a formulated policy question can be answered or not with the existing data (section 2.4).
    However, further investigation is likely to be needed before a final decision can be reached on whether a policy question can be answered, because the data landscape exercise cannot cover all available data:

    • Policy questions can be very diverse and may require investigation of indicators that have not been included in the exercise.
    • To effectively assess the data quality, the dataset needs to be manipulated, which cannot be done during the data landscape exercise.
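    A first-pass check of whether a policy question can be answered with the data described in the landscape can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout and the example entries are all hypothetical and not part of the NIPN guidance, and the result does not replace the further investigation described above.

```python
def first_pass_check(required_indicators, landscape):
    """Sort the indicators needed for a policy question into three groups.

    `landscape` maps an indicator name to a list of entries, one per dataset,
    each with an `accessible` flag (hypothetical layout, for illustration).
    """
    result = {
        "available_and_accessible": [],
        "available_not_accessible": [],
        "not_in_landscape": [],
    }
    for name in required_indicators:
        entries = landscape.get(name)
        if not entries:
            # The indicator was not covered by the landscape exercise.
            result["not_in_landscape"].append(name)
        elif any(entry["accessible"] for entry in entries):
            result["available_and_accessible"].append(name)
        else:
            result["available_not_accessible"].append(name)
    return result


# Hypothetical landscape content, used only to show the call.
EXAMPLE_LANDSCAPE = {
    "stunting prevalence": [
        {"dataset": "DHS", "accessible": True},
        {"dataset": "SMART", "accessible": False},
    ],
    "school enrolment": [
        {"dataset": "Education Management Information System", "accessible": False},
    ],
}

check = first_pass_check(
    ["stunting prevalence", "school enrolment", "household food expenditure"],
    EXAMPLE_LANDSCAPE,
)
```

    Indicators that fall into the last two groups are the ones that need follow-up with data providers or an extension of the landscape.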
    *****

    2. To initiate a process to progressively update the data landscape

    The data landscape exercise should be a dynamic process. The initial exercise provides a first picture of the data landscape. As new surveys are conducted all the time, the data landscape will need to be updated. As the national information platform for nutrition expands its work within a country and progressively more policy questions are formulated and answered, the data landscape will be expanded and provide a more complete picture.

    *****

    3. To provide actionable recommendations to improve the nutrition information system

    The data landscape may offer new insights into the structure and functioning of the nutrition information systems. In particular, it may:

    • Identify and highlight gaps in data availability and lack of capacity to collect nutrition-relevant indicators, which could lead to recommendations on how to fill those gaps.
    • Highlight lack of harmonisation of indicators collected by the various instruments or systems. For example, the sampling method or the geographical unit may vary between information systems. The data landscape exercise can identify these differences and advocate for harmonisation or clarify (in-)comparability of indicators.
    *****

    4. To provide input into the NIPN data management strategy

    One of the aims of NIPN is to build a central repository of multi-sectoral datasets or to support an existing repository. The data landscape exercise will provide key information regarding the existing information systems, how they communicate with each other, and where to locate required indicators. Such information is important for the design of an adequate central repository.

    *****
  • The preparation of a data landscape report (1/3)

    A data landscape report will need to address the context-specific needs of a country. This guidance sets out six general sections that may typically be included in the report.
    It is important to not reinvent the wheel, but rather build on what already exists. Some sections of the report may already be fully or partially completed for another purpose and therefore do not need to be replicated.
    The data landscape report may typically contain the following sections:

    1. General description of the national statistical system
    2. Data providers mapping
    3. Description of the main information systems and datasets available
    4. Indicators matrix
    5. Operational recommendations
    6. The way forward
    *****

    1. General description of the national statistical system

    The national statistical system (NSS) is the combination of statistical organisations and units within a country that jointly collect, process and disseminate official statistics on behalf of a national government (see the Paris 21 website). This section of the report addresses the following questions:

    • Which key institutions are involved in nutrition-related data collection, management and dissemination?
    • What is their mandate?
    • How are they organised?
    • Is data sharing taking place? How easy and efficient is data sharing?
    • What are the legal and policy frameworks for data sharing?
    • What are the priorities of the national strategy for the development of statistics?
    • What are the main bottlenecks?
    *****

    2. Data providers mapping

    This is a description of the main organisations that manage the information systems (data providers), the main datasets they manage, their objectives and their capacity (see below for an example of a data provider mapping).
    Typical data providers are:

    • M&E or Statistics Divisions of line ministries
    • National Statistical Offices
    • Organisations that manage early warning systems
      Note that the data providers can be at either central level or decentralised level.
    *****
    Example of data provider mapping
    *****
  • The preparation of a data landscape report (2/3)

    3. Description of the main information systems and datasets available

    This is a description of the main information systems with nutrition-relevant information. It addresses questions such as:

    • How is data collection organised?
    • What systematic data quality control mechanisms are applied?
    • Which indicators are collected?
    • What are the sampling methods, exact dates of data collection, and population groups covered by the data?
    • What is the procedure to access the datasets?
    • Which outputs are produced on the basis of this data? For whom and how are they used?
      One data provider can manage one or more information systems (e.g. a National Statistical Office manages the Demographic and Health Survey, as well as SMART Surveys).

    Typical information systems include: Education Management Information System, Health Information Routine Data (DHIS2), Water Management Information System, Household Income and Consumption Surveys, Health and Nutrition Expenditure Surveys, Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), SMART Nutrition Surveys, and Integrated Food Security Phase Classification Database (IPC)…

    *****

    4. Indicator matrix

    Ideally, an Excel file is populated with information for all the indicators included in the data landscape (see below for an example of an indicator matrix).
    The typical information collected on each indicator includes:

    • Name of the dataset(s) to which the indicator is attached
    • Definition of the indicator (variables used to generate the indicator)
    • Period of data collection
    • Year of data collection
    • Data collection and sampling method (e.g. survey, routine)
    • Geographical coverage
    • Organisation(s) that collect and manage the data

    Note that indicators from different datasets may not be directly comparable (Challenge n°3, this section, page 8).
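    The fields listed above can be captured in a simple record structure before they are entered into a spreadsheet. The sketch below is illustrative only: the class name, field names and placeholder rows are assumptions rather than a NIPN template, and it writes CSV text (which Excel can open) using only the Python standard library.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class IndicatorEntry:
    """One row of an indicator matrix (field names are illustrative)."""
    indicator: str
    dataset: str                # dataset to which the indicator is attached
    definition: str             # variables used to generate the indicator
    collection_period: str
    collection_year: str
    collection_method: str      # e.g. "survey" or "routine"
    geographic_coverage: str
    managing_organisation: str


def matrix_to_csv(entries):
    """Serialise the entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(IndicatorEntry)]
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Placeholder rows: periods, years and coverage are deliberately left generic.
EXAMPLE_MATRIX = [
    IndicatorEntry("Stunting prevalence", "DHS", "Children aged 0-59 months",
                   "period not specified", "YYYY", "survey",
                   "coverage not specified", "National Statistical Office"),
    IndicatorEntry("Stunting prevalence", "SMART", "Children aged 6-59 months",
                   "period not specified", "YYYY", "survey",
                   "coverage not specified", "National Statistical Office"),
]

csv_text = matrix_to_csv(EXAMPLE_MATRIX)
```

    Keeping one row per indicator and dataset, rather than one row per indicator, makes differences in definition between datasets visible at a glance.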

    *****
    Example of indicator matrix
    The indicator matrix in an Excel file (as in the example provided below, based on the experience of Niger) is an ideal output of a data landscape exercise. It shows, for instance, which datasets contain an indicator that has been identified to answer a policy-relevant question. However, as detailed in Challenge n°1 (this section, page 6), describing all indicators available in all datasets that are of interest to NIPN can be a massive piece of work, which is not always necessary. Côte d’Ivoire, for example, decided to describe the information systems and datasets of interest to NIPN without describing precisely all indicators available in these datasets, and did not develop an indicator matrix.
    Download an example of indicator matrix in Excel format (based on the experience of Niger).
    *****
  • The preparation of a data landscape report (3/3)

    5. Operational recommendations

    The recommendations will pertain to questions such as the following:

    • How to make the nutrition information system more functional?
    • Which inputs are required to achieve the above?
    • How will NIPN use the results of this exercise compiled in the data landscape report?
    • In what way can the data landscape exercise contribute to the design of the central repository?
    *****

    6. The way forward

    The final section of the report depicts the way forward and typically addresses the following questions:

    • How does NIPN plan to regularly and continuously complete and update the data landscape?
    • Which sectors and indicators need further investigation?
    • Which activities and resources are planned to achieve this?
    *****

    7. Examples of data landscape reports

    *****
  • The challenges and how to overcome them (1/4)

    Engaging in a data landscape exercise is not without challenges. This section highlights four key challenges that country teams may face when designing and carrying out a data landscape exercise and it proposes pragmatic solutions for each of the challenges.

    • Challenge n°1: The scope of the data landscape exercise
    • Challenge n°2: Access to datasets for the data landscape exercise
    • Challenge n°3: Harmonisation of indicators for the data landscape exercise
    • Challenge n°4: Cost, time and resources for the data landscape exercise
    *****

    Challenge n°1: The scope of the data landscape exercise

    Although the data landscape exercise could be executed at central or decentralised level, the exercise is already vast at central level. It is therefore recommended to limit the focus to the central level. Countries with a highly decentralised system or with a specific interest in one district may want to expand the exercise to the decentralised level (topic not covered here). The domains for which data is required are the following:

    • Nutrition outcomes
    • Basic, underlying and immediate determinants of nutrition
    • Nutrition-specific and nutrition-sensitive interventions / programmes
    • Finance for nutrition

    ‘Finance for nutrition’ data is included here because analysing which investments or budget are allocated to which activities is the first element of the nutrition impact pathway (see section 2.3) and a crucial element in policy decision-making (see the SUN Budget Analysis for Nutrition).

    The complete list of relevant datasets and indicators (this section, page 4) is potentially very vast and probably too big for the scope of a short-term exercise. It is recommended to narrow down the scope of this exercise to keep it feasible. However, each country will need to decide on the scope of their data landscape exercise as there is no “one-size-fits-all” solution.
    Two options are described here:

    • Option 1: Limit the exercise to the level of the datasets (exclude the indicator matrix)
    • Option 2: Include datasets and indicators matrix in the data landscape exercise
      (see below for more information on options 1 and 2).

    Creating the indicators matrix is indeed time-consuming and resource-demanding, but can be of particular interest when:

    • Teams want to identify a list of key nutrition indicators to monitor (case of Niger);
    • A list of key indicators for nutrition has been set in the main nutrition policy documents (case of Guatemala);
    • Time and resources are available: the indicators matrix can be a practical tool to quickly ascertain where to find specific indicators to answer a specific nutrition policy-relevant question.
    *****
    How to overcome challenge n°1
  • The challenges and how to overcome them (2/4)

    Challenge n°2: Access to raw datasets for the data landscape exercise

    Data is becoming more public and transparent. Web-based platforms, such as DHS STATcompiler, UN data, DEVINFO, DHIS2, and the NADA repository, are now more commonly used. Summary statistics are also frequently published in statistical books, survey reports and web platforms. However, obtaining access to the raw datasets can be problematic and time-consuming because:

    • Data must be anonymised to be shared due to ethical considerations.
    • Some institutions can understandably still be reluctant to share sensitive data in the absence of a legal framework for data sharing.

    Although it is not necessary to have access to the raw datasets to complete the data landscape exercise, the exercise will need to assess the practical accessibility of the datasets by:

    • Inquiring about the formal procedural steps to obtain official permission and access the data.
    • Interviewing external users on their experience of accessing the data.
    • Making an actual attempt to access the database by:
      • Downloading and opening databases that are formally available on the web to establish whether: an access code is required, all indicators are available and the data is anonymised.
      • Requesting access from the data providers. Even if it is not possible to access a particular dataset, it is helpful to know where datasets are stored, and the process and information required for access. It is also important to establish whether reports are available that describe the sampling method, data quality controls and results.

    Efforts to establish the accessibility of datasets can be time-consuming. In an ideal situation, a legal framework for data sharing exists. In the absence of such a framework, the NIPN country team should advocate for it but will at the same time need to find a pragmatic solution for accessing data. Based on past experience, the following factors may contribute to facilitating access to datasets of multiple sectors:

    • Develop relationships with data providers: Access to datasets can depend on individual relationships based on trust. The data landscape exercise is a good way of identifying key data providers, building relationships and raising their awareness and understanding of the National Information Platform for Nutrition. This includes an explanation of what NIPN intends to do with the data, and addressing the concerns of data providers and understanding the information and authorisation needed to access data.
    • Engage data providers in NIPN: Experience from the Nutrition Evaluation Platforms project showed that having key data providers as members of the technical committee was an efficient means of accessing datasets.
    • Coordinate with data providers: The NIPN data experts should not work in silos but rather involve data providers in the interpretation and communication of the data. The outputs of the data analyses should be beneficial to data providers as well. For example, in Côte d’Ivoire the Prime Minister provided the NIPN team with an official letter addressed to data providers to grant the NIPN systematic access to the datasets.
    • To update a central repository (based on the NADA software solution), the National Statistical Office of Burkina Faso organised a one-week workshop in 2012 with focal points from the different ministries. Each focal point brought their datasets to be uploaded. Discussion on harmonisation of indicators formed the basis of a plan of action. The workshop created a dynamic forum that facilitated the sharing of data and removed the need to request datasets from each stakeholder individually.
    *****
  • The challenges and how to overcome them (3/4)

    Challenge n°3: Harmonisation of indicators for the data landscape exercise

    Different survey instruments may collect data on the same indicator. For example, data on stunting is typically collected by DHS, MICS, NNS, local SMART surveys, and routine data systems in health centres. Be aware that the indicators are not necessarily directly comparable across these sources because:

    • The definition of the indicator can be different. For example, stunting prevalence is measured in children aged 0-59 months in DHS and MICS while it is measured in children aged 6-59 months in SMART surveys.
    • For survey data, the sampling frame is important. Sub-national data from a survey that is designed to be representative at the national level may not be directly comparable to sub-national data from a survey that is designed to be representative at the sub-national level.
    • The geographic level may vary if administrative demarcation has changed over time.
    • Routine data and survey data, even when using the same indicators, cannot be directly combined. Population-based survey data is designed to be representative of a population group while routine data is representative of the individuals using a service or programme.
      The question of harmonisation of indicators is addressed at the data analysis stage. However, when conducting a data landscape exercise, it is important to include information on indicator definitions, routine vs survey data, and geographic scope in the indicators matrix to identify challenges ahead.
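    The comparability checks described above can be sketched as a small routine that flags pairs of datasets reporting the same indicator under different conditions. This is a hedged illustration: the function names, dictionary keys and the representativeness values in the example rows are assumptions. Only the age ranges for DHS, MICS and SMART are taken from the text above.

```python
from itertools import combinations


def comparability_issues(a, b):
    """Return the reasons why two matrix rows may not be directly comparable."""
    issues = []
    if a["age_range_months"] != b["age_range_months"]:
        issues.append("different indicator definition (age range)")
    if a["source_type"] != b["source_type"]:
        issues.append("routine and survey data cannot be directly combined")
    if a["representative_level"] != b["representative_level"]:
        issues.append("different level of representativeness")
    return issues


def flag_pairs(rows):
    """Map each pair of datasets sharing an indicator to its list of issues."""
    flagged = {}
    for a, b in combinations(rows, 2):
        if a["indicator"] != b["indicator"]:
            continue  # only compare rows that describe the same indicator
        issues = comparability_issues(a, b)
        if issues:
            flagged[(a["dataset"], b["dataset"])] = issues
    return flagged


# Example rows; "source_type" and "representative_level" values are assumed.
ROWS = [
    {"indicator": "stunting", "dataset": "DHS", "age_range_months": (0, 59),
     "source_type": "survey", "representative_level": "national"},
    {"indicator": "stunting", "dataset": "MICS", "age_range_months": (0, 59),
     "source_type": "survey", "representative_level": "national"},
    {"indicator": "stunting", "dataset": "SMART", "age_range_months": (6, 59),
     "source_type": "survey", "representative_level": "national"},
]

flagged = flag_pairs(ROWS)
```

    A flag does not mean that the data cannot be used, only that the difference needs to be addressed at the data analysis stage.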
    *****
  • The challenges and how to overcome them (4/4)

    Challenge n°4: Cost, time and resources for the data landscape exercise

    The cost, time and resources required to conduct a data landscape exercise will vary from country to country depending on the context, objectives and scope of the exercise.
    For example, in Uganda the Bureau of Statistics already had a national statistical metadata dictionary with a list of all indicators collected and sources of information that could be used. In Niger, however, there was no such official multisectoral M&E plan that would help to select the indicators to include in the data landscape.

    One of the time-consuming tasks is meeting with every organisation, identifying the right person and obtaining the relevant information. Two or three visits to each institution are typically needed to obtain all the information required.
    The NIPN country team may conduct the exercise themselves or bring in a consultant to do it. For example, in Burkina Faso the NIPN country team conducted the exercise over a 2-month period (2 part-time staff). In Côte d’Ivoire and Niger, however, short-term consultants were contracted for 40 days and 30 days respectively.
    If choosing the option of a consultant:

    • It is recommended to contract a consultant who is very familiar with the information systems in country and who knows how to navigate the government system. Statistical skills are not essential here.
    • It is also recommended to spread the days of work over a longer period (e.g. 30 days over 3 months) to account for the time needed to receive information.

    There is one important advantage to the option of NIPN teams conducting the data landscape exercise themselves: the connections and relationships that are established with the data providers during the landscape exercise are extremely important for future engagement (data sharing, data analysis) and ultimately for the success of the information platform. The data landscape exercise is an excellent opportunity to start building these relationships.

    In all cases, Terms of Reference must detail:

    • the objectives (this section, page 1)
    • the expected outputs (this section, page 2)
    • the human and financial resources needed
    *****

    Read the interview with the Niger team below to understand how they have overcome some of the challenges faced: