
Data analysis plan

1. Why a data analysis plan?
A data analysis plan helps you think through the data you will collect, what you will use it for, and how you will analyse it. “Analysis planning can be an invaluable investment of time” (Centers for Disease Control and Prevention, 2013).
The method for creating a data analysis plan in the context of a NIPN is not much different from the method used in a research context.
In the context of NIPN, the process should be simpler because:

  • A data analysis framework has already been produced (step 3 of the question formulation process) and forms the basis for the more detailed data analysis plan (developed after step 4 of the question formulation process).
  • Data analysis methodologies are already described in section 3.4/n°7-8-9.
  • NIPN is about the use of existing data; it is not about designing a protocol for collecting new data.

The following sections briefly describe the content of a data analysis plan, focusing on what is specific to the NIPN.

General recommendations:

  • Don’t panic!
  • Draw on the advice and experience of colleagues and experts
  • Contact an expert early when necessary

Recommended reading:
Centers for Disease Control and Prevention (2013) Creating an analysis plan. Atlanta.
Simpson, S.H. (2015) Creating a data analysis plan: what to consider when choosing statistics for a study.

2. What is a data analysis plan?
Main sections of a data analysis plan (based on the CDC module):

  • Main question and sub-questions
  • Dataset(s) to be used
  • Inclusion/exclusion criteria
  • Variables to be used in the main analysis
  • Statistical methods and software to be used
  • Table shells
  • Estimation of time and resources needed

3. Main question and sub-questions
At this stage, the policy relevant question (and, in some cases, its sub-questions) is already well defined (section 3.4/n°11).
Together, the answers to the sub-questions should provide a full answer to the main question.

4. Dataset(s) to be used
List the dataset(s) needed. In the context of the NIPN, data management may need particular attention: because the dataset(s) may come from different sources and/or may not have been designed to answer the main question, harmonising, appending and cleaning the raw dataset(s) may require substantial work.

  • Are the datasets comparable?
  • Are the indicators harmonized?
  • Is there a need to transform the data for the analysis?

To answer these questions, you need to have access to the datasets themselves.
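As an illustration of the harmonise/append step, the sketch below uses plain Python to rename the columns of two surveys to common names, convert units, and append the results while checking that the columns match. The survey names, column names and values are hypothetical; in practice this work would usually be done in the team's own statistical software.

```python
# Illustrative only: survey names, column names and values are hypothetical.

# Map each source's column names to common (harmonised) names.
# Columns not listed in a mapping are dropped during harmonisation.
COLUMN_MAPS = {
    "survey_a": {"child_age_m": "age_months", "wt_kg": "weight_kg", "reg": "region"},
    "survey_b": {"age_mo": "age_months", "weight_g": "weight_kg", "region_name": "region"},
}
# Divisors that convert a source's values into the common unit (grams -> kg).
UNIT_DIVISORS = {"survey_b": {"weight_kg": 1000}}


def harmonise(records, source):
    """Rename columns to the common names and convert units where needed."""
    mapping = COLUMN_MAPS[source]
    divisors = UNIT_DIVISORS.get(source, {})
    harmonised = []
    for rec in records:
        row = {mapping[col]: value for col, value in rec.items() if col in mapping}
        for col, divisor in divisors.items():
            row[col] = row[col] / divisor
        row["source"] = source  # keep track of origin after appending
        harmonised.append(row)
    return harmonised


def append(*datasets):
    """Stack harmonised datasets, refusing rows whose columns do not match."""
    combined, expected = [], None
    for dataset in datasets:
        for row in dataset:
            if expected is None:
                expected = set(row)
            elif set(row) != expected:
                raise ValueError(f"Column mismatch: {sorted(set(row) ^ expected)}")
            combined.append(row)
    return combined


survey_a = [{"child_age_m": 14, "wt_kg": 9.2, "reg": "North"}]
survey_b = [{"age_mo": 30, "weight_g": 12100, "region_name": "South"}]

combined = append(harmonise(survey_a, "survey_a"), harmonise(survey_b, "survey_b"))
for row in combined:
    print(row)
```

Keeping a "source" column after appending makes it possible to check later whether results differ by dataset, which helps answer the comparability question above.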

5. Inclusion/exclusion criteria
In this section, precisely define the population subgroups, geographic scope, timeframe, etc. to be included.
You also need to clarify the level of data quality required for the analysis.
Depending on the analysis, the data quality requirements may be more or less strict.
This is detailed in the Data Quality training module (section 3.3).
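Writing the inclusion/exclusion criteria down explicitly makes it easier to apply them identically to every dataset and to report how many records each criterion removed. The sketch below assumes hypothetical criteria (children aged 6-59 months, two regions, surveys from 2010 to 2020) and a hypothetical per-record "quality_flag" produced by an earlier data quality check; it is an illustration, not a prescribed procedure.

```python
from collections import Counter

# Hypothetical criteria: adapt to the population, scope and timeframe defined
# in the analysis plan.
CRITERIA = {
    "age_months": (6, 59),          # children aged 6-59 months (inclusive)
    "regions": {"North", "South"},  # geographic scope
    "years": (2010, 2020),          # timeframe (inclusive)
    "exclude_flagged": True,        # drop records that failed a quality check
}


def apply_criteria(rows, criteria=CRITERIA):
    """Return the included rows and a count of exclusions by first failed criterion."""
    included, reasons = [], Counter()
    lo_age, hi_age = criteria["age_months"]
    lo_yr, hi_yr = criteria["years"]
    for row in rows:
        if not lo_age <= row["age_months"] <= hi_age:
            reasons["age out of range"] += 1
        elif row["region"] not in criteria["regions"]:
            reasons["region out of scope"] += 1
        elif not lo_yr <= row["year"] <= hi_yr:
            reasons["year out of range"] += 1
        elif criteria["exclude_flagged"] and row.get("quality_flag", False):
            reasons["failed quality check"] += 1
        else:
            included.append(row)
    return included, reasons


rows = [
    {"age_months": 24, "region": "North", "year": 2014},
    {"age_months": 3, "region": "North", "year": 2014},
    {"age_months": 40, "region": "East", "year": 2016},
    {"age_months": 12, "region": "South", "year": 2008},
    {"age_months": 50, "region": "South", "year": 2018, "quality_flag": True},
]
included, reasons = apply_criteria(rows)
print(len(included), dict(reasons))
```

Reporting the exclusion counts alongside the results shows readers how much data each criterion, including the data quality requirement, removed.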

6. Variables to be used in the main analysis
In this section, precisely define the variables/indicators to be used in the analysis.
For example, if you analyse “obesity”, you need to specify whether you refer to the Body Mass Index (BMI) and whether you will use BMI categories, the mean BMI, or both.
In the context of NIPN, the harmonization of the definition of indicators across datasets will be important.
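To make the obesity example concrete, the sketch below computes BMI as weight (kg) divided by height (m) squared, assigns the WHO adult BMI categories (cut-offs 18.5, 25 and 30 kg/m²), and derives both a categorical indicator (prevalence of obesity) and a continuous one (mean BMI). The measurements are hypothetical. These cut-offs apply to adults only; for children, BMI-for-age z-scores are used instead, which is exactly the kind of definition choice this section asks you to make explicit.

```python
from statistics import mean


def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """WHO adult BMI categories (cut-offs in kg/m2)."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical adult measurements: (weight in kg, height in m).
adults = [(50, 1.70), (68, 1.75), (85, 1.72), (110, 1.80)]
values = [bmi(w, h) for w, h in adults]
categories = [bmi_category(v) for v in values]

print(f"Mean BMI: {mean(values):.1f}")
print("Prevalence of obesity:", categories.count("obese") / len(categories))
```

Whichever indicator is chosen, the same function and cut-offs should be applied to every dataset so that results remain comparable.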

7. Statistical methods and software to be used
Ensure coherence with section 4 of the guidance notes on data analysis.
Also, to provide only indisputable analysis (principle 3, section 3.4/n°4), make sure that the statistical method is consistent with the available datasets and with their data quality. The choice of statistical method is key to avoiding overinterpretation of the data, which could lead to misleading conclusions.
Does the NIPN team have the technical capacity to apply the identified statistical method and to use the identified software?
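One simple safeguard against overinterpretation is to flag subgroup estimates that rest on too few observations. The sketch below computes an unweighted prevalence per subgroup and marks estimates with fewer than a hypothetical minimum of 25 unweighted cases as unreliable; the threshold, the variable names and the data are assumptions. It deliberately ignores sampling weights and survey design (clustering, stratification), which an analysis of real survey data would normally need to take into account.

```python
# Hypothetical minimum number of unweighted cases; adapt to your own standard.
MIN_N = 25


def prevalence_by_group(rows, group_key, outcome_key, min_n=MIN_N):
    """Unweighted prevalence per subgroup, flagging small denominators."""
    counts = {}
    for row in rows:
        n, cases = counts.get(row[group_key], (0, 0))
        counts[row[group_key]] = (n + 1, cases + int(row[outcome_key]))
    return {
        group: {"n": n, "prevalence": cases / n, "reliable": n >= min_n}
        for group, (n, cases) in counts.items()
    }


# Hypothetical data: 40 children in North (10 stunted), 10 in South (5 stunted).
rows = ([{"region": "North", "stunted": i < 10} for i in range(40)]
        + [{"region": "South", "stunted": i < 5} for i in range(10)])

for group, res in prevalence_by_group(rows, "region", "stunted").items():
    note = "" if res["reliable"] else "  (too few cases: do not interpret)"
    print(f"{group}: n={res['n']}, prevalence={res['prevalence']:.0%}{note}")
```

Here the South estimate (50%) looks striking, but it is based on only 10 children, so it is flagged rather than presented as a finding.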

8. Table shells
Nothing specific to NIPN.
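For readers unfamiliar with the term, a table shell is an empty results table that fixes the layout (title, rows, columns, units) before any analysis is run. The sketch below prints such a shell as plain text; the function name, the "xx.x" placeholder and the example title, regions and years are hypothetical.

```python
def table_shell(title, row_labels, col_labels, placeholder="xx.x"):
    """Return an empty results table as plain text: layout only, no numbers."""
    first_w = max(len(label) for label in row_labels)
    col_w = max(max(len(label) for label in col_labels), len(placeholder))
    header = " " * first_w + " | " + " | ".join(c.ljust(col_w) for c in col_labels)
    lines = [title, header, "-" * len(header)]
    for label in row_labels:
        cells = " | ".join(placeholder.ljust(col_w) for _ in col_labels)
        lines.append(label.ljust(first_w) + " | " + cells)
    return "\n".join(lines)


print(table_shell(
    "Prevalence of stunting (%) among children 6-59 months, by region and survey year",
    ["North", "South", "National"],
    ["2010", "2015", "2020"],
))
```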

9. Estimation of time and resources
At this stage, make a precise estimate of the time and resources needed to conduct the analysis.
If this estimate exceeds the initial estimate made for the data analysis framework, you may need to adjust which question(s) to address first.
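The comparison described above is simple arithmetic: add up the detailed estimates per task and compare the total with the initial estimate from the data analysis framework. The task list and all figures in the sketch below are purely hypothetical.

```python
# Hypothetical tasks with estimated person-days; replace with your own.
tasks = {
    "access and document datasets": 5,
    "harmonise, append and clean data": 15,
    "apply inclusion/exclusion criteria": 2,
    "run analyses and fill table shells": 10,
    "review results with experts": 3,
}
initial_estimate_days = 25  # hypothetical figure from the analysis framework

detailed_estimate = sum(tasks.values())
overrun = detailed_estimate - initial_estimate_days
if overrun > 0:
    print(f"Detailed estimate ({detailed_estimate} days) exceeds the initial "
          f"estimate by {overrun} days: consider prioritising sub-questions.")
else:
    print(f"Detailed estimate ({detailed_estimate} days) fits the initial estimate.")
```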