Why data definition is important

Atanu Majumdar
4 min read · Aug 21, 2021


At that time, I was a Polio Consultant to UNICEF Kolkata. UNICEF was responsible for leading the polio campaign in the state (West Bengal, India). In those days (2003–05), the National Polio Surveillance Programme was a priority health programme of the Indian Government. As soon as the Government announced a day for mass immunisation of all children aged 0–5 years, the entire office would be on its toes. There were many refusal pockets throughout the state where parents would never bring their children to the immunisation camps.

The situation inside the office was quite tense, as the next morning's headlines would be all about what percentage of children had missed the vaccination in each district. It was enough to make us feel guilty, as social mobilisation was our responsibility. However, this story is about numbers, not polio. On one such occasion, the day after a mass immunisation camp, my frustrated boss called me in to give me an assignment: to estimate the number of children aged 0–5 years in each district.

I murmured: The Government already had all these numbers; the vaccine requisitions and the newspaper headlines were based on them.

My boss: No, I want our own figures.

It was not an easy task, as the census did not report the number of children aged 0–5 years. I applied my statistical and modelling skills to get district-wise estimates. Surprisingly, they were all smaller than the numbers of children reportedly vaccinated. My boss was not happy. He had expected me to say that the proportion of unimmunised children was much lower than reported, but he never imagined the estimates would be so different: in every district, the number of children vaccinated was 8–10% higher than the estimated number of children aged 0–5 years. Yet there was no doubt that unimmunised children existed, because front-line workers had identified and counted them.

My boss pronounced with a sullen face: There must be something wrong, my dear.

I was very confident about my logic, model and estimates. But what can you do when your boss does not appreciate them? Most statisticians would agree that people are often interested only in results, not the methods by which they were derived. So I checked and checked, but every time I got the same numbers. I was depressed, as my reputation as a statistician was at stake. Then I decided to probe: how did the Government estimate the numbers? I went from one office to another until I finally reached the person who had put the numbers on the conveyor belt.

I: How did you estimate the number of children aged 0–5 years for the mass immunisation programme?

He: Actually, we did not do it. One UN agency provided these figures to us.

My next destination was that UN agency's office. Oh, they had an organised office, and my UNICEF identity took no time to bring me before the guy who did these calculations.

He was quick and smart: It’s easy. You know, the Indian census provides the number of children aged 0–6 years.

I: Yes.

He: Divide the number by six and then multiply by five. That’s how we do it.

I was stumped. Yes, the Indian census does indeed provide the number of children aged 0–6 years. Although I had done it differently, the figures should not have been very different.

Next, I went to the Census Office and caught hold of the person who prepared the census tables.

She gave me a patient hearing and then smiled at me: The two figures should never match.

I: Why?

She: In the census, we consider age on one’s LBD.

I: What is LBD?

She: LBD means Last Birth Day. Someone who was 6 years 11 months 29 days old on the day of enumeration was 6 years old on his or her LBD. Thus, children aged 0–6 years in the census comprise all children less than 7 years old. In the polio programme, children aged 0–5 years means children below 5 years of age. A child who is 5 years 1 day old is not eligible for the polio vaccine.
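The two definitions she describes can be written down in a few lines. This is only a minimal sketch of the rule as stated in the conversation, not anything either agency actually ran; the function names, the reference date and the birth dates are all made up for illustration.

```python
from datetime import date


def age_last_birthday(born: date, on: date) -> int:
    """Completed years of age on the reference date (age at last birthday)."""
    years = on.year - born.year
    # If this year's birthday has not yet arrived, one year fewer is completed.
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years


def in_census_0_6(born: date, on: date) -> bool:
    # Census band "0-6": last-birthday age of 0 to 6, i.e. anyone under 7.
    return 0 <= age_last_birthday(born, on) <= 6


def polio_eligible(born: date, on: date) -> bool:
    # Polio band "0-5" as described above: strictly below 5 years of age.
    return 0 <= age_last_birthday(born, on) < 5


ref = date(2001, 3, 1)            # illustrative reference date (an assumption)
almost_seven = date(1994, 3, 2)   # one day short of the 7th birthday on ref
just_over_five = date(1996, 2, 28)
under_five = date(1996, 3, 2)

for child in (almost_seven, just_over_five, under_five):
    print(age_last_birthday(child, ref),
          in_census_0_6(child, ref),
          polio_eligible(child, ref))
# 6 True False
# 5 True False
# 4 True True
```

The census band therefore covers seven single-year age groups (0 to 6), while the polio band covers only five (0 to 4).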

I: Therefore, if you want to estimate the number of children eligible for the polio vaccine from the census number of children aged 0–6 years, you have to divide the census figure by seven (not by six) and then multiply it by five. Correct?

She: You have got it.

I rushed back to my office to redo the calculations, and it was a great relief: the recalculated figures were very close to my estimates.
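To see how much the choice of divisor matters, here is a small sketch of the two scalings. It assumes, as both rules of thumb implicitly do, that every single-year age group is the same size. The census count is a made-up round number and the function name is mine.

```python
from fractions import Fraction


def estimate_under_five(census_0_6: int, single_year_groups: int) -> Fraction:
    """Scale a census 0-6 count down to five single-year age groups.

    Assumes every single-year age group is the same size, which is only
    a rough approximation for a real population.
    """
    return Fraction(census_0_6) * 5 / single_year_groups


census_0_6 = 700_000  # hypothetical district count, not a real census figure

naive = estimate_under_five(census_0_6, 6)      # treats "0-6" as six age groups
corrected = estimate_under_five(census_0_6, 7)  # "0-6" at last birthday is seven

overstatement = naive / corrected - 1

print(f"{float(naive):,.0f}")          # 583,333
print(f"{float(corrected):,.0f}")      # 500,000
print(f"{float(overstatement):.1%}")   # 16.7%
```

Under this equal-cohort assumption, dividing by six instead of seven inflates the target population by one sixth, whatever the census count happens to be. Real age groups are not equal in size, so the exact gap in any district would differ.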

The moral of the story: Different agencies may use different definitions for what sounds like the same thing. The polio programme and the Indian Census work with different definitions of age. Be careful.


Atanu Majumdar

Seasoned statistician. Lifelong mission: Matching theory with intuition.