Data and the search for truth

Churning truth out of an ocean of data demands both statistical expertise and innovation.

Whenever I think of ‘data’, I think of Brent Spiner’s android in Star Trek: self-aware, sapient, sentient, and striving for his own humanity. Today, ‘data’ is already ‘big’, is constantly growing, and has the potential to affect every part of human life. However, “there is terror in numbers,” as Darrell Huff wrote in How to Lie with Statistics. The task of statisticians is to churn the data, obtain summary measures, diagrams and figures, rankings and indices, and draw conclusions. Is this the much-awaited ‘human chip’ that will make ‘Data’ human?

Proper understanding of data

In fact, statisticians are often like the blind men of the parable standing before an elephant, and an insufficient or partial analysis of the data can misrepresent the elephant. As H.G. Wells put it: “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” Yes, it is very important to understand the meaning of statistical and probabilistic findings. This was exemplified by Stephen Jay Gould, who explained how the statistic that peritoneal mesothelioma, the form of cancer with which he was diagnosed, had a “median survival time of eight months” was misleading, given the distribution of that data and the relevant data about his individual prognosis. A median says only that half of the patients live longer than eight months; it says nothing about how much longer. Gould showed a positive approach to overcoming the odds. Some of the fighting spirit he proposed was the result of his proper understanding of the statistics. For once, he argued, statistics reveal themselves as sources of optimism, rather than the sterile methodology that most people associate with the term.
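Gould’s point about the median can be illustrated with a small sketch. The survival times below are invented purely for illustration (they are not Gould’s data or any clinical figures); they are chosen only so that the median is eight months while the distribution has a long right tail.

```python
from statistics import mean, median

# Hypothetical survival times in months, invented for illustration only.
# These are NOT real clinical data.
survival_months = [2, 3, 4, 5, 6, 7, 8, 8, 9, 12, 18, 30, 60, 120, 240]

med = median(survival_months)
avg = mean(survival_months)

# Share of patients who live at least twice as long as the median
long_survivors = sum(1 for m in survival_months if m >= 2 * med) / len(survival_months)

print(f"median = {med} months, mean = {avg:.1f} months")
print(f"share surviving at least twice the median: {long_survivors:.0%}")
```

In such a right-skewed sample the mean sits far above the median, and a sizeable minority survives much longer than “eight months” suggests, which is the reading of the statistic that Gould argued for.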

Misleading statistics may arise from the limitations of the statisticians concerned, or they may be intentional, or both. “Giving people false information using statistical material may be called statistical manipulation,” Huff wrote. Huff pointed to seven general tricks for manipulating statistical data, including non-representative groups, small sample sizes, and mean values in non-uniform populations. Huff illustrated how statistical graphs can be used to distort reality. If the bottom of a line or bar chart is truncated, the differences appear larger than they are. In addition, the ratio between the ordinate and the abscissa is sometimes changed for the same purpose. With the help of several anecdotal examples, Huff also discusses the ‘post-hoc fallacy’, which wrongly asserts a causal relationship between two findings. In his 2001 book, Damned Lies and Statistics, Joel Best also used fascinating examples from major newspapers and television programmes to highlight the use, misuse and abuse of statistical information.

The goal of statistics is to discover the ‘truth’ amidst the randomness of nature. “Uncertain knowledge + knowledge of the amount of uncertainty in it = usable knowledge,” wrote C.R. Rao in his book Statistics and Truth: Putting Chance to Work. Prof. Rao discussed how data could be used to determine whether a newly discovered poem was composed by Shakespeare, or how, when testing for certain rare diseases, blood samples from different individuals can be mixed together to reduce the number of tests performed.
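The saving from mixing blood samples can be sketched with a little arithmetic. The article does not specify the scheme, so the sketch below assumes a simple two-stage design: test each pool once, and retest every member individually only if the pool is positive. It also assumes independent infections and a perfectly accurate test; the function names are illustrative.

```python
def expected_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per person under two-stage pooling.

    Assumes independent infections and an error-free test (simplifications).
    """
    if pool_size == 1:
        return 1.0  # no pooling: one test each
    # A pool is positive if at least one of its members is infected
    p_pool_positive = 1 - (1 - prevalence) ** pool_size
    # One pooled test shared by pool_size people, plus one individual
    # retest per person whenever the pool turns out positive
    return 1 / pool_size + p_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 50) -> int:
    """Pool size (up to max_pool_size) with the fewest expected tests."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_person(prevalence, k),
    )


rare = 0.01  # hypothetical prevalence of 1%
k = best_pool_size(rare)
print(k, round(expected_tests_per_person(rare, k), 3))
```

Under these assumptions pooling pays off only when the disease is rare: at a prevalence of 50% the same calculation says that testing everyone individually is best.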

Need for innovation

Churning truth out of an ocean of data sometimes demands superior statistical expertise. It also requires innovation. During the post-Independence communal riots in Delhi, many people from the minority community took refuge in the Red Fort and some in Humayun’s Tomb. The government had no exact count of the refugees, and the contractors responsible for feeding them charged high amounts. A team from the Indian Statistical Institute was asked to estimate the number. They had to estimate the number of individuals inside a given area without any opportunity to observe the concentration of individuals within that area, and without using any known sampling techniques or census methods. Based on an idea suggested by J.M. Sengupta, the team divided the quantities of rice, pulses and salt that the contractors claimed were needed to feed all the refugees by the per capita requirements of rice, pulses and salt known from consumption surveys, and obtained three widely differing estimates of the number of refugees. The salt estimate was the smallest and the rice estimate was the largest. Since rice was the most expensive, its quantity was probably overstated. The team proposed the figure obtained from salt as the estimate of the number of refugees. The method was verified to provide a good approximation in Humayun’s Tomb.
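The arithmetic behind the estimate is simple enough to sketch. All figures below are hypothetical placeholders, since the article gives neither the contractors’ claimed quantities nor the survey consumption rates; only the method (claimed quantity divided by per capita requirement, then taking the smallest result) follows the account above.

```python
def implied_headcounts(claimed_kg_per_day, per_capita_kg_per_day):
    """Headcount implied by each commodity: claimed quantity / per capita need."""
    return {
        item: claimed_kg_per_day[item] / per_capita_kg_per_day[item]
        for item in claimed_kg_per_day
    }


# Hypothetical daily quantities claimed by the contractors (kg)
claimed = {"rice": 9000.0, "pulses": 1200.0, "salt": 200.0}
# Hypothetical per capita daily requirements from consumption surveys (kg)
per_capita = {"rice": 0.45, "pulses": 0.08, "salt": 0.02}

estimates = implied_headcounts(claimed, per_capita)

# The costlier the item, the stronger the incentive to overstate it,
# so the smallest implied headcount is taken as the most trustworthy.
most_trusted_item = min(estimates, key=estimates.get)

for item, count in estimates.items():
    print(f"{item}: about {count:,.0f} people")
print("estimate used:", most_trusted_item)
```

Taking the minimum is a judgment rather than a formula: it rests on the article’s reasoning that inflated claims push the implied headcount upward, so the least inflated commodity gives the figure closest to the truth.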

The lesson is clear. Extracting ‘truth’ using statistics requires expertise and innovation from the statisticians concerned. Sound statistical thinking and a proper understanding of statistics among laypeople are, of course, no less important. A pinch of salt is indeed needed.

Atanu Biswas is Professor of Statistics at the Indian Statistical Institute, Kolkata.

