Why pre-poll surveys are misleading rather than indicative of an election: the Kerala example

Kerala’s obsession with pre-poll surveys emerged with the arrival of social media, and while Keralites love to juggle these numbers and percentages, the data is not what we think it is.

In 2019, a leading Malayalam news channel was vigorously discussing an exit poll for the Palakkad constituency in the general elections. With VK Sreekandan (UDF), MB Rajesh (LDF), and C Krishnakumar (NDA) in the fray, the exit polls claimed that MB Rajesh would win the seat with a thumping majority, or as the newsreader described it, “a dominating victory for LDF” and an “embarrassing defeat for UDF”.

All major opinion polls called Palakkad for the LDF, with the largest predicted victory margin being 24%. They also predicted that the BJP would get more votes than the Congress, pushing the latter to third place. It was fun, it was interesting, and as much as all of us would love to sit and chat about that, pre-poll surveys are “more art than science”. When the results came, the UDF won Palakkad, the BJP was pushed to third with less than a quarter of the vote share, and MB Rajesh didn’t have his “thumping victory”. Another prime example from the same election is the Thiruvananthapuram constituency.

While nobody thought the LDF’s C Divakaran would win, the majority shouted from the rooftops that the BJP would make history with its first seat from Kerala. They claimed that Kummanam Rajasekharan would lead the BJP charge in Kerala by winning from Thiruvananthapuram. It was nice while it lasted. The Congress’ Shashi Tharoor not only won by a majority of around one lakh votes, but also increased his vote share by 7 percentage points to 41% in the 2019 elections. Tharoor led Kummanam in every assembly segment except Nemom, where the BJP had a lead of only some 12,000 votes. Interestingly, despite an overall increase of 240,000 votes and the CPM maintaining a similar number from 2014, the BJP gained barely 40,000 votes. How and where did the polls go wrong?

The Design

Survey polls are, in essence, opinion research: in a diluted sense, they take the opinions of individuals and apply them to a larger demographic. Herein lies the first problem. One individual cannot represent a larger demographic, so survey designs are planned to include a multitude of samples from various socio-demographic categories. The sample (the people actually surveyed) is meant to represent the population (everyone the study is about). In a statistical study, the population doesn’t mean the population in the general sense.

Here is a very hypothetical example. Imagine there are 100,000 people in a district. This is the population of the district in the general sense. I want to study the drinking habits of people in the 18-25 age group, who number about 22,000; this number is the population of my study. Since asking 22,000 people about their habits is a long and tedious process, I decide that a sample from my study population would be more efficient. Therefore, based on the accuracy I need and some complicated confidence level calculations, I decide on a number that could represent my study population. If I want a 99% confidence level with a 1% margin of error, a very demanding standard, my sample would be around 9,500. This should include all socio-demographics for a proper assessment.
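The “complicated calculations” in the example can be sketched in a few lines. The article does not say which formula it has in mind; this sketch assumes the common textbook approach (Cochran’s formula with a finite population correction) and the most cautious assumption that opinion is split 50-50 (`p = 0.5`). The function name `sample_size` is made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, p=0.5):
    """Sample size for estimating a proportion (assumed: Cochran + finite population correction)."""
    # z-score for a two-sided confidence level, e.g. about 2.576 for 99%
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Sample size needed if the population were effectively infinite
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Shrink it for a small, finite study population
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Study population of 22,000, 99% confidence, 1% margin of error
print(sample_size(22_000, 0.99, 0.01))  # about 9,460, in line with "around 9,500"
```

Under these assumptions the answer lands close to the 9,500 quoted above. A different formula or a different guess for `p` would move the number somewhat.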

Hoping that the example is understood, let us look at the problem at hand. Most industry surveys prefer to calculate based on a 95% confidence level and a 1% margin of error, meaning that for Kerala, with a voter population of 2.67 crore, the sample size is just around 10,000. Most surveys barely cross that mark, but that’s not all. The situation gets more complicated: if the sample is not distributed according to the different population parameters within a single constituency, the result could be biased or one-sided. For example, if the survey for the 2019 general election had been conducted only in Nemom, it could have predicted a thumping victory for Kummanam, and if it had been conducted only in Neyyattinkara, Shashi Tharoor would have had the advantage. So splitting the sample across a constituency’s segments is essential.
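To see why the spread of the sample matters, here is a small sketch with entirely invented numbers. The three segments, their voter counts, and the support figures are hypothetical and are not real data from Nemom, Neyyattinkara, or anywhere else. It compares what a survey would find if it polled only one segment with the true constituency-wide figure, and shows one simple way (proportional allocation, an assumption of this sketch) to split a sample.

```python
# Hypothetical segments: (name, number of voters, true support for candidate A)
segments = [
    ("Segment 1", 200_000, 0.30),
    ("Segment 2", 180_000, 0.45),
    ("Segment 3", 220_000, 0.38),
]


def true_share(segments):
    """Constituency-wide support for candidate A, weighted by voters per segment."""
    total_voters = sum(voters for _, voters, _ in segments)
    return sum(voters * share for _, voters, share in segments) / total_voters


def proportional_allocation(segments, sample_size):
    """Split the sample so each segment gets respondents in proportion to its voters."""
    total_voters = sum(voters for _, voters, _ in segments)
    return {name: round(sample_size * voters / total_voters)
            for name, voters, _ in segments}


print(f"True support across the constituency: {true_share(segments):.1%}")
for name, _, share in segments:
    # A survey run in one segment alone would, on average, report that segment's figure
    print(f"Polling only {name} would suggest: {share:.1%}")
print(proportional_allocation(segments, 1_200))
```

In this made-up case, polling only the first segment understates candidate A by about seven points, and polling only the second overstates by a similar amount, even before any other source of error is counted.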

On the other hand, when the entire state goes to the polls in 140 constituencies, the question is whether the survey samples voters in every constituency or only in selected constituencies. The latter is often true. Most surveys consider “key constituencies”, meaning that the samples are often limited to particular constituencies rather than the whole state. Imagine this: if the samples were centred on Dharmadom and Kannur, or on Thodupuzha and Kaduthuruthy, the results would be heavily biased. A combination also wouldn’t be statistically representative. The only counter to that would be the argument of changing perceptions.
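A rough calculation shows what is at stake when a statewide sample is stretched across 140 constituencies. The sketch below assumes simple random sampling, a 95% confidence level, a 50-50 split of opinion, and a sample of 10,000 spread evenly over every constituency. These are simplifying assumptions for illustration, not a description of how any agency actually works.

```python
import math
from statistics import NormalDist


def margin_of_error(n, confidence=0.95, p=0.5):
    """Approximate margin of error for a proportion from a simple random sample of size n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)


statewide_sample = 10_000
constituencies = 140
per_constituency = statewide_sample // constituencies  # 71 respondents each

print(f"Statewide margin of error: {margin_of_error(statewide_sample):.1%}")
print(f"Per-constituency margin of error: {margin_of_error(per_constituency):.1%}")
```

Under these assumptions the statewide figure is accurate to about one percentage point, but each constituency’s figure is only accurate to roughly eleven or twelve points, which is far wider than the margins by which many seats are won.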

Perception Machines

Changing perception is the second and most prominent problem surrounding pre-poll surveys. Question framing is a basic part of the methodology, so any manipulation of it can produce different answers from a similar group of respondents.

A) With so many PSC rank holders protesting against the government, do you think the LDF did well in employment?

B) With many appointments in the last three months alone, do you believe the LDF government fared well in employment?

Adding one modifier to the question changes the narrative, creating a loaded question and thus a predictable answer. This doesn’t mean that it is always a deliberate process, since it could be unintentional as well, but it is possible and quite damaging. In another real-life example, a leading news channel asked people which party they considered the most hated, and the answer they received was the BJP, leading to the channel apologising for its mistake on live television.

The challenge is to explore various issues in the most transparent way possible, which is also the hallmark of a good study: the results should be reproducible a second time under the same conditions. However, in a survey design, the conditions are time and perception. The same respondents could reverse their views depending on the questions asked the second time, as well as with changes in the political scenario in the state. So what makes the first set of results worth anything? Moreover, survey researchers often don’t explain the structure of the design or the questions asked, and this lack of transparency creates more doubts than needed.

Now consider another scenario. In US presidential elections, all the seats (electoral votes) of a state generally go to the candidate with the most votes in that state, meaning that vote share directly affects seat share, save for exceptions. However, even the USA cannot apply the same principle at the national level, since Hillary Clinton lost to Donald Trump despite getting a 48% vote share. In India, the scenario is more complicated, with the 2014 general elections a prime example. With just a 31% vote share, the BJP captured 52% of the seats in the Lok Sabha in 2014. Seat predictions derived from vote shares are therefore unreliable at best, and yet our news media love to flaunt them as much as they did the last time.
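The gap between vote share and seat share is easy to reproduce with a toy first-past-the-post election. The five constituencies and three parties below are invented purely for illustration. The sketch simply totals the votes and awards each seat to whoever polled highest in that constituency.

```python
from collections import Counter

# Hypothetical results: votes (in thousands) per party in five constituencies
results = [
    {"A": 40, "B": 35, "C": 25},
    {"A": 38, "B": 36, "C": 26},
    {"A": 39, "B": 31, "C": 30},
    {"A": 10, "B": 60, "C": 30},
    {"A": 12, "B": 58, "C": 30},
]


def vote_and_seat_shares(results):
    """Return (vote share, seat share) per party under first-past-the-post."""
    votes = Counter()
    seats = Counter()
    for constituency in results:
        votes.update(constituency)                  # add up each party's votes
        winner = max(constituency, key=constituency.get)
        seats[winner] += 1                          # the seat goes to the top party only
    total_votes = sum(votes.values())
    vote_share = {party: votes[party] / total_votes for party in votes}
    seat_share = {party: seats[party] / len(results) for party in votes}
    return vote_share, seat_share


vote_share, seat_share = vote_and_seat_shares(results)
for party in sorted(vote_share):
    print(f"Party {party}: {vote_share[party]:.1%} of votes, {seat_share[party]:.0%} of seats")
```

In this made-up example, party A takes three of the five seats with fewer votes than either rival, while party B, with the most votes, takes two. A poll that measured the vote shares perfectly would still say little about the seat tally.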

The problem here is that people’s perceptions naturally change with sociological conditions, political scenarios, and other such situations. Will the people who claim to vote for the LDF continue to do so if the left joined hands with the BJP? What if the people who participated in the study were lying, and were going to do the opposite? Leave aside one or two persons; a good number from the sample could offset the calculations.

A prime example is the Bradley effect, where white voters told pollsters that they would likely vote for Tom Bradley (an African American candidate), or that they were undecided, in the 1982 California Governor’s race. However, Tom Bradley lost to the white candidate. In 2016, it would repeat in the form of the ‘shy Trump’ effect, where people were less inclined to admit to wanting to vote for Donald Trump due to his notorious public image. Trump still won. All of this can be attributed to social desirability bias, where people are more inclined to give answers that others would view favourably.
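A little arithmetic shows how few hidden answers it takes to flip a headline. The sketch below assumes a two-candidate race in which a fraction of candidate X’s real supporters tell the surveyor they back candidate Y. The 52% and 8% figures are invented for illustration and are not estimates from any real election.

```python
def reported_shares(true_share_x, misreport_rate):
    """Shares a survey would record if some of X's supporters claim to support Y.

    Assumes a two-way race, and that nobody who supports Y misreports.
    """
    reported_x = true_share_x * (1 - misreport_rate)
    reported_y = 1 - reported_x
    return reported_x, reported_y


# X truly has 52% support, but 8% of X's supporters hide their preference
x, y = reported_shares(0.52, 0.08)
print(f"Survey shows X at {x:.1%} and Y at {y:.1%}")
```

Here a candidate who is actually four points ahead appears more than four points behind in the survey, even though fewer than one in twelve of that candidate’s supporters hid their preference.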

Kerala, regardless of its tall claims of progressive morals, cannot escape the biases it naturally inherits from human nature in the socio-political environment. Surveys capture these biases, which creates more problems for the results. At best, it is interesting to know what a few people think about the elections. A survey should be treated like arguing with our friends over who will win the election and why. It should be hypothetical and shouldn’t be taken as an alternative to facts.

The Designers

The third issue is the political and commercial aspirations of the designers. In a famous 2014 report, a sting operation conducted by News Express revealed alleged malpractices in 11 opinion poll agencies. Their ‘Operation Prime Minister’ revealed that the heads of the agencies were willing to “manipulate data and provide misleading results”.

News Express Editor-in-Chief Vinod Kapri said that their sting showed that these polling agencies are “willing to manipulate data to any extent at the behest of the client by way of deleting negative data or simply increasing the margin of error to show a spike in seats”.

The important aspect here is that one of the 11 agencies named in the allegations recently collaborated with a Malayalam news channel to formulate a poll. It predicted that Pinarayi Vijayan would come back to power.

The question is, if the data is unreliable, if the analysis is not proper, if the experiment is unrepeatable, if the method isn’t transparent, if the agencies are corrupt, if the surveyees are biased, if the outcome is irrelevant, and if the questions are misleading, then why subject the voters to a futile exercise? Why organise one-hour debates over possibly arbitrary numbers? It need not be all of the problems mentioned here. Just one, or a combination of them, is enough to make the final data seem decidedly irresponsible.

Our obsession with percentages and predictions makes it seem as if this whole process is one big exercise in statistical astrology. It would be equally effective to bring in one of the FIFA psychic animals to predict the outcome of each constituency. It would be an interesting turn of events, a departure from the routine, more newsworthy than pre-polls, and of equal value. Hey, Paul the octopus had an overall success rate of 85.7% across his football predictions, which included the 2010 FIFA World Cup, and Achilles the cat had a success rate of 75% at the FIFA Confederations Cup. Can the pre-poll surveys match those numbers while being equally cute?