2. The Data Villains

Updated: Jun 16, 2020

This article is Talk 02 of the Honest Data Science Talks series. Each talk, around 500 words long, offers personal thoughts and deliberations on the domain.

While we are talking about data science, let us take a step sideways and play peek-a-boo with the data. So far, our efforts have been to generate and analyse science from the data we have. We started mining science from the data, and we have also begun mining science from the science. But what if the data itself is questionable?

“Hey, like and comment for my kid for the cute baby contest”,

“Like and comment on my new poem”,

“Like my photograph, the one with the most likes wins”,

etc, tce, cet.

How authentic are all these data?

The pizza delivery guy says, “Please rate me 10.” “Please rate us 5 on Google,” says a shopkeeper. My 4-year-old likes the colour red, so after shopping he wants to press the red button to record our experience (the shop has a row of buttons running from red to green, and red means the worst shopping experience).

How real are all these data?

Friends and family write the book reviews. A friend takes another friend’s test. Most online profiles do not reflect a person’s real behaviour, and people behave differently on the web anyway. Emotions are played with, and the timeline of events is thoroughly messed up. Movie reviews are paid for and customised. Advertisements glorify the products they promote beyond their merits. Celebrities endorse products for branding and money. No one knows what happens to expired, unused, or spoiled products.

My wife always gives my phone number for any transaction she makes outside, so at times I receive recommendations tailored to women. My sister shops for all of us from her account, and she gets an assortment of recommendations. My friend uses my machine and search engine, and I get the recommendations and related ads. Everyone around us is collecting data and assigning it to an identity. But, you know my buts.

History is altered, the present is flawed and the future is at stake.

There is so much fake data on the web. People copy and paste content as if it were their own. The truth is unknown. Fakes go viral. The word ‘sad’ is simply counted as an unhappy sentiment. Wishes are hardly ever heartfelt or real; most are forwards. The language many people use is horrible and carries no real meaning. Everyone wants feedback and everyone wants to run a survey, but everyone also wants to look good. Better in real life or not, everyone wants to show their best on the internet. Words are contextual, and so are the conflicts. Most information is partial, and much of it is outdated and never updated. No one cares about the societal context, yet everyone needs data.
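To make the ‘sad’ point concrete, here is a minimal sketch of a word-counting sentiment scorer. The word lists and the function name `naive_sentiment` are made up for illustration and do not come from any real library; the only point is that a scorer which ignores context gives the same verdict to very different sentences.

```python
# Hypothetical toy lexicons, assumed for illustration only.
NEGATIVE_WORDS = {"sad", "bad", "awful"}
POSITIVE_WORDS = {"happy", "good", "great"}


def naive_sentiment(text: str) -> int:
    """Return (positive word count) - (negative word count), ignoring context."""
    # Lowercase each word and strip surrounding punctuation before the lookup.
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    positives = sum(w in POSITIVE_WORDS for w in words)
    negatives = sum(w in NEGATIVE_WORDS for w in words)
    return positives - negatives


print(naive_sentiment("I am sad today"))                  # -1
print(naive_sentiment("I am not sad at all"))             # -1, the negation is ignored
print(naive_sentiment("Sad to leave such a great team"))  # 0, the two words cancel out
```

The second sentence says the opposite of the first, yet both score the same. Any graph built on such scores inherits that blindness.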

Everyone needs data because there are ready-made APIs to plot graphs, and everyone wants to present graphs without caring how relevant they are to the societal context and the truth.

It’s pathetic that data has led to shortcuts, and these shortcuts are neither moral nor upright. Forget the experimental data, for it is too limited for any kind of analysis compared to the real amount of data we have out there.

With all this terrifying data in circulation, how accurate is the science we build on it?

