This article is Talk 03 of the Honest Data Science Talks series. Each talk, around 500 words long, shares personal thoughts and deliberations on the domain.

No wonder that not only we, but even the society (the social) around us, encourage complexity. Technology always brings updates. These updates, to me, seem like a universal balance: they update something somewhere and down-date (I hope I can coin that word) something somewhere else. Decision making and process workflows are not what they once were, and we cannot ignore that just because we don’t see it.

Here is a scenario to make it more transparent. There is a grocery shop in my neighbourhood where I buy most of my daily needs. If the shop has few or no customers, I like to chat with the shopkeeper to understand how things work. One evening, when I went to buy a packet of bread, he said that it was the first pack sold that day. I was a little curious and wanted to know more. As I tried to understand what he meant, I had this ‘okay-that’s-new’ realisation.

It’s the same old question – who will bell the cat?

Once, not very long ago, the bread delivery van used to arrive once every 3-4 days and deliver all the required supplies. Bread usually expires within a week. It was the shopkeeper’s decision how many packs to purchase. Quite often, either there was a shortage of supplies or a few packs expired. Sometimes the expenses were shared. But now the model has changed. The van comes every day, and every day packs can be purchased based on the need. But it is still a dilemma to manage. Some days the packs are sold out by 11.00 am, and some days it takes until 6.00 pm for the first pack to go out. Inventory management needs a new chapter (or maybe it already has one).

I tried to visualise the whole scenario from a data scientist’s perspective. I also asked whether bread packs sold more, or faster, on Sundays or on any other specific day. I learned that there was no pattern at all, not even an approximate one. Collecting a month, a year, or a decade of data was not going to help. The model of management had changed, and I don’t know how many times. We cannot collect data generated under different process models and apply the same data science model to all of it. We cannot plot a graph for this and predict the next day’s requirement. We cannot predict the accuracy of such a system. The system should also take care of the data generation model, and not just the data.
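To make that point a little more concrete, here is a minimal sketch of what I mean by keeping the data generation model next to the data. The records, the process labels (`van_every_3_4_days`, `van_daily`) and the function `summarise_by_process` are all made up by me for illustration; they are not the shopkeeper's real numbers. The idea is only that a pooled average hides the fact that two different processes produced the rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (process that generated the row, packs sold that day).
# Every number here is invented purely for illustration.
records = [
    ("van_every_3_4_days", 12),
    ("van_every_3_4_days", 0),
    ("van_every_3_4_days", 9),
    ("van_daily", 4),
    ("van_daily", 5),
    ("van_daily", 3),
]


def summarise_by_process(rows):
    """Group observations by their generating process, then average each group."""
    groups = defaultdict(list)
    for process, sold in rows:
        groups[process].append(sold)
    return {process: mean(values) for process, values in groups.items()}


# One number for everything: mixes both delivery models together.
pooled_mean = mean(sold for _, sold in records)

# One number per delivery model: the process label travels with the data.
per_process = summarise_by_process(records)

print("pooled:", pooled_mean)
print("per process:", per_process)
```

This does not predict anything, and in the real shop even the per-process numbers showed no pattern. It only shows why the label for how the data was generated has to be recorded before any model is applied.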

First, each and every home in the locality needs to be understood, and its bread-buying pattern predicted. Then, based on that data, the shopkeeper’s demand can be predicted. Maybe there is more that I have missed, too.
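As a rough sketch of that bottom-up idea, and nothing more than that, one could describe each home by how likely it is to buy bread on a given day and how many packs it buys, and then add the homes up to get the shop's expected demand. The `Household` fields, the probabilities and the function `expected_shop_demand` are my own assumptions for illustration; how to actually estimate them for a real home is exactly the hard social-science part.

```python
from dataclasses import dataclass


@dataclass
class Household:
    name: str                    # hypothetical label for a home
    packs_per_purchase: int      # packs bought when the home does buy
    purchase_probability: float  # assumed chance of buying on a given day


def expected_shop_demand(households):
    """Sum each home's expected packs per day to get the shop-level expectation."""
    return sum(h.packs_per_purchase * h.purchase_probability for h in households)


# Invented homes with invented buying habits.
households = [
    Household("home_a", 1, 0.5),
    Household("home_b", 2, 0.25),
    Household("home_c", 1, 0.0),
]

demand = expected_shop_demand(households)
print("expected packs per day:", demand)
```

The sum is only an average; it says nothing about whether the packs are gone by 11.00 am or still on the shelf at 6.00 pm on any particular day.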

I sit amazed, and full of questions, about the effects of society on data science. And this is just one such example. The social affects the data, and it cannot be ignored. All those existing models look rather immature, at least to me. Someone needs to pull their socks up and really get down to understanding such social-science models, and then logically build a meaningful, holistic picture for data science.

Dear Data Science,

Please understand Social Science, if you want to have science.