(This is a random, rambling capture of a panel discussion that happened as part of the NASSCOM EmergeOut Event in Pune today.)
Panel discussion on Big Data and Analytics. The panel was moderated by Gaurav Mehra, co-founder and MD of Saba Software. The panelists were Moti Thadani, MD of SAS Software, India, Mukund Deshpande, head of the Business Intelligence and Analytics Practice at Persistent, and Prashant Pansare, CEO of Intelliment.
Gaurav set the stage with the problem of Big Data:
We are facing a tidal wave of data. We are generating more data per day now than in our entire history before this.
We cannot deal with this data using traditional data management software, or even traditional hardware architectures. New approaches are needed.
And this data makes new business strategies possible.
For example, consider the GoldCorp story:
GoldCorp was a company going out of business because its existing mines had stopped producing. The CEO took a strategic decision to make the company's prospecting data publicly available, and announced a US $500,000 reward for usable suggestions on where to mine further for gold. The program was a huge success. Thousands of prospectors analyzed the data and identified 110 new sites (half of them previously unknown to the company); 80% of these yielded significant gold reserves, resulting in over $6 billion worth of gold. By going outside the company walls, GoldCorp went from a struggling enterprise to one of the most profitable in the industry. For more on the GoldCorp challenge, see here.
And even governments all over the world are making data publicly available.
How do we make use of this data?
Prashant made some interesting points about the use of Big Data:
- Don’t just use more data. Use data from more sources. That makes your data richer and your analysis more insightful.
- Pictures are worth a thousand words. Put effort into visualization of the data.
Mukund described a case study: Persistent’s analysis of data gathered by Aamir Khan’s TV show Satyamev Jayate. Satyamev Jayate was not interested in a revolution or an uprising, but rather in bringing about change in individuals, and they wanted to use analytics to figure out what impact they were having. They collected data via phone lines, Twitter, Facebook, SMS, and the website. Here is the analysis that was done on it:
- For each show, they had to take the topic of the show and build a taxonomic tree of sub-topics related to it. Thus, if the topic was female foeticide, the sub-topics were doctors, mothers, marriage, etc. All the messages on the various channels were then analyzed to figure out which sub-topics people were talking about.
- The group at Satyamev Jayate was very interested in the analytics. What are people in Haryana talking about? What are young men talking about? Are sentiments towards a sub-topic favorable or unfavorable?
- Problems faced: All the data was unstructured, so capturing and tagging it was a challenge. The data was in multiple languages. The topic of a show was not known to them until the show aired, so they had to work out the taxonomy while the topic was already trending on Twitter.
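To make the tagging step concrete, here is a minimal sketch of how messages could be matched to sub-topics and counted per region. It is not Persistent's actual pipeline, which was not described in detail. The keyword lists, the `tag_subtopics` and `count_by_region` helpers, and the sample messages are all hypothetical. It also assumes English-only text and does no sentiment analysis.

```python
import re
from collections import Counter, defaultdict

# Hypothetical taxonomy for one show: sub-topic -> keywords that signal it.
TAXONOMY = {
    "doctors": {"doctor", "clinic", "sonography"},
    "mothers": {"mother", "pregnant"},
    "marriage": {"marriage", "dowry"},
}

def tag_subtopics(text, taxonomy=TAXONOMY):
    """Return the set of sub-topics whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {sub for sub, keywords in taxonomy.items() if words & keywords}

def count_by_region(messages, taxonomy=TAXONOMY):
    """Count, per region, how many messages mention each sub-topic."""
    counts = defaultdict(Counter)
    for msg in messages:
        for sub in tag_subtopics(msg["text"], taxonomy):
            counts[msg["region"]][sub] += 1
    return counts

# Made-up sample messages, for illustration only.
messages = [
    {"region": "Haryana", "text": "The doctor at the clinic should be punished."},
    {"region": "Haryana", "text": "Dowry is the root of this."},
    {"region": "Maharashtra", "text": "Every mother deserves support."},
]

for region, subtopics in count_by_region(messages).items():
    print(region, dict(subtopics))
```

Plain keyword matching like this would break down on exactly the problems listed above: multilingual, unstructured messages and a taxonomy that has to be built after the show airs.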
If you’re an engineer interested in getting into the area of data analytics, you need the following skills:
- Exposure to the mathematical theories behind big data, especially probability and statistics
- Exposure to the software algorithms that are used in big data. Thankfully most of these are open source, and there are existing tutorials available on the web.
- You should try solving some of the challenges listed on kaggle.com, which gives cash prizes for solving data analytics challenges.