As part of my write-after-reading series, a personal goal of mine, I present a book about the new oil, data, written by Steve Lohr.
The author is a New York Times journalist who has covered technology, business, and economics for more than twenty years. He won a Pulitzer Prize for Explanatory Reporting, a distinguished award for writers who can explain an important and complex subject, demonstrating mastery of it through lucid writing and clear presentation.
The book has eleven chapters, which I summarize in order below:
- How Big is Big Data?
Modern data technology has gone mainstream, which means each of us is now an individual consumer of some technology product. The data collection that this consumer Internet makes possible is becoming a big deal economically.
Decision making, which happens in every part of personal and professional life, requires measurement if we are to manage our lives. As W. Edwards Deming, a statistician and quality-control expert, and Peter Drucker, a management consultant, are both credited with saying, “you can’t manage what you can’t measure”. To measure, we must have data.
So, given the significant progress in data collection, we gain abundant benefits in managing our lives.
- Potential, Potential, Potential
One way of managing our lives is to orient ourselves toward science. Unfortunately, we cannot do science without data. Science lets us predict the future by seeing the present clearly. For that reason, Jeffrey Hammerbacher once said, “Data is the intermediate representation of science.”
Almost 60% of the book revolves around the story of a “quant”, Jeffrey Hammerbacher. He started a team that organized and mined social-network data at Facebook, and he called himself and his coworkers “data scientists”.
I found him to be an omnivorous reader with a mind of his own. He struggled in college, and back in preschool he had not learned to cooperate and join in with his class. Without claiming that any particular background is tied to a certain profession, I think the most important factor in predicting success is hard work. The lecture Jeff received from his parents was, “I do not care if you are a garbage collector as long as you are the best garbage collector you can be.”
- Bet the Company
In recent years, software or company units that use data to make better-informed decisions have been given the label “business intelligence”. In fact, the concept was first presented in 1958 in a paper titled “A Business Intelligence System” by Hans Peter Luhn, a computer scientist at IBM.
IBM has always been a company that invests heavily in research and development on market and technology trends. Nevertheless, there is always uncertainty, and this is the limit of the big-data approach. Big data is good at interpolation, that is, figuring out what happens next when the outcome is a continuation of the current trend. However, it is far less good at extrapolation, that is, figuring out what happens next when the trend line of the future is less clear.
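The contrast between the two cases can be made concrete with a small sketch. It is only an illustration under stated assumptions, not something taken from the book: `true_trend` is a hypothetical process that grows steadily and then flattens, and `fit_line` is an ordinary least-squares line fitted to the observed range only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def true_trend(x):
    """Hypothetical process: grows steadily, then flattens at 10."""
    return min(x, 10.0)


# We only observe the period in which the trend is still rising (x = 0..10).
observed_x = [float(x) for x in range(0, 11)]
observed_y = [true_trend(x) for x in observed_x]
slope, intercept = fit_line(observed_x, observed_y)


def predict(x):
    """Prediction from the line fitted to the observed period."""
    return slope * x + intercept


# Inside the observed range the fitted trend holds up.
inside_error = abs(predict(5.5) - true_trend(5.5))
# Far outside it, the trend has changed and the same model is badly off.
outside_error = abs(predict(20.0) - true_trend(20.0))

print("error inside observed range:", inside_error)
print("error outside observed range:", outside_error)
```

The model itself is the same in both predictions. Only the question changes: asking about a point the data already covers versus asking about a future in which the trend line has bent.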
- Sight and Insight
This chapter gives one success story of a company that harnesses big data: the drug distributor McKesson. Using data, it has cut the inventory it holds by $1 billion, freeing up working capital, and gained roughly 13 percent in efficiency. In business school, this is called management science.
Data is the new idol in business. However, there is a special place for intuition as well. At its best, intuition is a synthesis of vast amounts of data. Steve Jobs called the ability to make decisions that seem intuitive “taste”. He was not a quant, but he was an awesome processor of non-numerical data: curious, self-taught, and tireless.
- The Rise of the Data Scientist
In 2008, one of the technical steps Jeff took at Facebook as a data scientist was adopting Hadoop, an open-source variant of Google’s technology for splitting up and processing large data sets. He also started studying the data and running the kind of data experiments common in the Web world, e.g. A/B testing, a simple randomized test of what works best.
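To give a feel for what an A/B test involves, here is a minimal sketch. It is not Facebook's method. It assumes a textbook two-proportion z-test as the way of comparing the two variants, and the function names `assign_variant` and `two_proportion_z`, as well as the visitor counts, are made up for illustration.

```python
import math
import random


def assign_variant(user_id, seed=0):
    """Randomly, but reproducibly, put a user in variant A or B."""
    rng = random.Random(user_id * 1_000_003 + seed)
    return "A" if rng.random() < 0.5 else "B"


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of A and B with a two-proportion z-test.

    Returns the z statistic and the two-sided p-value.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 1000 visitors per variant, 100 vs 130 conversions.
z, p_value = two_proportion_z(100, 1000, 130, 1000)
print("variant for user 42:", assign_variant(42))
print("z statistic:", round(z, 3))
print("two-sided p-value:", round(p_value, 4))
```

The random assignment is what makes the comparison fair: because users do not choose their own variant, a difference in conversion rates can be attributed to the variant rather than to who happened to see it.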
At that time, the data team at Facebook was a combination of data analysts, who were business savvy, and research scientists, who usually had a PhD. Jeff merged them under one term: data scientist, a worker whose job was an amalgam of skills, combining computer science, business, and social science.
In one of his classes, Jeff singled out John Tukey as the first data scientist. Tukey was a Princeton professor who contributed during World War II by improving the accuracy of artillery fire and bombing drops. He also coined the term “bit”, a contraction of “binary digit”.
- Data Storytelling: Correlation and Context
Telling a story from data starts with understanding the context. For instance, a data point of 39 can be read in several ways:
- It’s a number that is greater than 38 and less than 40.
- With one more piece of information, 39 degrees, it could be an angle or a temperature.
So everything we add dramatically changes our understanding, as Sam Adams, a research scientist at IBM, said. Now let’s discuss correlation, using a case from Google:
- In 2009, Google’s service spotted the spread of the H1N1 flu virus accurately, ahead of official reports. A big round of applause for Google.
- In 2013, Google Flu Trends reported that about 11 percent of Americans were ill, nearly double the 6 percent reported by the Centers for Disease Control and Prevention. Apparently, news reports and social-media messages warning of a harsh flu season prompted a surge in flu-related searches.
This mistake was examined in the article “The Parable of Google Flu: Traps in Big Data Analysis”, published in the journal Science in 2014.
- Data Gets Physical
The book shows how data is becoming part of the physical things in our lives, e.g. big-data farming. It tells the story of a precision-agriculture pilot run jointly by two companies, IBM and E. & J. Gallo Winery. The initiative produced 25 percent more grapes and higher-quality wine grapes, thanks to a data-guided system.
- The Yin and Yang of Behavior and Data
Because machine-learning algorithms are created by people, no computer system is free of human bias.
The case presented is the 2008 financial crisis. The Wall Street quants did not cause the crisis, but they played their part: like the science behind financial economics, they worked with data to derive signals about human activity. The risk models proved myopic because they were too simple-minded, unable to take account of the rich, chaotic tapestry of behavior, especially in times of stress. This is in line with the view of Emanuel Derman, a physicist and former quant at Goldman Sachs, who in 2011 published the book Models Behaving Badly. He said, “In physics, you are playing against God, and He does not change His laws very often. In finance you are playing with God’s creatures, agents who value assets based on their ephemeral opinion.”
So he advises us to understand what models are best used for, and then to be very careful not to discard our common sense.
- The Long Game
Back to Jeff, who is now busy at Mount Sinai medical center making medicine “an information game” in which the all-knowing doctor no longer reigns supreme. There will be data scientists as well, monitoring and recommending treatments. The data effort at Mount Sinai is only a few years old, but it is working on ambitious projects in the treatment of cancer, diabetes, Alzheimer’s, and Crohn’s disease.
- The Prying Eyes of Big Data
This chapter discusses how we, as social-media consumers, feed our data into services like Facebook and Twitter, which eventually use that data to profile us.
One of the largest data brokers is Acxiom, which has collected data on hundreds of millions of consumers worldwide. It can place people and households into categories like “potential inheritor”, “adult with senior parent”, and “diabetic focus.” This profiling is then used to target us with advertisements.
Another prominent company doing the same kind of thing is IBM. It has a software program called KnowMe that can profile people by analyzing the language choices they make in their Twitter posts.
- The Future: Data Capitalism
Lastly, the book argues that data will play a role as significant as that of the railways, telegraph, telephone, and accounting, which gave rise to large national corporations in the late nineteenth century such as Standard Oil, General Electric, United States Steel, and DuPont.
Even John Calkins, president of programming at AMC Theatres, who earlier worked for McKinsey and IBM, says that managing a company will become “less a finance exercise and more a data exercise”.
Financial capitalism will give way to data capitalism!