Doing the right thing first
Artificial Intelligence (AI) feeds on data. In particular, neural networks and deep learning models, such as those built with TensorFlow, have a voracious appetite for data. Yet, despite its importance, data often remains an afterthought. Typically, planning for a new data analytics project is dominated by debates about the right skill set of data scientists, the right tools, deadlines and, of course, budget. As a result, most of the time in a data analytics project (estimates range from 50% to just under 80%) is consumed by data search, collection, and refinement. A key way to save time and money is to specify data needs upfront and create data pools accordingly.
Creating data pools
On their own, very few companies will be able to collect the massive amounts of data that helped data analytics pioneers like Amazon, Facebook and Google create success stories. One trick to level the playing field is to team up with others and pool data. Data can be pooled: (a) vertically, along the successive stages of a supply chain (for example, to predict a shipment’s estimated time of arrival); (b) horizontally, for one machine make and model across all users (for example, to predict outages and improve uptime); or (c) by stacking different data sets on top of each other to create “data sandwiches.” One example of such a sandwich is layering street maps with data on vehicle traffic, people traffic, weather conditions and event information to predict traffic flows.
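To make the “data sandwich” idea a little more concrete, here is a minimal Python sketch of stacking layers on a shared key. Everything in it is an assumption for illustration only: the layer names, the choice of (street segment, hour) as the join key, and all values are made up and do not come from a real project.

```python
# Hypothetical layers from different data owners. All names and values are invented.
street_map = {"A1": {"lanes": 2}, "B7": {"lanes": 4}}    # segment -> static map attributes
vehicle_traffic = {("A1", 8): 420, ("B7", 8): 1310}      # (segment, hour) -> vehicle count
weather = {8: "rain"}                                    # hour -> city-wide condition
events = {("B7", 8): "concert"}                          # (segment, hour) -> nearby event


def build_sandwich(street_map, vehicle_traffic, weather, events):
    """Stack the layers into one record per (segment, hour) observation."""
    records = []
    for (segment, hour), vehicles in sorted(vehicle_traffic.items()):
        records.append({
            "segment": segment,
            "hour": hour,
            "lanes": street_map[segment]["lanes"],
            "vehicles": vehicles,
            "weather": weather.get(hour),             # None if no reading for that hour
            "event": events.get((segment, hour)),     # None if nothing is scheduled
        })
    return records


records = build_sandwich(street_map, vehicle_traffic, weather, events)
for record in records:
    print(record)
```

Each resulting record combines what no single layer holds on its own, which is the kind of input a traffic-flow model would need. In practice, agreeing on the shared key across partners is likely the hard part of pooling.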
Text by: Prof. Dr. Chris Schlueter Langdon, Deutsche Telekom