last modified: 2017-10-08
Not a closed list, not a recipe!
Rather, these are essential building blocks for a strategy of value creation based on data.
1. Predicting churn / default / … (banks / telcos)
2. Predicting crime
3. Predicting deals
4. Predictive maintenance
1. Collecting data (cold start problem)
2. Risk missing the long tail, algorithmic discrimination, stereotyping
3. Neglect of novelty
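As a minimal sketch of the predictive use case above, the following toy churn model fits a logistic regression by hand on hypothetical data (the features `monthly_usage` and `support_calls`, and all values, are invented for illustration; a real project would use a library such as scikit-learn):

```python
import math

def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit a logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted churn probability
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    """Churn probability for one customer."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical customers: features are (monthly_usage, support_calls),
# rescaled to [0, 1]. Low usage plus many support calls tends to churn.
X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.15, 0.85]))  # a high churn probability
```

The cold start problem mentioned above shows up here directly: without historical labels `y`, no such model can be trained.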
1. Amazon’s product recommendation system
2. Google’s “Related searches…”
3. Retailers’ personalized recommendations
1. Finding a value proposition that goes beyond the simple “you purchased this, you’ll like that”
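The simple “you purchased this, you’ll like that” logic can be sketched as item co-occurrence counting over purchase baskets. The baskets and item names below are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets (sets of items bought together).
baskets = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"book", "desk"},
    {"lamp", "plug"},
]

# Count how often each ordered pair of items appears in the same basket.
cooccur = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        cooccur[(a, b)] += 1
        cooccur[(b, a)] += 1

def recommend(item, k=2):
    """Items most often bought together with `item`."""
    scores = {b: n for (a, b), n in cooccur.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("book"))
```

This illustrates why the value proposition is the hard part: the counting itself is trivial, and anyone with the same transaction data can replicate it.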
1. Clarivate Analytics curating metadata from scientific publishing
2. Nielsen and IRI curating and selling retail data
3. IMDb curating and selling movie data
1. Slow progress: curation requires human labor to ensure high accuracy, so it does not scale the way a computerized process would.
2. Must maintain continuity: missing a single year or month hurts the value of the overall dataset disproportionately.
3. Scaling up with the right incentives: the workforce doing the curation should be paid fairly, which is often not the case yet.
4. Quality control
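The continuity challenge above lends itself to a simple automated check. This sketch flags missing months in a curated monthly dataset (the "YYYY-MM" period labels are a hypothetical convention):

```python
from datetime import date

def missing_months(periods):
    """Return the months absent between the earliest and latest period."""
    have = sorted(periods)
    first = date(int(have[0][:4]), int(have[0][5:]), 1)
    last = date(int(have[-1][:4]), int(have[-1][5:]), 1)
    gaps, current = [], first
    while current <= last:
        label = f"{current.year:04d}-{current.month:02d}"
        if label not in periods:
            gaps.append(label)
        # advance by one month
        current = date(current.year + current.month // 12,
                       current.month % 12 + 1, 1)
    return gaps

# A dataset with a gap: March 2017 was never collected.
print(missing_months({"2017-01", "2017-02", "2017-04", "2017-05"}))
```

A check like this supports quality control, but filling the gap it reveals still needs the human curation work described above.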
1. Selling methods and tools to enrich datasets
2. Selling aggregated indicators
3. Selling credit scores
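An aggregated indicator of the kind sold above can be as simple as averaging raw records into a coarser, anonymized unit. This sketch turns hypothetical retail transactions into a (region, month) price index:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction records.
records = [
    {"region": "north", "month": "2017-01", "price": 10.0},
    {"region": "north", "month": "2017-01", "price": 12.0},
    {"region": "south", "month": "2017-01", "price": 8.0},
]

def price_index(records):
    """Aggregate raw transactions into (region, month) -> mean price."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["region"], r["month"])].append(r["price"])
    return {key: mean(prices) for key, prices in buckets.items()}

print(price_index(records))
```

The value lies less in this aggregation step than in owning the raw records and choosing a "cocktail" of indicators the market actually wants.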
1. Knowing which cocktail of data is valued by the market
2. Limiting replicability
3. Establishing legitimacy
1. Search engines ranking results
2. Yelp, Tripadvisor, etc., which rank places
3. Any system that needs to surface the highest-quality entities from a crowd of candidates
1. Finding emergent, implicit attributes (a ranking based on a single public feature is neither interesting nor valuable)
2. Ensuring consistency of the ranking (many rankings are less straightforward than they appear)
3. Avoiding gaming of the system by users (for instance, companies try to game Google’s ranking of search results to their advantage)
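One common way to make a rating-based ranking harder to game with a handful of fake reviews is a Bayesian average, which pulls items with few ratings toward a global prior instead of ranking by raw mean alone. The venues and ratings below are hypothetical:

```python
def bayesian_average(ratings, prior_mean, prior_weight=5):
    """Shrink an item's mean rating toward the global mean.
    `prior_weight` acts like a number of virtual ratings at the prior."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

items = {
    "cafe_a": [5],                          # one perfect rating
    "cafe_b": [5, 5, 4, 5, 5, 4, 5, 5],     # many good ratings
    "cafe_c": [3, 2, 3, 3],                 # mediocre ratings
}
all_ratings = [r for rs in items.values() for r in rs]
global_mean = sum(all_ratings) / len(all_ratings)

ranked = sorted(items,
                key=lambda i: bayesian_average(items[i], global_mean),
                reverse=True)
print(ranked)
```

Ranking by raw mean would put `cafe_a` first on the strength of a single 5-star rating; the Bayesian average ranks the well-established `cafe_b` first instead.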
1. Tools for discovery / exploratory analysis by segmentation
2. Diagnostic tools (spam or not? buy, hold or sell? healthy or not?)
1. Evaluating the quality of the comparison
2. Dealing with boundary cases
3. Choosing between a pre-determined number of segments (as in k-means) or letting the number of segments emerge
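The pre-determined-k choice mentioned above can be illustrated with a minimal one-dimensional k-means, segmenting hypothetical customer spend values into k groups (a real project would use a library implementation and multiple restarts):

```python
def kmeans_1d(points, k, iters=20):
    """Cluster scalar points into k segments; returns sorted centroids."""
    # Naive initialization: pick k points spread across the sorted data.
    centroids = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Hypothetical monthly spend per customer: low, medium, and big spenders.
spend = [1, 2, 2, 3, 10, 11, 12, 50, 52]
print(kmeans_1d(spend, k=3))
```

Note that `k=3` is imposed by the analyst here; the boundary cases listed above (e.g. a customer spending 6) are exactly where such a hard segmentation becomes debatable.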
1. Should not create a failed product / false expectations
2. Both classic and frontier science: not sure where it’s going
Find references for this lesson, and other lessons, here.
This course is made by Clement Levallois.
Discover my other courses in data / tech for business: http://www.clementlevallois.net
Or get in touch via Twitter: @seinecle