last modified: 2023-05-15
Not a closed list, not a recipe! Rather, these are essential building blocks for a strategy of value creation based on data.
Risk missing the long tail, algorithmic discrimination, stereotyping
Neglect of novelty
Amazon’s product recommendation system
Google’s “Related searches…”
Retailer’s personalized recommendations
Clarivate Analytics curating metadata from scientific publishing
Nielsen and IRI curating and selling retail data
IMDb curating and selling movie data
NomadList providing practical info on global cities for nomad workers
Slow progress: curation needs human labor to ensure high accuracy, so it does not scale the way a computerized process would.
Must maintain continuity: missing a single year or month hurts the value of the overall dataset.
Scaling up / right incentives for the workforce: the workforce doing the digital labor of curation should be paid fairly, which is not the case yet.
Selling methods and tools to enrich datasets
Selling aggregated indicators
Selling credit scores
Knowing which cocktail of data is valued by the market
Search engines ranking results
Yelp, Tripadvisor, etc., which rank places
Any system that needs to filter out best quality entities among a crowd of candidates
Finding emergent, implicit attributes (imagine ranking things on just one public feature: the result is neither interesting nor valuable)
Ensuring consistency of the ranking (many rankings are less straightforward than they appear)
Preventing users from gaming the system (for instance, companies try to game Google’s ranking of search results to their advantage)
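A minimal sketch of why a ranking can be less straightforward than it appears: the same ratings, ranked by two reasonable scores, give two different orders. The place names, the ratings, and the `prior_mean` and `prior_weight` parameters are all made up for illustration; this is not how any of the services named above rank their results.

```python
def raw_average(ratings):
    """Plain mean of the ratings, ignoring how many there are."""
    return sum(ratings) / len(ratings)


def damped_average(ratings, prior_mean=3.5, prior_weight=5):
    """Mean pulled toward prior_mean; the pull fades as reviews accumulate.

    prior_mean and prior_weight are illustrative assumptions, not standard values.
    """
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))


# Hypothetical places and ratings (1 to 5 stars).
places = {
    "Cafe A": [5, 5],                           # two perfect reviews
    "Cafe B": [5, 4, 5, 4, 5, 5, 4, 5, 5, 4],   # ten good reviews
    "Cafe C": [3, 4, 3, 4, 3, 4],               # six average reviews
}

by_raw = sorted(places, key=lambda p: raw_average(places[p]), reverse=True)
by_damped = sorted(places, key=lambda p: damped_average(places[p]), reverse=True)

print(by_raw)      # Cafe A comes first on the plain mean
print(by_damped)   # Cafe B overtakes it once review counts matter
```

Neither order is "the" correct one: the choice of score is itself a design decision, which is part of what makes consistency hard to guarantee.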
Tools for discovery / exploratory analysis by segmentation
Diagnostic tools (spam or not? buy, hold or sell? healthy or not?)
Evaluating the quality of the comparison
Dealing with boundary cases
Choosing between a predetermined number of segments (as in k-means) or letting the number of segments emerge from the data
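The last choice can be sketched on a toy example. The code below assumes one-dimensional data (hypothetical monthly spend per customer) and contrasts a plain k-means, where `k` is fixed in advance, with a simple gap rule where the number of segments depends on the data. The function names, the `max_gap` threshold, and the data are illustrative assumptions; the gap rule stands in for the "emergent" family of methods and is not a specific published algorithm.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=20):
    """Plain k-means on numbers: the number of segments k is fixed in advance."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center when a group ends up empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return [g for g in groups if g]


def gap_segments(values, max_gap):
    """Start a new segment whenever two neighbours are more than max_gap apart."""
    ordered = sorted(values)
    segments = [[ordered[0]]]
    for v in ordered[1:]:
        if v - segments[-1][-1] > max_gap:
            segments.append([v])
        else:
            segments[-1].append(v)
    return segments


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 300, 310, 305]

fixed = kmeans_1d(spend, k=2)                # we impose two segments
emergent = gap_segments(spend, max_gap=30)   # the data decides how many

print(len(fixed), fixed)
print(len(emergent), emergent)
```

With `k=2` the middle customers are merged into the low-spend group, while the gap rule keeps them apart as a third segment. The gap rule does not remove the judgment call, though: it moves it from choosing `k` to choosing `max_gap`.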
OpenAI with ChatGPT
Legal framework still being shaped (what about the copyright of the content ChatGPT has been trained on?)
Still new, no "textbook" on how to use it to its fullest potential