A call to grow an ecosystem of apps for text and network analysis on new data sources
- End of the Twitter API: a problem, and an opportunity
- New data importers, new analytics, new user interfaces: a call to talk and exchange
- I go first: what I am happy to help with 🤗
- “How to start collaborating? I code in R / Python / JS and I don’t know how to do X, Y or Z”
- About me
With the end of the free Twitter API, network and text analysts are looking for new data sources. This is an opportunity to stimulate the ecosystem of apps that specialize in data import, text mining and transformation to networks.
End of the Twitter API: a problem, and an opportunity
A large and diverse crowd of actors relied on the Twitter API to collect data for text analysis, network analysis, or both. Taking academics alone, you find a rapidly growing number of publications mentioning the “Twitter API” in recent years:
| Year | Publications mentioning the “Twitter API” |
| --- | --- |
| 2017 | 2,200 |
| 2018 | 2,480 |
| 2019 | 2,620 |
| 2020 | 3,100 |
| 2021 | 3,890 |
(source: Google Scholar)
With the new API access plans set at prohibitive prices, this stream of publications will run dry.
What’s next?
Past the initial moment of shock and disbelief, academics and others (OSINT analysts, journalists, marketers, developers, etc.) are starting to turn to other data sources.
Here are the data sources that I am personally considering, or that I have heard others consider (a minimal fetch example follows the list):
- OpenStreetMap (API)
- Wikipedia / Wikidata (API)
- Spotify (API)
- Reddit (well… not anymore)
- Mastodon (API)
- OpenAlex (this one is quite specialized but really worth a look: gigantic, free access to global scientometric data).
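To give a sense of how approachable some of these are, here is a minimal sketch in Python querying the MediaWiki API for the plain text of a Wikipedia page. The page title is just an example, and error handling is kept to a minimum:

```python
import requests

# Public MediaWiki API endpoint; swap the language subdomain as needed.
API_URL = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "prop": "extracts",
    "explaintext": 1,  # plain text instead of HTML
    "format": "json",
    "titles": "Social network analysis",  # example page
}

response = requests.get(API_URL, params=params, timeout=30)
response.raise_for_status()

# The result is keyed by internal page id, so take the first value.
pages = response.json()["query"]["pages"]
text = next(iter(pages.values()))["extract"]
print(text[:500])
```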
The obvious reaction is: none of these is an equivalent to Twitter. It will take a mourning period, but yes, we’ll have to move on (I won’t engage here on Threads; let’s just say I am not waiting or counting on API access to it). Human curiosity must find other objects to apply itself to.
Prediction 🔮: we are going to see, in the coming months and through 2024, an intense activity of developing importers and user interfaces for the alternative data sources mentioned above.
New data importers, new analytics, new user interfaces: a call to talk and exchange
So we are at the start of an intense period of building import tools, with new services created on top of them. This is a great occasion to address the makers of these solutions-to-come, and my question to them is:
Why would we need to do this in isolation? I think there are two big reasons not to engage in collaborations, and one of them is weaker than it looks.
One reason not to engage in collaborations is to keep one’s product secret and non-reproducible, so as to build a competitive advantage. A data importer with some clever data transformation added to it can deliver insights of great value; so the thinking goes, let’s not share it.
A second reason could be technical and resource constraints. Working alone can let you move faster, as collaborating adds coordination and interfacing costs and delays.
This second reason is the weaker one, I believe. Coordination and interfacing costs are real, but setting up a collaboration also distributes the work among many hands, instead of doing everything by oneself.
Let’s consider the steps for a service consisting of, say, creating a word cloud from the most common expressions used in a set of Wikipedia pages. Constructing the word cloud is far from the only operation we need to build (a sketch of the core steps follows the list below).
Are you ready to design and implement each of these 9 operations, and especially the 9th one?
- Creating an interface (web, mobile app or desktop?) where the user will express their query and choose their parameters
- Hosting this interface (server costs, adapting to multiple OS…)
- Choosing and familiarizing yourself with an API client for Wikipedia, writing the code to fetch any user query, and handling errors
- Storing the data - issues of scale and OS specificity
- Parsing and cleaning the data (are you supporting different languages?)
- Constructing the word cloud
- Formatting the results: a picture of the cloud, a table view of the underlying words and their counts, a gexf network format for the underlying network of relations?
- An interface to export the results: showing the picture on screen, downloading it in different formats (svg, png?), exporting to Excel and Google Sheets, visualizing the word cloud interactively with Gephi Lite, VOSviewer online or a custom D3 view, exporting a gexf or GraphML file to be opened in Gephi or NodeXL?
- Maintaining steps 1 to 8 through time 😅😱.
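To make this concrete, here is what steps 5 and 6 could look like in Python, reduced to their simplest expression. The cleaning is deliberately naive and English-only, and the wordcloud package is one option among several:

```python
import re
from collections import Counter

from wordcloud import WordCloud  # pip install wordcloud


def tokenize(text: str) -> list[str]:
    # Naive cleaning: lowercase and keep alphabetic tokens of 3+ characters.
    # A real service would handle multiple languages and stopwords here.
    return re.findall(r"[a-z]{3,}", text.lower())


def build_cloud(text: str, out_path: str = "cloud.png") -> Counter:
    counts = Counter(tokenize(text))
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(counts)
    cloud.to_file(out_path)  # step 8 would offer many more export options
    return counts
```

Simple enough, but this covers only steps 5 and 6; the seven other steps around it are where most of the work lies.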
I believe that relatively big for-profit organizations have a motive and the resources to handle these 9 steps all by themselves (and even then, they rely on and contribute to open source solutions for parts of the process!)
For smaller organizations (for-profit or not) and for individuals, there is a strong case for joining forces when and if possible, helping each other with one or several steps of the process above.
I go first: what I am happy to help with 🤗
The lists below are functional blocks that I can quite easily provide to you, in a format that suits your needs (a small sketch of the first two blocks follows the Tools list).
Tools
- methods to extract words and n-grams from a given text
- methods to convert co-occurring words into a network of words
- methods to infer the sentiment from a text
- methods to detect key topics in text
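As an illustration of the first two blocks, here is a naive sketch of what I mean (my own simplification, not the actual code behind my tools): extracting n-grams from tokenized documents and turning co-occurring words into a networkx graph, saved as gexf for Gephi.

```python
from collections import Counter
from itertools import combinations

import networkx as nx  # pip install networkx


def ngrams(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    # Sliding window of size n over the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cooccurrence_network(documents: list[list[str]]) -> nx.Graph:
    # Two words co-occur when they appear in the same document;
    # edge weights count how often that happens across documents.
    weights = Counter()
    for tokens in documents:
        for pair in combinations(sorted(set(tokens)), 2):
            weights[pair] += 1
    graph = nx.Graph()
    for (a, b), w in weights.items():
        graph.add_edge(a, b, weight=w)
    return graph


docs = [["twitter", "api", "network"], ["network", "analysis", "api"]]
g = cooccurrence_network(docs)
nx.write_gexf(g, "words.gexf")  # open in Gephi or Gephi Lite
```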
Interfaces
- you are free to add your method to https://nocodefunctions.com, which sorts out interfacing and hosting issues
- we can work on interfacing your app with nocodefunctions.com, meaning your app would integrate some of the functions of nocodefunctions.
(this one may sound like: “aha, I see your ulterior motive for all this: you want to grow your web app!” I would be happy to, that is true. But I also know that many developers don’t have the time to maintain their own web app, so I offer this for them.)
Import / export
- from a csv, txt, pdf or Excel file to the plain text it contains (see the sketch after this list)
- from a gexf or graphML file to a url to visualize this file on the web
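For the first of these two blocks, a minimal sketch of the idea, handling only txt and pdf here. The pypdf package is one option among several; csv and Excel would need extra branches:

```python
from pathlib import Path

from pypdf import PdfReader  # pip install pypdf


def extract_plain_text(path: str) -> str:
    # Dispatch on the file extension; csv and Excel would need similar
    # branches (e.g. with the csv module or openpyxl).
    file = Path(path)
    if file.suffix.lower() == ".pdf":
        reader = PdfReader(file)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if file.suffix.lower() == ".txt":
        return file.read_text(encoding="utf-8")
    raise ValueError(f"unsupported format: {file.suffix}")
```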
“How to start collaborating? I code in R / Python / JS and I don’t know how to do X, Y or Z”
The first immense step is psychological
I know from experience that it is very hard to share one’s code or application. For the first 10 years of my coding career, I did not really open source my apps completely, because of an inner fear that I would be “copied” or robbed of the value I had created. I thought that maybe, someday, I could monetize what I produced, so it would be silly to publish the result of my hard work and let anyone use it before me. It took ten years, but I am past that. I still believe there is a tiny chance that one day I’ll be able to earn something from what I create, but in the meantime this should not keep what I do hidden and unproductive.
The second step is technical: how to integrate or interface code from two different people?
This is actually not hard; we’ll figure out a way to coordinate. The general idea would be to keep the two projects separate, and to build interfaces joining the two. These interfaces can be of different sorts (a minimal sketch of the first one follows the list):
- creating a web API which takes project A’s output and formats it so that project B can use it
- creating a command line interface for the same result
- harmonizing data formats to facilitate exchanges
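To illustrate the first option: a minimal sketch of such a bridge, assuming project A posts raw text and project B consumes word counts as JSON. The endpoint name and payload shape are invented for the example, and Flask is one choice among many:

```python
from collections import Counter

from flask import Flask, jsonify, request  # pip install flask

app = Flask(__name__)


@app.route("/word-counts", methods=["POST"])
def word_counts():
    # Project A posts {"text": "..."}; we return counts project B can consume.
    payload = request.get_json(force=True)
    counts = Counter(payload["text"].lower().split())
    return jsonify(counts.most_common(50))


if __name__ == "__main__":
    app.run(port=5000)
```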
I am happy to work on these interfaces with you, let’s connect!
Let’s get in touch, either privately (analysis@exploreyourdata.com) or publicly on Twitter or Mastodon.
About me
I am a professor at emlyon business school, where I conduct research in Natural Language Processing and network analysis applied to the social sciences and humanities. I teach about the impact of digital technologies on business and society. I build nocode functions 🔎, a point-and-click web app to explore texts and networks. It is fully open source. Try it and give some feedback, I would appreciate it!
- my email: analysis@exploreyourdata.com 📧
- or on Twitter: @seinecle 📱
- you can also read the other articles of this blog 👓, where I write about the process of developing the app.