last modified: 2017-02-19
Download this zip file and unzip it on your computer,
or use this direct link: https://tinyurl.com/gephi-tuto-3
You should find the file miserables.gexf in it. Save it in a folder you will remember (or create a folder specially for this small project).
This file contains a network representing "who appears next to whom" in the 19th century novel Les Misérables by Victor Hugo.
A link between characters A and B means they appeared on the same page or paragraph in the novel.
The file name ends with ".gexf", which just means this is a text file where the network information is stored (name of the characters, their relations, etc.), following some conventions.
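To make "a text file following some conventions" more concrete: GEXF is an XML format. The sketch below reads a tiny hand-written GEXF fragment with Python's standard library. The two nodes A and B and their edge are illustrative sample data, not an extract of miserables.gexf, and the namespace is the one of the GEXF 1.2 draft, which may differ from the version used in your file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the GEXF 1.2 draft layout (not taken from miserables.gexf).
SAMPLE_GEXF = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="undirected" mode="static">
    <nodes>
      <node id="0" label="A"/>
      <node id="1" label="B"/>
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" weight="2"/>
    </edges>
  </graph>
</gexf>
"""

# Assumed namespace: GEXF 1.2 draft.
NS = {"g": "http://www.gexf.net/1.2draft"}


def summarize(gexf_text):
    """Return the edge type, mode, node labels and weighted edges of a GEXF text."""
    graph = ET.fromstring(gexf_text).find("g:graph", NS)
    labels = {n.get("id"): n.get("label")
              for n in graph.findall("g:nodes/g:node", NS)}
    edges = [(e.get("source"), e.get("target"), float(e.get("weight", "1")))
             for e in graph.findall("g:edges/g:edge", NS)]
    return {
        "edge_type": graph.get("defaultedgetype"),
        "mode": graph.get("mode"),
        "labels": labels,
        "edges": edges,
    }


info = summarize(SAMPLE_GEXF)
print(len(info["labels"]), "nodes,", len(info["edges"]), "edges")
```

The same kind of information (number of nodes and edges, undirected or not, dynamic or not) is what Gephi summarizes when it opens a file.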
Open Gephi. On the Welcome screen that appears, click on Open Graph File.
Find miserables.gexf on your computer and open it.
A report window will open, giving you basic info on the network you opened:
This tells you that the network comprises 74 characters, connected by 248 links.
Links are undirected, meaning that if A is connected to B, then it is the same as B connected to A.
The report also tells us the graph is not dynamic: it means there is no evolution or chronology, it won’t "move in time".
Click OK to see the graph in Gephi.
We can switch to the data laboratory to see the underlying data:
We see that the nodes of the network have many attributes. In particular, each has a Gender and a measure of how central it is:
This is the list of edges (relations) in the network. Notice that they have a "weight" (a "strength").
In the overview, make sure the Filter panel is displayed:
How the Filter panel works:
An example: hiding edges with weight lower than 2
When you are finished using a filter in the zone, right click on it and select "remove".
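Outside Gephi, the edge weight example above amounts to a simple threshold on a list of edges. This is a minimal sketch on a made-up edge list (the node names and weights are hypothetical), not a description of how Gephi implements the filter.

```python
# Toy edge list: (source, target, weight). Values are hypothetical.
EDGES = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 5.0)]


def filter_by_weight(edges, min_weight):
    """Keep only the edges whose weight is at least min_weight."""
    return [edge for edge in edges if edge[2] >= min_weight]


# Hide edges with a weight lower than 2.
visible = filter_by_weight(EDGES, 2.0)
print(visible)
```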
One filter is applied AFTER the other:
The first filter to be applied is NESTED (placed inside the second one) as a "subfilter".
Which filter should be placed inside which? Let’s look at different examples:
Goal: keep on screen only the female characters who have a tie (an edge, a relation) of strength at least 2.
→ place the filter "edge weight" inside the filter "Gender":
In this case, it was equivalent to:
nest the "Gender" filter inside the "Edge weight" filter or
nest the "Edge weight" filter inside the "Gender" Filter
→ The result was the same (the network on screen is identical in both cases)
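Why the order does not matter here can be checked on a toy example. The sketch below assumes that the gender filter removes nodes (and the edges attached to them) while the edge weight filter removes edges only; the four nodes and their weights are invented for the illustration.

```python
# Hypothetical toy data: node -> gender, and (source, target, weight) edges.
GENDER = {"A": "F", "B": "F", "C": "M", "D": "F"}
EDGES = [("A", "B", 3), ("A", "C", 4), ("B", "D", 1)]


def gender_filter(nodes, edges, gender):
    """Keep nodes of the given gender and the edges between kept nodes."""
    kept = {n for n in nodes if GENDER[n] == gender}
    return kept, [e for e in edges if e[0] in kept and e[1] in kept]


def weight_filter(nodes, edges, min_weight):
    """Keep all nodes, but only the edges with weight >= min_weight."""
    return set(nodes), [e for e in edges if e[2] >= min_weight]


nodes = set(GENDER)
# Edge weight applied first, then gender.
order_1 = gender_filter(*weight_filter(nodes, EDGES, 2), "F")
# Gender applied first, then edge weight.
order_2 = weight_filter(*gender_filter(nodes, EDGES, "F"), 2)
print(order_1 == order_2)
```

Each of the two filters decides on its own attribute without looking at what the other one removed, which is why both orders lead to the same picture.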
Here, we want to visualize:
only the nodes which have fewer than 10 relations <1>
and among these, only those which form the "main island" of the network (we want to hide small detached groups of nodes) <2>
<1> In technical terms, nodes with a degree of less than 10.
<2> In technical terms, we are looking for the giant component.
We will see that the placement of the filters in the zone makes a difference.
First, let us place the filter on giant component inside the filter on degree:
In this first case,
only the giant component of the network was made visible.
→ Since the network was just one big connected "island" to start with, this did not change a thing.
Then, all characters with 10 or more relations were hidden.
→ This hides nodes which were connected to many others, so that we end up with many groups, disconnected from each other.
Now, instead, let us place the filter on degree inside the filter on giant component:
In this second case,
starting from the complete network, all characters with 10 or more relations were hidden.
→ This created a network made of many disconnected groups of nodes.
Then the giant component filter was applied,
→ which had the effect of hiding the small groups, keeping in view only the biggest group of connected nodes.
In summary: be careful about the order in which you apply several filters at once, as it can change the result of the filtering.
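The two cases can be reproduced on a small made-up network. In this sketch the hub H and its neighbours are hypothetical, and the degree limit is 3 instead of 10 to keep the example tiny; the functions are a plain Python illustration of the logic, not Gephi's own code.

```python
# Hypothetical toy network: a hub "H" holding three branches together.
NODES = {"H", "A", "B", "C", "D", "E"}
EDGES = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "D"), ("D", "E")]


def adjacency(nodes, edges):
    """Neighbour sets, restricted to the nodes that are still visible."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_below(nodes, edges, limit):
    """Keep the nodes whose degree (among visible nodes) is below limit."""
    adj = adjacency(nodes, edges)
    return {n for n in nodes if len(adj[n]) < limit}


def giant_component(nodes, edges):
    """Keep the largest group of nodes connected to each other."""
    adj = adjacency(nodes, edges)
    seen, best = set(), set()
    for start in sorted(nodes):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for neighbour in adj[stack.pop()]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Case 1: giant component nested inside degree (giant component runs first).
case_1 = degree_below(giant_component(NODES, EDGES), EDGES, 3)
# Case 2: degree nested inside giant component (degree runs first).
case_2 = giant_component(degree_below(NODES, EDGES, 3), EDGES)
print(sorted(case_1), sorted(case_2))
```

In case 1 the isolated nodes B and C stay on screen; in case 2 they are hidden because the giant component is computed after the hub has been removed.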
Imagine you are interested in the female characters of the novel "Les Misérables".
you are interested in female characters and the relations among them
you are interested in the relations between female characters and male characters
you are not interested in the relations between male characters
How to display this?
The MASK operator applied on the gender partition filter enables you to show:
all characters,
relations between female characters,
and relations between male and female characters,
while masking male-male relations.
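The outcome described in this list can be expressed as a rule on edges: an edge stays visible if at least one of its two ends is a female character. The sketch below applies that rule to a hypothetical four-node network; it illustrates the result only and does not claim to reproduce how the MASK operator is implemented in Gephi.

```python
# Hypothetical toy data: two female (F1, F2) and two male (M1, M2) characters.
GENDER = {"F1": "F", "F2": "F", "M1": "M", "M2": "M"}
EDGES = [("F1", "F2"), ("F2", "M1"), ("M1", "M2")]


def mask_edges(edges, gender, keep="F"):
    """Keep an edge if at least one endpoint has the gender of interest."""
    return [(u, v) for u, v in edges if gender[u] == keep or gender[v] == keep]


visible_nodes = set(GENDER)  # all characters remain on screen
visible_edges = mask_edges(EDGES, GENDER)
print(visible_edges)
```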
It is also possible to hide / show only some of the directed relations between the visible graph and the filtered out graph:
Imagine you are interested in the characters with names starting with "L" or "J" in "Les Misérables".
How to display only these characters?
We will need to apply filters on the Label of the nodes, which contains the names of the characters.
However, looking at the "catalogue" of filters, we see no filter on Label. The reason is that Label is an internal property of nodes, inaccessible to filters.
So we must first copy the Labels of the nodes into a new attribute, on which we will be able to apply a filter.
Let’s switch to the data laboratory and add this attribute:
We now have an attribute called "Name" that we can find in the Filters:
This is what the filter on Name and its parameters look like in the zone:
To recall, we want to show only the characters whose names start with "L" or "J". Let’s start with the "L" characters.
We need to find the names which match the pattern "start with an L". The way to describe a pattern in text is called a "regular expression".
Said differently, a regular expression (also called "regex") is a convenient way to express a pattern we search for in a text.
Regular expressions can become very sophisticated. But here, we need just a simple one:
L.*
Let’s examine what the L, the dot and the star mean.
L the letter "L" means we want names starting with this letter
. the dot means: any character
* the star means: the previous character, repeated any number of times (including zero)
So: "select nodes which have a name starting with L, followed by any character, in any number"
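The pattern explained above (an L, a dot, a star) can be tried in Python. This sketch uses re.fullmatch on the assumption that the pattern has to cover the whole name, which is what makes the trailing dot and star necessary; whether Gephi anchors the pattern in exactly this way is not stated in this tutorial. The list of names is a small illustrative sample.

```python
import re

# Illustrative sample of names; not the full content of the network.
NAMES = ["Listolier", "Labarre", "Javert", "Joly", "Valjean", "Cosette"]


def names_matching(pattern, names):
    """Return the names that the pattern matches from first to last character."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


starting_with_l = names_matching(r"L.*", NAMES)
only_the_letter = names_matching(r"L", NAMES)
print(starting_with_l)
```

"Valjean" is not selected: it contains a lowercase "l", but the pattern is case sensitive and asks for an uppercase "L" in first position. The pattern "L" alone selects nothing here, because no name consists of that single letter.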
Please note that you need to check the box "regex":
When the filter is applied, only the characters with a name starting with L will be displayed:
How to filter characters with a name starting with the letter "L" or "J"?
We could rely on a more complex regular expression to do this:
Meaning: "select nodes which have a name starting with L or J, followed by any characters"
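One common way to write "L or J, followed by any characters" is a character class in square brackets, [LJ].*; an alternation, (L|J).*, says the same thing. Both spellings are assumptions about the expression meant here, shown as a sketch with Python's re module on an illustrative sample of names.

```python
import re

# Illustrative sample of names; not the full content of the network.
NAMES = ["Listolier", "Labarre", "Javert", "Joly", "Valjean", "Cosette"]


def select(pattern, names):
    """Return the names fully matched by the pattern."""
    return [name for name in names if re.fullmatch(pattern, name)]


with_class = select(r"[LJ].*", NAMES)          # one character among L, J
with_alternation = select(r"(L|J).*", NAMES)   # either L or J
print(with_class)
```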
But we can also rely on 2 filters: one for L, one for J. Nesting one inside the other would not work, because it would mean:
"show nodes which start with an L, and among them, only those which start with a J"
→ no node can meet this condition, so they would all be invisible.
Instead, we should use the UNION operator that can be found here:
Drag it to the zone, and then drag the Attributes → Equal → Name filter inside it twice:
In the settings of the first Name filter, enter the regular expression for names starting with L:
L.*
In the second Name filter, enter the same expression with J instead of L:
J.*
(make sure the "regex" box is checked in both cases)
As a result, the nodes selected by each of the two filters are combined in the display:
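In terms of sets, nesting behaves like an intersection and UNION like a union. The sketch below shows the difference on an illustrative sample of names; the two patterns, one per initial, are assumed to follow the "letter, dot, star" form explained earlier.

```python
import re

# Illustrative sample of names; not the full content of the network.
NAMES = ["Listolier", "Labarre", "Javert", "Joly", "Valjean", "Cosette"]


def name_filter(pattern, names):
    """Return the set of names fully matched by the pattern."""
    return {name for name in names if re.fullmatch(pattern, name)}


l_names = name_filter(r"L.*", NAMES)
j_names = name_filter(r"J.*", NAMES)

# Nesting: the J filter only sees what the L filter let through.
nested = name_filter(r"J.*", l_names)
# UNION: the results of both filters are added up.
union = l_names | j_names
print(sorted(nested), sorted(union))
```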
The NOT operator flips the result of a filter: what was hidden becomes visible and vice versa.
Example: if we want to display all characters except for those returned by a UNION on 2 Name filters on L and J initials:
Same effect, but applying the NOT operator on a single filter using a regex on L or J:
Same effect again, achieved without using the NOT operator. In regular expressions the ^ sign inside square brackets means "NOT":
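The three approaches can be compared side by side. In this sketch NOT is modelled as "everything except what the filter returned", and the expressions [LJ].* and [^LJ].* are assumed spellings of the patterns described above; the names are an illustrative sample.

```python
import re

# Illustrative sample of names; not the full content of the network.
NAMES = ["Listolier", "Labarre", "Javert", "Joly", "Valjean", "Cosette"]
ALL_NAMES = set(NAMES)


def name_filter(pattern, names):
    """Return the set of names fully matched by the pattern."""
    return {name for name in names if re.fullmatch(pattern, name)}


# 1. NOT applied on a UNION of two filters.
not_union = ALL_NAMES - (name_filter(r"L.*", NAMES) | name_filter(r"J.*", NAMES))
# 2. NOT applied on a single filter matching L or J.
not_single = ALL_NAMES - name_filter(r"[LJ].*", NAMES)
# 3. No NOT operator: ^ inside the brackets excludes L and J.
negated_class = name_filter(r"[^LJ].*", NAMES)
print(sorted(negated_class))
```

One small difference: [^LJ] still requires a first character, so an empty name would be kept by the two NOT versions but not by the third expression.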
Tutorials about regular expressions:
And a web page where you can test your regular expressions: http://regexpal.com
Visit the Gephi group on Facebook to get help,
or visit the website for more tutorials