node2vec: Scalable Feature Learning for Networks

Today’s post describes node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks.

Many network analysis tasks involve predictions over nodes and edges. For instance, an analyst might want to classify the sentiment of a user in a social network, or predict links that could form in the future. A typical way to enable such analysis is feature engineering: using various characteristics of the network, from centrality metrics to edge weights, we craft domain-specific features for each node. However, there is no guarantee that the outcome of this tedious task will generalise well. Another approach is to learn feature representations by solving an optimisation problem. The objective in that case is to preserve the local neighborhoods of nodes in the learned representation; however, the quality of the produced features depends heavily on how we defined a neighborhood in the first place.

Node2vec solves this problem by providing a flexible notion of a node’s network neighborhood.

By choosing an appropriate notion of neighborhood, node2vec can learn representations that organise nodes based on their network roles and/or communities they belong to.

In detail, node2vec is a semi-supervised method that learns vector representations of nodes such that node neighborhoods are preserved in the d-dimensional feature space. Node2vec works with any graph, undirected or directed, weighted or not. Because of its modularity, the algorithm is easy to parallelise and can be trained on large networks with millions of nodes in a few hours. As a result, node2vec outperforms other dimensionality reduction techniques in both scalability and representation accuracy.

Node2vec was inspired by the NLP literature, and specifically by the word vectors produced with the skip-gram model. The skip-gram model is trained on text corpora, where every document is an ordered sequence of words: each word is used to predict the words found in a sliding window around it. In a similar fashion, we can select a specific node and predict the nodes that are connected to it. However, networks are not linear, so how do we select which nodes belong in the sliding window? The paper's main contribution is a tunable sampling strategy that, given a source node, lets the user explore the search space in a flexible way.
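To make the analogy concrete, here is a minimal sketch (not the paper's code) of how a random walk, treated as a "sentence" of node IDs, yields skip-gram training pairs; the function name and window size are illustrative assumptions.

```python
def skipgram_pairs(walk, window=2):
    """Yield (center, context) pairs from a walk, treating it like a sentence.

    Each node predicts the nodes inside a sliding window around it,
    exactly as a word predicts its surrounding words in skip-gram.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

walk = ["a", "b", "c", "d"]  # a toy walk over four nodes
print(skipgram_pairs(walk, window=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd'), ('d', 'c')]
```

In practice these pairs (or the walks themselves) would be fed to a skip-gram implementation such as gensim's Word2Vec; node2vec's contribution is in how the walks are generated, not in the skip-gram objective itself.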

Search strategies

Node2vec introduces a biased random walk procedure that can explore node neighborhoods in both a Breadth-first Sampling (BFS) and a Depth-first Sampling (DFS) fashion.

Formally, given a source node $u$, we simulate a random walk of fixed length $l$. Let $c_i$ denote the $i$th node in the walk, starting with $c_0 = u$. Nodes $c_i$ are generated by the following distribution:

$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \frac{\pi_{vx}}{Z} & \text{if } (v, x) \in E \\ 0 & \text{otherwise} \end{cases}$$

where $\pi_{vx}$ is the unnormalized transition probability between nodes $v$ and $x$, $Z$ is the normalizing constant, and $E$ is the network's edge set.
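As a baseline, the first-order case (next section's $\pi_{vx} = w_{vx}$) is straightforward to sketch; the dict-of-dicts graph format below is an assumption, not the paper's data structure.

```python
import random

def transition_probs(graph, v):
    """Normalized transition distribution from node v.

    graph[v] maps each neighbour x to the edge weight w_vx;
    Z is the normalizing constant (sum of weights out of v).
    """
    weights = graph[v]
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}

def step(graph, v, rng=random):
    """Sample the next node of the walk in proportion to edge weight."""
    probs = transition_probs(graph, v)
    nodes, p = zip(*probs.items())
    return rng.choices(nodes, weights=p, k=1)[0]

graph = {"u": {"a": 3.0, "b": 1.0}, "a": {"u": 3.0}, "b": {"u": 1.0}}
print(transition_probs(graph, "u"))  # {'a': 0.75, 'b': 0.25}
```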

The simplest way to define the transition probability would be to sample the next node in proportion to edge weights, which is equivalent to setting $\pi_{vx} = w_{vx}$. However, this search process would be quite restrictive, so the authors instead guide the walk using a second-order random walk with two parameters, $p$ and $q$, which control how fast the walk explores the graph and how far it strays from the neighborhood of the starting node.

Consider a random walk that just traversed edge $(t, v)$ and now resides at node $v$ (figure below). The walk now needs to decide on the next step, so it evaluates the transition probabilities $\pi_{vx}$ on edges $(v, x)$ leading from $v$. We set the unnormalized transition probability to $\pi_{vx} = \alpha_{pq}(t, x) \cdot w_{vx}$, where

$$\alpha_{pq}(t, x) = \begin{cases} \frac{1}{p} & \text{if } d_{tx} = 0 \\ 1 & \text{if } d_{tx} = 1 \\ \frac{1}{q} & \text{if } d_{tx} = 2 \end{cases}$$

and $d_{tx}$ denotes the shortest path distance between nodes $t$ and $x$. Note that $d_{tx}$ must be one of {0, 1, 2}, and hence, the two parameters are necessary and sufficient to guide the walk.
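A minimal sketch of this second-order step, assuming an unweighted adjacency-set graph (so $w_{vx} = 1$); the function names are illustrative. Note that $d_{tx}$ can be decided locally: $x = t$ gives $d_{tx} = 0$, $x$ adjacent to $t$ gives $d_{tx} = 1$, and anything else gives $d_{tx} = 2$.

```python
def alpha(p, q, t, x, graph):
    """Search bias alpha_pq(t, x) based on the distance d_tx from t to x."""
    if x == t:          # d_tx = 0: stepping back to the previous node
        return 1.0 / p
    if x in graph[t]:   # d_tx = 1: x is also a neighbour of t
        return 1.0
    return 1.0 / q      # d_tx = 2: x moves the walk away from t

def second_order_probs(graph, t, v, p, q):
    """Normalized distribution over v's neighbours, given the walk came from t."""
    unnorm = {x: alpha(p, q, t, x, graph) for x in graph[v]}  # w_vx = 1
    Z = sum(unnorm.values())
    return {x: w / Z for x, w in unnorm.items()}

# Toy graph: triangle t-v-a, plus a node b (at distance 2 from t) attached to v.
graph = {"t": {"v", "a"}, "v": {"t", "a", "b"}, "a": {"t", "v"}, "b": {"v"}}
# With q = 0.5, the outward node b gets unnormalized weight 1/q = 2,
# versus 1 for each of t and a, so the walk is biased outward.
print(second_order_probs(graph, "t", "v", p=1.0, q=0.5))
```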

What are parameters p and q?

Return parameter p

It controls the likelihood of immediately revisiting a node in the walk. High values minimise the likelihood of sampling already-visited nodes in the following two steps, while low values lead the walk to backtrack a step. The former setting therefore encourages exploration, while the latter favours local interactions close to the starting node.

In-out parameter q

It allows the search to differentiate between “inward” and “outward” nodes. If q > 1, the walk obtains a local view of the graph around node t and its behaviour resembles BFS, since the samples contain only nodes within a small area. If q < 1, the walk visits nodes that are farther away from node t, approximating the behaviour of DFS and encouraging outward exploration.
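A toy illustration (not from the paper) of this trade-off: with unit edge weights, an "outward" candidate at $d_{tx} = 2$ carries unnormalized weight $1/q$ against weight $1$ for each node at $d_{tx} = 1$. The helper below is a hypothetical convenience for comparing the two regimes.

```python
def outward_share(q, n_inward=1, n_outward=1, p=1.0):
    """Fraction of probability mass on d_tx = 2 ("outward") neighbours.

    Assumes unit edge weights: each inward (d_tx = 1) neighbour weighs 1,
    each outward neighbour weighs 1/q, and returning to t weighs 1/p.
    """
    outward = n_outward * (1.0 / q)
    total = outward + n_inward * 1.0 + 1.0 / p
    return outward / total

print(outward_share(q=0.5))  # q < 1: half the mass moves away (DFS-like) -> 0.5
print(outward_share(q=2.0))  # q > 1: the walk stays near t (BFS-like)   -> 0.2
```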

Applications of node2vec

To showcase node2vec’s effectiveness in finding similar nodes based on their topology, the authors used the Les Misérables co-appearance network. They clustered the learned vector representations, then redrew the network with each node coloured according to its cluster.

Moreover, the authors tested the algorithm on multi-label classification and link prediction tasks where node2vec outperformed other techniques.

To summarise, node2vec seems like a quite useful approach. It offers more flexibility than DeepWalk, while Poincaré embeddings could be another interesting approach to test when searching for ways to represent nodes as vectors.

Paper:

Grover, A. and Leskovec, J., 2016, August. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864). ACM.