SALSA - Stochastic Approach for Link-Structure Analysis
Definition
The approach is based upon the theory of Markov chains, and relies on the stochastic properties of random walks performed
on our collection of pages.
The SALSA is combination of HITS and PageRank which creates a neighborhood graph using authority and hub pages and links and create a bipartite graph of the authority and hub pages in the neighborhood graph.
On this bipartite graph we will perform two distinct random walks. Each walk will only visit nodes from one of the two sides of the graph, by traversing paths consisting of two G-edges in each step. Since each edge crosses sides of G, each walk is confined to just one of the graph’s sides, and the two walks will naturally start off from different sides of G.
Two stochastic matrices, which are the transition matrices of the two Markov chains formed from bipartite graph G. A hub Markov chain with matrix H and an authority Markov chain with matrix A.
The matrices A and H serve as the association matrices required by the metaalgorithm for identifying the authorities and hubs.
We shall assume that G is connected, causing both stochastic matrices A and H to be irreducible. This assumption does not form a limiting factor, since when G is not connected we may use this technique on each connected component separately.
The principal eigenvector of an irreducible, aperiodic stochastic matrix is actually the stationary distribution of the underlying Markov chain, and its high entries correspond to pages most frequently visited by the (infinite) random walk.
The SALSA is combination of HITS and PageRank which creates a neighborhood graph using authority and hub pages and links and create a bipartite graph of the authority and hub pages in the neighborhood graph.
On this bipartite graph we will perform two distinct random walks. Each walk will only visit nodes from one of the two sides of the graph, by traversing paths consisting of two G-edges in each step. Since each edge crosses sides of G, each walk is confined to just one of the graph’s sides, and the two walks will naturally start off from different sides of G.
Two stochastic matrices, which are the transition matrices of the two Markov chains formed from bipartite graph G. A hub Markov chain with matrix H and an authority Markov chain with matrix A.
The matrices A and H serve as the association matrices required by the metaalgorithm for identifying the authorities and hubs.
We shall assume that G is connected, causing both stochastic matrices A and H to be irreducible. This assumption does not form a limiting factor, since when G is not connected we may use this technique on each connected component separately.
The principal eigenvector of an irreducible, aperiodic stochastic matrix is actually the stationary distribution of the underlying Markov chain, and its high entries correspond to pages most frequently visited by the (infinite) random walk.
Requirements
Require strongly connected network.
Software
References
- LEMPEL, R. & MORAN, S. 2001. SALSA: the stochastic approach for link-structure analysis. ACM Trans. Inf. Syst., 19, 131-160. DOI: 10.1145/382979.383041