SSC - Source/Sink Centrality
Definition
SSC-Source/Sink Centrality is a directed graph framework, to address the limitations of standard models. $SSC$ separately measures the importance of a node in the upstream and the downstream of a pathway, as a sender and a receiver of biological signals, and combines the two terms for evaluating the centrality.
This method apples as an extension for standard centrality models .PageRank Centrality is a spectral centrality measure where the importance of a node is a function of the centrality of its neighbors. In its original definition, PageRank describes the probability distribution of a uniform random walk with restart being present at each node of a graph after a large number of steps. In graph theory terms, the PageRank of a node $v$ is based on the PageRank of the nodes with links to $v$, divided by their out degrees. Formally:
$$C_{pgr}(v_i)=\beta_i + {\underset{v_j|v_i \in N_G(v_i)}{\sum}} {C_{pgr}(v_j)\over |N_G(v_j)|}$$
$\beta^i$’s are constant values that relate the probability of restarting at node $v_i$. The parameter $\alpha$ is a dampening factor that relates to the transition probability of the random walk. The previous Formula can be expressed in a vectorized format as following:
$$C_{pgr}=\beta + \alpha A^T D^{-1} C_{pgr}$$
where $C_{pgr}$ is the vector of centralities and $\beta$ is the vector of initial values. $D$ is the diagonal (out) degree matrix such that $[D]_{ii} = max (deg_{(+)}(v_i), 1)$. A closed form solution of the first Formula is achieved by solving for $C_{pgr}$. Formally:
$$C_{pgr}= (I − \alpha A^T D^{−1})^ {−1} \beta$$
In the Sink component of the PageRank, the downstream nodes have the higher importance. This is because a random walk will not be present at any node without incoming edges, unless by a restart event. The PageRank Sink centrality captures the importance of a node as a receiver of information. Formally the Sink PageRank centrality $(C_{pgr}^{Si})$ defined as:
$$(C_{pgr}^{Si})(v):= C_{pgr}(v)$$
To modify PageRank in such a way that captures the importance of nodes as source of signal, we derive a PageRank score when applied to the transpose of a graph. Formally, we define the PageRank Source $C_{pgr}^{So}$ as:
$$C_{pgr}^{So} (v_i) = \beta_i + \alpha {\underset{v_j|v_i \in N_G(v_i)}{\sum}} {C_{pgr}^{So} (v_j)\over |N_{GT} (v_j)|}$$
$\beta_i$ and $\alpha$ are constants that relate to the restart and transition probabilities. The PageRank Source of a node is calculated based on the centrality of a its neighbors in the transposed graphs. Define the diagonal in-degree matrix, $D^\prime$, of $G$ such that $[D^\prime]_{ii} = max(1, deg^- (vi))$. Similar to the equations for deriving the standard PageRank, the Source component can be solved as following:
$$C_{pgr}^{So}= (I − \alpha AD^{\prime−1})^{-1} \beta$$
Directed centralities only gives importance to either upstream nodes or downstream ones. To address this issue the Source/Sink PageRank defined . The fundamental concept of Source/Sink modeling is to measure the centrality of nodes as both sources and sinks of information. The Source/Sink concept to the PageRank adapted by calculating Source and Sink Centrality values individually and summing them:
$$C_{pgr}^{SS}(v) = C_{pgr}^{So}(v) + C_{pgr}^{Si}(v)$$
The above definition has no limitation of using different constant parameters for $C_{pgr}^{So}$ and $C_{pgr}^{Si}$.
This method considers a better topological description of the pathways that accounts for the importance of the pathway elements with respect to the upstream and downstream positions.
This method apples as an extension for standard centrality models .PageRank Centrality is a spectral centrality measure where the importance of a node is a function of the centrality of its neighbors. In its original definition, PageRank describes the probability distribution of a uniform random walk with restart being present at each node of a graph after a large number of steps. In graph theory terms, the PageRank of a node $v$ is based on the PageRank of the nodes with links to $v$, divided by their out degrees. Formally:
$$C_{pgr}(v_i)=\beta_i + {\underset{v_j|v_i \in N_G(v_i)}{\sum}} {C_{pgr}(v_j)\over |N_G(v_j)|}$$
$\beta^i$’s are constant values that relate the probability of restarting at node $v_i$. The parameter $\alpha$ is a dampening factor that relates to the transition probability of the random walk. The previous Formula can be expressed in a vectorized format as following:
$$C_{pgr}=\beta + \alpha A^T D^{-1} C_{pgr}$$
where $C_{pgr}$ is the vector of centralities and $\beta$ is the vector of initial values. $D$ is the diagonal (out) degree matrix such that $[D]_{ii} = max (deg_{(+)}(v_i), 1)$. A closed form solution of the first Formula is achieved by solving for $C_{pgr}$. Formally:
$$C_{pgr}= (I − \alpha A^T D^{−1})^ {−1} \beta$$
In the Sink component of the PageRank, the downstream nodes have the higher importance. This is because a random walk will not be present at any node without incoming edges, unless by a restart event. The PageRank Sink centrality captures the importance of a node as a receiver of information. Formally the Sink PageRank centrality $(C_{pgr}^{Si})$ defined as:
$$(C_{pgr}^{Si})(v):= C_{pgr}(v)$$
To modify PageRank in such a way that captures the importance of nodes as source of signal, we derive a PageRank score when applied to the transpose of a graph. Formally, we define the PageRank Source $C_{pgr}^{So}$ as:
$$C_{pgr}^{So} (v_i) = \beta_i + \alpha {\underset{v_j|v_i \in N_G(v_i)}{\sum}} {C_{pgr}^{So} (v_j)\over |N_{GT} (v_j)|}$$
$\beta_i$ and $\alpha$ are constants that relate to the restart and transition probabilities. The PageRank Source of a node is calculated based on the centrality of a its neighbors in the transposed graphs. Define the diagonal in-degree matrix, $D^\prime$, of $G$ such that $[D^\prime]_{ii} = max(1, deg^- (vi))$. Similar to the equations for deriving the standard PageRank, the Source component can be solved as following:
$$C_{pgr}^{So}= (I − \alpha AD^{\prime−1})^{-1} \beta$$
Directed centralities only gives importance to either upstream nodes or downstream ones. To address this issue the Source/Sink PageRank defined . The fundamental concept of Source/Sink modeling is to measure the centrality of nodes as both sources and sinks of information. The Source/Sink concept to the PageRank adapted by calculating Source and Sink Centrality values individually and summing them:
$$C_{pgr}^{SS}(v) = C_{pgr}^{So}(v) + C_{pgr}^{Si}(v)$$
The above definition has no limitation of using different constant parameters for $C_{pgr}^{So}$ and $C_{pgr}^{Si}$.
This method considers a better topological description of the pathways that accounts for the importance of the pathway elements with respect to the upstream and downstream positions.
References
- Naderi Yeganeh P., Naderi Yeganeh P., Richardson C., Saule E., Loraine A., Taghi Mostafavi M., 2020. Revisiting the use of graph centrality models in biological pathway analysis. BioData Mining, 13(1). DOI: 10.1186/s13040-020-00214-x