iterativeWGCNA

Identification of hub genes within each module

Open brettChapman opened this issue 3 years ago • 2 comments

Hi

I'm interested in producing dendrograms, heatmaps, network plots (force directed network plots) and hierarchical edge bundle plots using the final merged members of each module.

From what I've read, to produce any plots from iterativeWGCNA I would use the members to filter the genes in the original expression matrix and then run this subset through the standalone WGCNA in R.

How are the hub genes determined? Would these be the top 10% of genes by KME value within each module? Could I use this top 10% to run correlations between modules and within each module? My expression data also spans 5 different tissues, and I would like to visualise differences across tissues in the plots.

Previously I've been subsetting the expression matrix by tissue type, heavily filtering on differentially expressed genes by qvalue (determined by Ballgown), and then running Pearson correlations across the top 75% of expressed genes, so that lowly expressed genes are removed from the correlation analysis. This allows me to find correlations between highly expressed genes within and between different tissues. I'm trying to look at ways to incorporate the iterativeWGCNA and standalone WGCNA packages into my analysis so that I can further enrich for particular genes with high connectivity, without having to be heavy on the qvalue filtering, which would likely remove genes of biological importance.

Thank you for any help you can provide.

brettChapman, May 27 '21 05:05

Hi-

Apologies for the delayed response.

Hub genes are defined by the role they play in a gene-interaction network. The basic idea is that a hub gene is one that is so "central" or "essential" to the network that removing it greatly alters the network structure (or breaks the network). Often, these are genes that form connections across modules, so rankings based solely on correlations are not sufficient to identify them. Typical approaches to identifying hub genes involve three (or four) steps: 1) estimate the gene-interaction network, 2) for each network node (gene), calculate various measures of centrality (e.g., connectivity, betweenness, etc.), 3) rank nodes (genes) by the centrality measures, with the top-ranked being most likely to be "hub" genes, and 4) validate the prediction experimentally :)
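Steps 1 to 3 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the method used by WGCNA or iterativeWGCNA: the network is estimated with a hard correlation threshold (the 0.8 cutoff is hypothetical), betweenness is computed with Brandes' algorithm on the unweighted graph, and the function names are made up for this example.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_network(expr, threshold=0.8):
    """Step 1: link genes whose |correlation| meets the (hypothetical) threshold."""
    graph = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


def betweenness(graph):
    """Step 2: unnormalised betweenness centrality (Brandes, unweighted, undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from source
        dist = dict.fromkeys(graph, -1)   # -1 marks "not yet visited"
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected shortest path was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


def rank_hub_candidates(graph):
    """Step 3: rank genes by betweenness, breaking ties by degree (connectivity)."""
    btw = betweenness(graph)
    return sorted(graph, key=lambda g: (btw[g], len(graph[g])), reverse=True)
```

With `expr` as a dict mapping gene IDs to expression vectors, `rank_hub_candidates(build_network(expr))` returns the genes ordered from most to least hub-like. A gene that bridges two otherwise separate groups ranks high on betweenness even when its degree is modest, which is why correlation-only rankings can miss it.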

Both WGCNA and iterativeWGCNA can be used for step 1: estimating the gene-interaction network. Using iterativeWGCNA will eliminate the need for some of the filtering steps you mention (e.g., the top 10% by KME cutoff, or filtering for the top 75% of expressed genes, which, BTW, is never a good idea b/c essential regulatory genes such as transcription factors are often not highly expressed). This is b/c iterativeWGCNA is optimized to remove spurious connections and produces highly refined, connected modules, where every gene in a module is very highly connected (highly correlated with all other genes in the module). So using iterativeWGCNA will help you produce a more refined estimate of the gene-interaction network than standard WGCNA.

That being said, determining the hub genes is outside the scope of iterativeWGCNA, whose goal is just to help produce high-quality gene-interaction networks.

Once you finish running iterativeWGCNA, you have several options on how to proceed to identify your hub genes.

  • use standard WGCNA

    • follow the recommendations in https://github.com/cstoeckert/iterativeWGCNA/issues/30#issuecomment-507730370 for filtering the expression data and then passing the filtered expression data and the module assignments to WGCNA's chooseTopHubInEachModule function
  • use igraph or a similar generic network analysis toolkit to get at the same question w/a bit more flexibility to define what you consider to be a hub gene

    • follow the steps in https://github.com/cstoeckert/iterativeWGCNA/issues/30#issuecomment-507730370 to use WGCNA to calculate the refined topological overlap matrix and then transform it into a network using igraph.
    • in R import network into igraph and use igraph's clustering / connectivity functions to identify / rank essential nodes
      • NOTE: you can import module assignments; it may take you a bit of trial and error
    • or ... export and pass to another tool for analysis
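As a rough illustration of the first option, picking one hub per module by intramodular connectivity can be sketched in plain Python. This is an assumption-laden stand-in, not WGCNA's implementation: it uses a signed adjacency of the form ((1 + cor) / 2)^power with a hypothetical default power of 2, the `top_hub_per_module` name is made up, and the "UNCLASSIFIED" label for unassigned genes is an assumption about the module-assignment file.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def top_hub_per_module(expr, modules, power=2, unassigned="UNCLASSIFIED"):
    """Return {module: gene with the highest summed signed adjacency in that module}.

    expr    -- dict: gene ID -> expression vector (already filtered to module members)
    modules -- dict: gene ID -> module label
    power, unassigned -- illustrative defaults, not values taken from the thread
    """
    members = defaultdict(list)
    for gene, module in modules.items():
        if module != unassigned and gene in expr:
            members[module].append(gene)

    hubs = {}
    for module, genes in members.items():
        connectivity = {
            g: sum(
                ((1 + pearson(expr[g], expr[h])) / 2) ** power
                for h in genes
                if h != g
            )
            for g in genes
        }
        # Ties resolve to the first gene encountered.
        hubs[module] = max(connectivity, key=connectivity.get)
    return hubs
```

Because this score only looks inside each module, it rewards genes that are tightly correlated with their own module. It will not surface genes that connect modules to each other, which is where the network-centrality route in the second option gives more flexibility.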

fossilfriend, Jun 02 '21 16:06

Thanks for your feedback @fossilfriend

I'll try out the methods you mention and see how I go. I've also come across tools such as DeepGraph (https://deepgraph.readthedocs.io/en/latest/) and Networkit (https://networkit.github.io/), which I'm keen to try out as well. I have my own force-directed network visualisations based on D3.js, and I'm keen to feed them the raw data that igraph uses for visualisation.
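For the D3.js hand-off, one possible shape for the data is the nodes/links JSON layout commonly used in D3 force-directed examples. The sketch below assumes the network has already been exported as a weighted edge list; the `to_d3_graph` name, the `min_weight` cutoff and the field names are illustrative choices, not anything igraph or iterativeWGCNA produces directly.

```python
import json


def to_d3_graph(edges, modules, min_weight=0.0):
    """Convert (source, target, weight) edges into a D3-style nodes/links dict.

    edges      -- iterable of (gene_a, gene_b, weight) tuples
    modules    -- dict: gene ID -> module label (used as the node "group")
    min_weight -- hypothetical cutoff for dropping weak edges before plotting
    """
    links = [
        {"source": s, "target": t, "value": w}
        for s, t, w in edges
        if w >= min_weight
    ]
    # Only keep genes that still have at least one edge after filtering.
    used = {link["source"] for link in links} | {link["target"] for link in links}
    nodes = [
        {"id": gene, "group": modules.get(gene, "UNCLASSIFIED")}
        for gene in sorted(used)
    ]
    return {"nodes": nodes, "links": links}


def write_d3_json(graph, path):
    """Write the nodes/links dict to a JSON file that a D3 script can load."""
    with open(path, "w") as handle:
        json.dump(graph, handle, indent=2)
```

Colouring nodes by `group` then shows module membership in the force layout, and building one such file per tissue would be one way to compare tissues side by side.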

I think for a first attempt I'll run WGCNA with a filtered dataset based on the iterativeWGCNA output, identify the top hub genes, and also try running igraph. Thanks for your help.

brettChapman, Jun 03 '21 02:06