hdWGCNA
hdWGCNA copied to clipboard
TOM would not update
Hello Sam,
As part of my differential connectivity experiment I'm shuffling cell condition label and remaking the TOM for the FDR. Despite I'm setting random seed for each iteration, shuffling cell condition, removing saved TOM files and clearing memory, more often than not the TOM files are not updated. Here's the part of my code that got involved:
#function to create TOM file
create_TOM_cond<-function(cond) {
#subset the master Seurat object by condition
seurat_obj_cond <- subset(combined,subset=condition==cond)
print(seurat_obj_cond) #check cell count
seurat_obj_cond <- SetupForWGCNA(
seurat_obj_cond,
gene_select = "custom", # the gene selection approach to use all genes in the base tom file
features = base_genes, # fraction of cells that a gene needs to be expressed in order to be included
wgcna_name = cond # the name of the hdWGCNA experiment
)
# construct metacells in each group
seurat_obj_cond <- MetacellsByGroups(
seurat_obj = seurat_obj_cond,
group.by = c("subtype", "condition"), # specify the columns in [email protected] to group by
reduction = 'pca', # select the dimensionality reduction to perform KNN on
k = 25, # nearest-neighbors parameter
max_shared = 10, # maximum number of shared cells between two metacells
ident.group = 'subtype' # set the Idents of the metacell seurat object
)
# normalize metacell expression matrix:
seurat_obj_cond <- NormalizeMetacells(seurat_obj_cond)
# Set up the expression matrix
seurat_obj_cond <- SetDatExpr(
seurat_obj_cond,
group_name = celltype, # the name of the group of interest in the group.by column
group.by='subtype', # the metadata column containing the cell type info. This same column should have also been used in MetacellsByGroups
assay = 'RNA', # using RNA assay
slot = 'data' # using normalized data
)
# Test different soft powers:
#disable the multithread for this step.
#Occasionally multithread will cause the script to stuck in the block for unknown reasons.
seurat_obj_cond <- TestSoftPowers(
seurat_obj_cond,
networkType = 'signed' # you can also use "unsigned" or "signed hybrid"
)
# construct co-expression network:
seurat_obj_cond <- ConstructNetwork(
seurat_obj_cond,
setDatExpr=FALSE,
tom_name = paste0(celltype,'.',cond), # name of the topoligical overlap matrix written to disk
overwrite_tom = TRUE
)
TOM <- GetTOM(seurat_obj_cond)
return (TOM)
}
#save condition column to true.condition
combined[['true.condition']]<-combined[['condition']]
#extract condition labels
cdt=combined$condition
#remove name to break link with cell tag
names(cdt)<-NULL
#function to shuffle cell condition
produce_shuffle_cell_scores<-function(i) {
#remove previous TOM files
unlink("TOM", recursive = TRUE)
set.seed(i)
shuffle_cdt<-sample(cdt) #shuffle condition label
print(shuffle_cdt[1:10]) #make sure condition label are shuffled
combined[['condition']]<-shuffle_cdt #replace condition label by shuffled label
#double check shuffled condition distribution
print(table(combined[[c('condition','true.condition')]])) #each seed produce different count table
TOMs_shuffle<-map(conditions,create_TOM_cond)
for (TOM_shuffle in TOMs_shuffle) { print(TOM_shuffle[1:5,1:5]) } #often this produces the same TOM matrix so same the module scores later
names(TOMs_shuffle)<-conditions
module_scores<-map(TOMs_shuffle, function(TOM_shuffle) (map_vec(modules,function(module) extract_module_score(module,TOM_shuffle))))
#reformat module_scores
names(module_scores)<-conditions
module_scores<-as.data.frame(module_scores)
module_scores[['module']]<-rownames(module_scores)
module_scores[['loop_index']]<-i
#clear memory
rm(TOMs_shuffle)
rm(TOM_shuffle)
return (module_scores)
}
#check the module score
produce_shuffle_cell_scores(1)
produce_shuffle_cell_scores(2)
Sometimes both call of shuffled cells return the same as unshuffled table, sometimes the seed 1 is successful but seed 2 returns the same as seed 1 despite the table of condition/true condition indicates the cell distribution has changed.
I know that Seurat's subset function reset the seed, but I set the seed and immediately shuffle labels before calling the subset and confirm the shuffling by the condition/true condition table.
It's also inconsistant when the TOM gets updated. Sometimes I manage to get it update manually running the create_TOM_cond function (other times that fails too, I don't understand why), but not in the map function; sometimes the map works but won't work inside the score function, Some times the score function works when ran interactively, but not as part of Rscript <>.r
I'm totally at loss what might be the cause. Would you have an idea? Thanks!
Best, Delong