hdWGCNA icon indicating copy to clipboard operation
hdWGCNA copied to clipboard

TOM would not update

Open DelongZHOU opened this issue 10 months ago • 0 comments

Hello Sam,

As part of my differential connectivity experiment I'm shuffling cell condition label and remaking the TOM for the FDR. Despite I'm setting random seed for each iteration, shuffling cell condition, removing saved TOM files and clearing memory, more often than not the TOM files are not updated. Here's the part of my code that got involved:

#function to create TOM file
create_TOM_cond<-function(cond) {
	#subset the master Seurat object by condition
	seurat_obj_cond <-  subset(combined,subset=condition==cond)
	print(seurat_obj_cond) #check cell count
	seurat_obj_cond <- SetupForWGCNA(
	  seurat_obj_cond,
	  gene_select = "custom", # the gene selection approach to use all genes in the base tom file
	  features = base_genes, # fraction of cells that a gene needs to be expressed in order to be included
	  wgcna_name = cond # the name of the hdWGCNA experiment
	)

	# construct metacells  in each group
	seurat_obj_cond <- MetacellsByGroups(
	  seurat_obj = seurat_obj_cond,
	  group.by = c("subtype", "condition"), # specify the columns in [email protected] to group by
	  reduction = 'pca', # select the dimensionality reduction to perform KNN on
	  k = 25, # nearest-neighbors parameter
	  max_shared = 10, # maximum number of shared cells between two metacells
	  ident.group = 'subtype' # set the Idents of the metacell seurat object
	)
	# normalize metacell expression matrix:
	seurat_obj_cond <- NormalizeMetacells(seurat_obj_cond)
	# Set up the expression matrix
	seurat_obj_cond <- SetDatExpr(
	  seurat_obj_cond,
	  group_name = celltype, # the name of the group of interest in the group.by column
	  group.by='subtype', # the metadata column containing the cell type info. This same column should have also been used in MetacellsByGroups
	  assay = 'RNA', # using RNA assay
	  slot = 'data' # using normalized data
	)
	# Test different soft powers:
	#disable the multithread for this step.
	#Occasionally multithread will cause the script to stuck in the block for unknown reasons.

	seurat_obj_cond <- TestSoftPowers(
	  seurat_obj_cond,
	  networkType = 'signed' # you can also use "unsigned" or "signed hybrid"
	)

	# construct co-expression network:
	seurat_obj_cond <- ConstructNetwork(
	  seurat_obj_cond, 
	  setDatExpr=FALSE,
	  tom_name = paste0(celltype,'.',cond), # name of the topoligical overlap matrix written to disk
	  overwrite_tom = TRUE
	)
	TOM <- GetTOM(seurat_obj_cond)
	return (TOM)
}
#save condition column to true.condition
combined[['true.condition']]<-combined[['condition']]
#extract condition labels
cdt=combined$condition
#remove name to break link with cell tag
names(cdt)<-NULL

#function to shuffle cell condition
produce_shuffle_cell_scores<-function(i) {
	#remove previous TOM files
	unlink("TOM", recursive = TRUE)
	set.seed(i)
	shuffle_cdt<-sample(cdt) #shuffle condition label
	print(shuffle_cdt[1:10]) #make sure condition label are shuffled
	combined[['condition']]<-shuffle_cdt #replace condition label by shuffled label
	#double check shuffled condition distribution
	print(table(combined[[c('condition','true.condition')]])) #each seed produce different count table
	TOMs_shuffle<-map(conditions,create_TOM_cond)
	for (TOM_shuffle in TOMs_shuffle) { print(TOM_shuffle[1:5,1:5]) } #often this produces the same TOM matrix so same the module scores later
	names(TOMs_shuffle)<-conditions
	module_scores<-map(TOMs_shuffle, function(TOM_shuffle) (map_vec(modules,function(module) extract_module_score(module,TOM_shuffle))))
	#reformat module_scores
	names(module_scores)<-conditions
	module_scores<-as.data.frame(module_scores)
	module_scores[['module']]<-rownames(module_scores)
	module_scores[['loop_index']]<-i
	#clear memory
	rm(TOMs_shuffle)
	rm(TOM_shuffle)
	return (module_scores)
}
#check the module score
produce_shuffle_cell_scores(1)
produce_shuffle_cell_scores(2)

Sometimes both call of shuffled cells return the same as unshuffled table, sometimes the seed 1 is successful but seed 2 returns the same as seed 1 despite the table of condition/true condition indicates the cell distribution has changed.

I know that Seurat's subset function reset the seed, but I set the seed and immediately shuffle labels before calling the subset and confirm the shuffling by the condition/true condition table.

It's also inconsistant when the TOM gets updated. Sometimes I manage to get it update manually running the create_TOM_cond function (other times that fails too, I don't understand why), but not in the map function; sometimes the map works but won't work inside the score function, Some times the score function works when ran interactively, but not as part of Rscript <>.r

I'm totally at loss what might be the cause. Would you have an idea? Thanks!

Best, Delong

DelongZHOU avatar Apr 10 '24 20:04 DelongZHOU