
Integrating several samples, vanishing genes :)

Open Dalhte opened this issue 10 months ago • 4 comments

Hi there. First, thanks for this very nice tool :) I have a small problem (it's probably trivial, but I'm very new to bioinformatics analysis). I have four different 10x Multiome ATAC+RNA samples: PG2, PG6, PG24 and PG13. I first integrated them using Seurat/Signac, then tried to run MultiVelo. I ran the preprocessing steps on each sample separately. For example, for PG2:

import scanpy as sc
import multivelo as mv

# Read the full 10x matrix (RNA + ATAC features), then keep only the peaks
adata_atacPG2 = sc.read_10x_mtx(
    '/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/filtered_feature_bc_matrix/',
    var_names='gene_symbols', cache=True, gex_only=False)
adata_atacPG2 = adata_atacPG2[:, adata_atacPG2.var['feature_types'] == "Peaks"]
# Aggregate peaks per gene using the Cell Ranger ARC annotation and linkage files
adata_atacPG2 = mv.aggregate_peaks_10x(
    adata_atacPG2,
    '/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/atac_peak_annotation.tsv',
    '/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/analysis/feature_linkage/feature_linkage.bedpe',
    verbose=True)
mv.tfidf_norm(adata_atacPG2)

I renamed the cells with unique barcodes:

barcodes = adata_atacPG2.obs.index
# Drop the last two characters of each barcode and add the sample prefix
barcodesnew = ['PG2_' + bc[:-2] for bc in barcodes]
adata_atacPG2.obs.index = barcodesnew

Having done that for all four samples, I generated a single object by concatenation:

adata_atacPG = adata_atacPG2.concatenate([adata_atacPG6, adata_atacPG24, adata_atacPG13])

Then I processed the RNA and so on. Everything seems to work very nicely, BUT I lose some of the genes (including some important ones). After investigating, I realized that these genes are lost during the concatenation step, probably because they are present in some of the adata_atacPGXX objects but not in all of them.
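To illustrate the suspected mechanism, here is a minimal sketch in plain Python. The gene names and per-sample gene lists are made up for illustration and are not taken from my data. As far as I understand, `AnnData.concatenate` uses `join='inner'` by default, which would keep only the variables shared by every object; the sketch compares that with an outer join (union).

```python
# Toy stand-ins for the genes present in each aggregated ATAC object.
# These names are hypothetical, not real data.
sample_genes = {
    "PG2": ["GeneA", "GeneB", "GeneC"],
    "PG6": ["GeneA", "GeneB"],
    "PG24": ["GeneA", "GeneC", "GeneD"],
    "PG13": ["GeneA", "GeneB", "GeneC"],
}

gene_sets = [set(genes) for genes in sample_genes.values()]

# Inner join: keep only genes present in every sample.
inner = sorted(set.intersection(*gene_sets))

# Outer join: keep every gene seen in at least one sample.
outer = sorted(set.union(*gene_sets))

# Genes dropped by the inner join, with the samples that lack them.
lost = {
    gene: [name for name, genes in sample_genes.items() if gene not in genes]
    for gene in outer
    if gene not in inner
}

print("kept by inner join:", inner)
print("kept by outer join:", outer)
for gene, missing_in in lost.items():
    print(f"{gene} is missing in: {', '.join(missing_in)}")
```

Running the same kind of comparison on the `var_names` of the four real objects should show which samples lack each vanished gene.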

How can I tackle this problem?

Best,
David

Dalhte, Aug 11 '23 08:08