MultiMAP Different feature dim numbers after PCA in example script?

Open Y-SHI-MxLucid opened this issue 2 years ago • 1 comments

Hi there,

I have read the preprint; it is a very nice one.

I am trying to run the example script in the project, and I found that the two inputs to MultiMAP.Integration have different numbers of dimensions:

adata = MultiMAP.Integration([rna, atac_genes], ['X_pca', 'X_lsi'])

rna.obsm['X_pca'] has shape (4382, 50), while atac_genes.obsm['X_lsi'] has shape (3166, 49). atac_genes.obsm['X_lsi'] is the output of MultiMAP.TFIDF_LSI() in __init__.py, which in turn calls tfidf() in matrix.py:

MultiMAP.TFIDF_LSI(atac_peaks)
atac_genes.obsm['X_lsi'] = atac_peaks.obsm['X_lsi'].copy()

I later checked matrix.py, and I think the dimension of 49 might be due to the first column of the sklearn.decomposition.TruncatedSVD() output being discarded:

# Module-level imports that this excerpt relies on
import numpy as np
import sklearn.decomposition

# n_components passed in here is 50
def tfidf(X, n_components, binarize=True, random_state=0):
    from sklearn.feature_extraction.text import TfidfTransformer
    sc_count = np.copy(X)
    if binarize:
        # Cap every entry at 1 so that counts become presence/absence
        sc_count = np.where(sc_count < 1, sc_count, 1)
    tfidf = TfidfTransformer(norm='l2', sublinear_tf=True)
    normed_count = tfidf.fit_transform(sc_count)
    lsi = sklearn.decomposition.TruncatedSVD(n_components=n_components, random_state=random_state)
    lsi_r = lsi.fit_transform(normed_count)
    # Here: column 0 (the first SVD component) is dropped,
    # so the result has n_components - 1 = 49 columns
    X_lsi = lsi_r[:, 1:]
    return X_lsi
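To illustrate where the 49 comes from, here is a minimal sketch that applies the same TF-IDF and TruncatedSVD steps to a small random count matrix. The function name lsi_drop_first and the toy data are hypothetical and not part of MultiMAP. The binarization is written as (X > 0), which assumes integer counts.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfTransformer


def lsi_drop_first(X, n_components, random_state=0):
    """Hypothetical helper mirroring the steps quoted above."""
    # Binarize: any non-zero count becomes 1 (assumes integer counts).
    binarized = (X > 0).astype(float)
    normed = TfidfTransformer(norm='l2', sublinear_tf=True).fit_transform(binarized)
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    full = svd.fit_transform(normed)
    # Dropping column 0 leaves n_components - 1 columns.
    return full, full[:, 1:]


# Toy stand-in for a cells x peaks matrix (random data, not real ATAC counts).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(0.3, size=(200, 120))

full, trimmed = lsi_drop_first(toy_counts, n_components=50)
print(full.shape, trimmed.shape)  # (200, 50) (200, 49)
```

With n_components=50 the SVD output has 50 columns, and the slice [:, 1:] reduces it to 49, which matches the shape reported for atac_genes.obsm['X_lsi'].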

I wonder whether column 0 is discarded in order to remove the first component (PC1), which is usually strongly correlated with sequencing depth. As a result, the two inputs of MultiMAP.Integration() have 50 and 49 dimensions respectively. The function still runs normally and returns a result of shape (7548, 2), but is it okay to do so? From reading the preprint, my impression was that the two datasets to be integrated should have the same number of dimensions after reduction, because inter-dataset point distances need to be calculated. Please correct me if my understanding is wrong.
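For illustration only, here is a minimal sketch of how the two embeddings could be brought to the same width, should that turn out to be necessary. Whether MultiMAP actually requires this is the open question above. The helper match_dims is hypothetical and not part of MultiMAP, and the arrays are random stand-ins that only reproduce the shapes reported in this issue.

```python
import numpy as np


def match_dims(embeddings):
    """Hypothetical helper: truncate every embedding to the smallest width.

    Keeps the leading columns of each array, so for a PCA embedding the
    trailing (lowest-variance) components are the ones that get dropped.
    """
    k = min(e.shape[1] for e in embeddings)
    return [e[:, :k] for e in embeddings]


# Random stand-ins with the shapes reported above (not real data).
rng = np.random.default_rng(0)
x_pca = rng.normal(size=(4382, 50))
x_lsi = rng.normal(size=(3166, 49))

x_pca_m, x_lsi_m = match_dims([x_pca, x_lsi])
print(x_pca_m.shape, x_lsi_m.shape)  # (4382, 49) (3166, 49)
```

An alternative under the same assumption would be to request one extra component from the SVD step, so that 50 columns remain after the first one is dropped.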

Nov 08 '21 17:11 Y-SHI-MxLucid
