scarches Label transfer code producing different outputs in different environments

Hi,

As discussed with @alextopalova and @M0hammadL, the label transfer code that you recently added to the scArches code base produces different output depending on (I think) the sklearn version. On top of that, for a given sklearn version, the output of the isolated label transfer function differs depending on whether scArches is loaded in the background.

As access to our code was temporarily shut off, I cannot post the exact examples here, but I think @alextopalova might have a code example.

Mar 30 '23 13:03 LisaSikkema

@LisaSikkema @alextopalova please add an example, I will try to investigate.

Mar 30 '23 14:03 Koncopd

This code:

#import scarches
import scanpy as sc
from sklearn.neighbors import KNeighborsTransformer

train_adata = sc.read_h5ad('adata_ref.h5ad')
query_adata = sc.read_h5ad('adata_query_latent.h5ad')

k_neighbors_transformer = KNeighborsTransformer(
    n_neighbors=50,
    mode="distance",
    algorithm="brute",
    metric="euclidean",
    n_jobs=-1,
)

train_emb = train_adata.X
k_neighbors_transformer.fit(train_emb)
query_emb = query_adata.X
top_k_distances, top_k_indices = k_neighbors_transformer.kneighbors(X=query_emb)

results in top_k_distances being:

array([[1.41037903, 1.46031747, 1.56667092, ..., 1.97135402, 1.97546332,
        1.97644941],
       [1.73469417, 1.8243846 , 1.84583178, ..., 2.15679748, 2.15960653,
        2.16063995],
       [1.68019217, 1.7671486 , 1.88269087, ..., 2.37781288, 2.37799265,
        2.37863604],
       ...,
       [1.75822227, 1.76119426, 1.76151872, ..., 2.13874144, 2.13952397,
        2.14402001],
       [1.98569565, 1.98782103, 1.99650387, ..., 2.26439439, 2.2671816 ,
        2.26878032],
       [1.80560973, 1.87017972, 1.96924954, ..., 2.20633566, 2.20645269,
        2.20916245]])

and top_k_indices being:

array([[416773, 571474, 151261, ..., 322724, 424630, 499221],
       [251611, 416773, 518922, ..., 484956, 547908, 322724],
       [484956, 172174, 518922, ..., 156024, 315468,  62600],
       ...,
       [240861, 126917, 468156, ..., 117676, 491559,  39352],
       [ 76544,  14914, 219480, ..., 498554, 341286, 258244],
       [375969, 301018, 103043, ..., 254120, 334796, 558764]])

However, once scarches gets imported (the first line gets uncommented) top_k_distances becomes:

array([[0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.],
       ...,
       [0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.]])

and top_k_indices this:

array([[256306,    112,    245,     67, 256453,    179,    197, ...,
        256368, 256323,    236,    248,     70,     80,    139],
       [219760, 219682, 219693, 219736, 219870, 219845, 219790, ...,
            34, 219873,     67,     12,     75,     45, 219761],
       [   212, 219682, 219893,     51,     32, 219758, 219851, ...,
            61,    166,    146,     50,    142,     75,     45],
       [219827, 219682,    110, 219893,    113, 219715,     67, ...,
            70,     45,    179, 219758,     80,     75,    139],
       [219860, 219682,    116,     45,    245,    110,     12, ...,
           214,    168,    113, 219851,     75,    139, 219715],
       [   245, 219682, 256278,    212, 256453,    139,     34, ...,
            75, 256435, 256492, 219907,     61,     50,    112],
       [   166, 219682,    212,     70, 256278,    122,    218, ...,
           222,    197,    245,     34, 256268, 256290,     50],
       ...,
       [109950, 219848,  36880, 366180,    276, 476174, 219922, ...,
        366350,  73492,  73473, 146501, 439526, 439365,  36677],
       [329743, 256324,  73284, 512356, 110126, 219755,  73473, ...,
        366337,  36867, 476176, 219748, 476071, 146500, 439526],
       [366297, 183052, 146572, 476071, 219903, 109974, 439396, ...,
        476112, 293003, 146582, 476054,  36852, 402827, 146658],
       [183192,      0, 402786, 256293,  36739, 402805, 109957, ...,
        548817, 512366, 219744, 109845,  73288, 548760, 183061],
       [366180, 109974, 512417,  36659, 110081, 219804, 292915, ...,
        476071, 146607, 219848, 183091, 476054,  36838, 548676],
       [512518,  36869, 329755, 366391,  73519, 366341,  36889, ...,
         36906,  36879, 366397,  73499, 219923,  36910,    283],
       [ 36638,      0, 219966, 219922,    273, 366364,  73478, ...,
        329821,  73486, 110136, 476201, 366341, 366099, 329794]])

This problem happens with scikit-learn version 1.2.1, but not with 1.1.3. All the other packages are as suggested in the environment section of the scArches documentation.
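A quick way to tell which of the two outputs is wrong is to recompute a handful of the reported distances by hand. The following is only a minimal sketch: `spot_check_knn` is a hypothetical helper (not part of scArches or scikit-learn), the tolerance is an arbitrary assumption, and it assumes the embeddings are dense arrays or nested lists rather than sparse matrices.

```python
import math


def spot_check_knn(train, query, distances, indices, rows=(0,), tol=1e-4):
    """Recompute Euclidean distances for a few query rows and compare.

    Hypothetical helper: returns a list of
    (row, column, reported, recomputed) tuples for every mismatch.
    """
    mismatches = []
    for i in rows:
        for col, j in enumerate(indices[i]):
            # Distance between query row i and the neighbour it was assigned.
            recomputed = math.dist(query[i], train[int(j)])
            reported = float(distances[i][col])
            if abs(recomputed - reported) > tol:
                mismatches.append((i, col, reported, recomputed))
    return mismatches


# Tiny synthetic example (not the data from this issue).
train = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
query = [[0.0, 0.0]]

# Correct neighbour distances: nothing is flagged.
good = spot_check_knn(train, query, [[0.0, 5.0]], [[0, 1]])
# All-zero distances, as in the broken output: the second entry is flagged.
bad = spot_check_knn(train, query, [[0.0, 0.0]], [[0, 1]])

print(good)
print(bad)
```

With the variables from the snippet above, a call such as `spot_check_knn(train_emb, query_emb, top_k_distances, top_k_indices, rows=range(5))` should flag the all-zero rows if they do not correspond to real distances.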

Mar 30 '23 14:03 alextopalova

@alextopalova Could you also share the data, so I can check myself?

Apr 04 '23 12:04 Koncopd

@Koncopd Of course, I uploaded and linked the files here: issue files

Apr 05 '23 15:04 alextopalova

Hm, I can't reproduce this problem. What OS do you use? I tried on Linux.

Apr 06 '23 12:04 Koncopd

import numpy as np
np.random.seed(0)

Could you also check if this helps when added at the very beginning?

Apr 06 '23 13:04 Koncopd

I tried the numpy code and it didn't make a difference. I am running the code on WSL 2.

Apr 11 '23 22:04 alextopalova

@alextopalova Did you check with the scarches master branch? Could you post your conda environment?

Apr 13 '23 13:04 Koncopd

Hey @Koncopd @alextopalova , any progress with figuring out where the bug is?

Apr 21 '23 12:04 LisaSikkema

This is as far as I got trying to narrow things down. It seems the error only happens on our GPU, and only with specific versions of some packages (see scarches_bug_notes.xlsx). I didn't get any further than that and am giving up for the moment, just sticking to the latest packages.

Oh, and the most bizarre part: the error only happens for me when I launch Jupyter via an sbatch script and run the code in Jupyter notebook/lab, not if I run it in the terminal from Python, or start the Jupyter notebook directly from the terminal without an sbatch script in between.
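One way to narrow down a difference like this is to record an environment fingerprint in each setting (sbatch-launched Jupyter vs. plain terminal) and diff the two. The sketch below uses only the standard library. The package list and the environment-variable prefixes are guesses at what might matter here, not a confirmed cause, and `fingerprint` and `diff_fingerprints` are hypothetical helper names.

```python
import json
import os
import sys
from importlib import metadata

# Assumed to be relevant; adjust to the actual environment.
PACKAGES = ("scikit-learn", "numpy", "scipy", "torch", "scanpy",
            "scArches", "threadpoolctl")
ENV_PREFIXES = ("OMP_", "MKL_", "OPENBLAS_", "CUDA_", "SLURM_")


def fingerprint():
    """Collect interpreter, package versions and selected env variables."""
    versions = {}
    for name in PACKAGES:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    env = {k: v for k, v in os.environ.items() if k.startswith(ENV_PREFIXES)}
    return {
        "python": sys.version,
        "executable": sys.executable,
        "packages": versions,
        "env": env,
    }


def flatten(d, prefix=""):
    """Turn nested dicts into a flat {"a.b": value} mapping."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out.update(flatten(value, f"{prefix}{key}."))
        else:
            out[f"{prefix}{key}"] = value
    return out


def diff_fingerprints(a, b):
    """Return {key: (value_in_a, value_in_b)} for every differing entry."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key) != fb.get(key)
    }


# Print the fingerprint so it can be saved from each setting and compared.
print(json.dumps(fingerprint(), indent=2))
```

Saving the printed JSON from both settings and passing the two loaded dictionaries to `diff_fingerprints` lists only the entries that differ, for example a thread-count or `SLURM_*` variable that is set in one setting but not the other.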

May 29 '23 12:05 LisaSikkema
