scarches
Label transfer code producing different outputs in different environments
Hi,
As discussed with @alextopalova and @M0hammadL, the label transfer code that was recently added to the scArches code base produces different output depending on (I think) the sklearn version. On top of that, for a given sklearn version, the output of the isolated label transfer function differs depending on whether scArches is loaded in the background.
As access to our code was temporarily shut off, I cannot post the exact examples here, but I think @alextopalova might have a code example.
@LisaSikkema @alextopalova Please add an example and I will try to investigate.
This code:
#import scarches
import scanpy as sc
from sklearn.neighbors import KNeighborsTransformer
train_adata = sc.read_h5ad('adata_ref.h5ad')
query_adata = sc.read_h5ad('adata_query_latent.h5ad')
k_neighbors_transformer = KNeighborsTransformer(
    n_neighbors=50,
    mode="distance",
    algorithm="brute",
    metric="euclidean",
    n_jobs=-1,
)
train_emb = train_adata.X
k_neighbors_transformer.fit(train_emb)
query_emb = query_adata.X
top_k_distances, top_k_indices = k_neighbors_transformer.kneighbors(X=query_emb)
results in top_k_distances being:
array([[1.41037903, 1.46031747, 1.56667092, ..., 1.97135402, 1.97546332,
1.97644941],
[1.73469417, 1.8243846 , 1.84583178, ..., 2.15679748, 2.15960653,
2.16063995],
[1.68019217, 1.7671486 , 1.88269087, ..., 2.37781288, 2.37799265,
2.37863604],
...,
[1.75822227, 1.76119426, 1.76151872, ..., 2.13874144, 2.13952397,
2.14402001],
[1.98569565, 1.98782103, 1.99650387, ..., 2.26439439, 2.2671816 ,
2.26878032],
[1.80560973, 1.87017972, 1.96924954, ..., 2.20633566, 2.20645269,
2.20916245]])
and top_k_indices being:
array([[416773, 571474, 151261, ..., 322724, 424630, 499221],
[251611, 416773, 518922, ..., 484956, 547908, 322724],
[484956, 172174, 518922, ..., 156024, 315468, 62600],
...,
[240861, 126917, 468156, ..., 117676, 491559, 39352],
[ 76544, 14914, 219480, ..., 498554, 341286, 258244],
[375969, 301018, 103043, ..., 254120, 334796, 558764]])
However, once scarches gets imported (the first line gets uncommented), top_k_distances becomes:
array([[0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.],
...,
[0., 0., 0., 0., 0., 0., 0., ..., 0., 0., 0., 0., 0., 0., 0.]])
and top_k_indices becomes:
array([[256306, 112, 245, 67, 256453, 179, 197, ...,
256368, 256323, 236, 248, 70, 80, 139],
[219760, 219682, 219693, 219736, 219870, 219845, 219790, ...,
34, 219873, 67, 12, 75, 45, 219761],
[ 212, 219682, 219893, 51, 32, 219758, 219851, ...,
61, 166, 146, 50, 142, 75, 45],
[219827, 219682, 110, 219893, 113, 219715, 67, ...,
70, 45, 179, 219758, 80, 75, 139],
[219860, 219682, 116, 45, 245, 110, 12, ...,
214, 168, 113, 219851, 75, 139, 219715],
[ 245, 219682, 256278, 212, 256453, 139, 34, ...,
75, 256435, 256492, 219907, 61, 50, 112],
[ 166, 219682, 212, 70, 256278, 122, 218, ...,
222, 197, 245, 34, 256268, 256290, 50],
...,
[109950, 219848, 36880, 366180, 276, 476174, 219922, ...,
366350, 73492, 73473, 146501, 439526, 439365, 36677],
[329743, 256324, 73284, 512356, 110126, 219755, 73473, ...,
366337, 36867, 476176, 219748, 476071, 146500, 439526],
[366297, 183052, 146572, 476071, 219903, 109974, 439396, ...,
476112, 293003, 146582, 476054, 36852, 402827, 146658],
[183192, 0, 402786, 256293, 36739, 402805, 109957, ...,
548817, 512366, 219744, 109845, 73288, 548760, 183061],
[366180, 109974, 512417, 36659, 110081, 219804, 292915, ...,
476071, 146607, 219848, 183091, 476054, 36838, 548676],
[512518, 36869, 329755, 366391, 73519, 366341, 36889, ...,
36906, 36879, 366397, 73499, 219923, 36910, 283],
[ 36638, 0, 219966, 219922, 273, 366364, 73478, ...,
329821, 73486, 110136, 476201, 366341, 366099, 329794]])
This problem happens with scikit-learn version 1.2.1 but not with 1.1.3. All other packages are as suggested in the environment section of the scArches documentation.
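For anyone trying to reproduce this, a quick way to tell a valid result from the broken one is to recompute the distances from the returned indices. The following is only a sketch: `check_knn_output` is a hypothetical helper (not part of scArches or scikit-learn), and it assumes dense embeddings and the Euclidean metric used above.

```python
import numpy as np


def check_knn_output(train_emb, query_emb, distances, indices, atol=1e-5):
    """Basic consistency checks for a kNN result (hypothetical helper).

    Assumes dense 2-D embeddings and Euclidean distances.
    """
    train_emb = np.asarray(train_emb, dtype=np.float64)
    query_emb = np.asarray(query_emb, dtype=np.float64)
    distances = np.asarray(distances, dtype=np.float64)
    indices = np.asarray(indices)

    # Look up the reported neighbors: shape (n_query, k, n_dims).
    # For large queries this may need to be done in chunks to save memory.
    neighbors = train_emb[indices]
    # Recompute the Euclidean distance from each query point to each neighbor.
    recomputed = np.linalg.norm(neighbors - query_emb[:, None, :], axis=2)

    return {
        # Reported distances agree with the ones recomputed from the indices.
        "distances_match": bool(np.allclose(recomputed, distances, atol=atol)),
        # Each row is sorted from nearest to farthest.
        "rows_sorted": bool(np.all(np.diff(distances, axis=1) >= -atol)),
        # The failure mode reported above: every distance is exactly zero.
        "all_zero": bool(np.all(distances == 0)),
    }


# Small synthetic demo with a NumPy-only brute-force kNN as the reference.
rng = np.random.default_rng(0)
demo_train = rng.normal(size=(200, 8))
demo_query = rng.normal(size=(5, 8))
full = np.linalg.norm(demo_query[:, None, :] - demo_train[None, :, :], axis=2)
demo_idx = np.argsort(full, axis=1)[:, :10]
demo_dist = np.take_along_axis(full, demo_idx, axis=1)
demo_report = check_knn_output(demo_train, demo_query, demo_dist, demo_idx)
```

With the arrays from the snippet above, the call would be `check_knn_output(train_emb, query_emb, top_k_distances, top_k_indices)`. If `all_zero` is true or `distances_match` is false, the result is wrong.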
@alextopalova Could you also share the data so I can check myself?
@Koncopd Of course, I uploaded and linked the files here: issue files
Hm, I can't reproduce this problem. Which OS do you use? I tried on Linux.
import numpy as np
np.random.seed(0)
Could you also check if this helps when added at the very beginning?
I tried the numpy code and it didn't make a difference. I am running the code on WSL 2.
@alextopalova Did you check with the scarches master branch? Could you post your conda environment?
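If exporting the whole conda environment is inconvenient, a short version report may already be enough to compare setups. This is a minimal sketch using only the standard library. `environment_report` is a hypothetical helper, and the default package list is an assumption about what is relevant here, not a list taken from the scArches documentation.

```python
import platform
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(
    names=("scarches", "scikit-learn", "numpy", "scipy", "scanpy", "torch"),
):
    """Collect Python, OS, and package versions (package list is an assumption)."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": package_versions(names),
    }


if __name__ == "__main__":
    report = environment_report()
    print("python  ", report["python"])
    print("platform", report["platform"])
    for name, version in report["packages"].items():
        print(f"{name:14s}", version if version is not None else "not installed")
```

Running this once in the working environment and once in the failing one gives two short listings that can be pasted into the thread and compared line by line.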
Hey @Koncopd @alextopalova , any progress with figuring out where the bug is?
This is as far as I got trying to narrow things down. It seems the error only happens on our GPU, and only with specific versions of some packages: scarches_bug_notes.xlsx. I didn't get any further than that and am giving up for the moment, just sticking to the latest packages.
Oh, and the most bizarre part: the error only happens for me when I launch Jupyter via an sbatch script and run the code in Jupyter notebook/lab, not if I run it from Python in the terminal, or start the Jupyter notebook directly from the terminal without an sbatch script in between.
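Since the behaviour depends on how Jupyter is launched, one thing that might be worth comparing is the runtime context of the two sessions. The sketch below is only a diagnostic idea and not a known cause: the helpers are hypothetical, and the chosen environment variable prefixes are an assumption about what an sbatch launch could change.

```python
import os

# Assumption: variables with these prefixes are the ones most likely to differ
# between an sbatch-launched session and a terminal session.
PREFIXES = ("SLURM_", "CUDA_", "OMP_", "MKL_", "OPENBLAS_", "NUMEXPR_")


def runtime_context(environ=None):
    """Snapshot CPU counts and selected environment variables."""
    environ = os.environ if environ is None else environ
    selected = {
        key: value
        for key, value in sorted(environ.items())
        if key.startswith(PREFIXES)
    }
    context = {"cpu_count": os.cpu_count(), "env": selected}
    # sched_getaffinity is not available on every platform.
    if hasattr(os, "sched_getaffinity"):
        context["usable_cpus"] = len(os.sched_getaffinity(0))
    return context


def diff_contexts(a, b):
    """Return the selected variables whose values differ between two snapshots."""
    keys = set(a["env"]) | set(b["env"])
    return {
        key: (a["env"].get(key), b["env"].get(key))
        for key in sorted(keys)
        if a["env"].get(key) != b["env"].get(key)
    }
```

Calling `runtime_context()` in a notebook started through sbatch and in one started from the terminal, then passing both snapshots to `diff_contexts`, would list the variables that differ between the two launches.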