OverflowError: cannot convert float infinity to integer with ms.perturbation_signature Method in Mixscape

Open Zethson opened this issue 1 year ago • 1 comments

Discussed in https://github.com/theislab/pertpy/discussions/604

Originally posted by benayedi May 23, 2024 Hi!

I'm encountering an issue when trying to run the ms.perturbation_signature method on my AnnData object. Specifically, I'm using the following command:

ms.perturbation_signature(adata, "gene.compact", "0", "replicate")

gene.compact holds the target gene. replicate is a column preprocessed from guide.compact: values 1 to 4 encode the four guides, and 0 marks cells with an unassigned guide. Here are the value counts for the replicate column:

print(adata.obs["replicate"].value_counts())

replicate
0    15903
3     5724
1     5606
4     4875
2     4163
Name: count, dtype: int64

The command results in the following error:

/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703: RuntimeWarning: divide by zero encountered in log2
  n_iters = max(5, int(round(np.log2(data.shape[0]))))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
Cell In[15], line 1
----> 1 ms.perturbation_signature(adata, "gene.compact", "unassigned", "replicate")

File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pertpy/tools/_mixscape.py:110, in Mixscape.perturbation_signature(self, adata, pert_key, control, split_by, n_neighbors, use_rep, n_pcs, batch_size, copy, **kwargs)
    107 from pynndescent import NNDescent
    109 eps = kwargs.pop("epsilon", 0.1)
--> 110 nn_index = NNDescent(R_control, **kwargs)
    111 indices, _ = nn_index.query(R_split, k=n_neighbors, epsilon=eps)
    113 X_control = np.expm1(adata.X[control_mask_split])

File /Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pynndescent/pynndescent_.py:703, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, max_rptree_depth, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
    701     n_trees = min(32, n_trees)  # Only so many trees are useful
    702 if n_iters is None:
--> 703     n_iters = max(5, int(round(np.log2(data.shape[0]))))
    705 self.n_trees = n_trees
    706 self.n_trees_after_update = max(1, int(np.round(self.n_trees / 3)))

OverflowError: cannot convert float infinity to integer
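
Reading the traceback, the warning and the error both come from the same expression: `np.log2(data.shape[0])` only divides by zero when `data.shape[0]` is 0, which would mean the array passed to `NNDescent` has no rows. The snippet below is a minimal sketch of that mechanism using NumPy alone; `empty_control` and `default_n_iters` are hypothetical names made up for illustration, and the helper merely copies the expression shown at pynndescent_.py:703.

```python
import numpy as np

# Hypothetical stand-in for R_control when a split contains no control cells.
empty_control = np.empty((0, 10))


def default_n_iters(data):
    # Same expression as the traceback line pynndescent_.py:703.
    return max(5, int(round(np.log2(data.shape[0]))))


# Silence the "divide by zero encountered in log2" warning for the demo.
with np.errstate(divide="ignore"):
    try:
        default_n_iters(empty_control)
        error_message = None
    except OverflowError as exc:
        # log2(0) is -inf, and -inf cannot be converted to an integer.
        error_message = str(exc)

print(error_message)
```

With a non-empty array the same expression returns a finite integer, so an empty input looks like the likely trigger here.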

I have ensured that the data does not include any NaN values. Interestingly, the following command works without any issues:

ms.perturbation_signature(adata, "gene.compact", "unassigned", "batch")

but this one raises the same error as above:

ms.perturbation_signature(adata, "gene.compact", "unassigned", "guide.compact")

Here are the value counts for the batch column:

batch
0     3495
1     2601
2     2529
4     2516
3     2474
7     2434
14    2432
13    2314
8     2306
5     2270
10    2239
9     2216
11    2215
12    2147
6     2083
Name: count, dtype: int64

Does anyone know how to resolve this issue with the replicate column? Any insights would be greatly appreciated.

Thank you!

Zethson · May 23 '24 18:05

@benayedi you can subscribe to this issue. I closed the discussion.

Zethson · May 23 '24 18:05

@benayedi thank you very much for the detailed issue report. Would it be possible for you to share your object, please?

Zethson · May 30 '24 05:05
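
Until the object is available, one check that may narrow this down is counting control cells per category of the split column. The names `control_mask_split` and `R_control` in the traceback suggest the control cells are selected within each split, so a category with zero control cells would hand an empty array to `NNDescent`; that reading is an assumption, not confirmed against the pertpy source. The sketch below uses a toy table with invented values, and `control_cells_per_split` is a hypothetical helper, not part of pertpy. On real data, `adata.obs` would take the place of `obs`.

```python
import pandas as pd

# Hypothetical toy obs table: column names follow the issue, values are invented.
obs = pd.DataFrame(
    {
        "gene.compact": ["unassigned", "unassigned", "GENE_A", "GENE_A", "GENE_B", "GENE_B"],
        "guide.compact": ["unassigned", "unassigned", "GENE_A_g1", "GENE_A_g2", "GENE_B_g1", "GENE_B_g2"],
    }
)


def control_cells_per_split(obs, pert_key, control, split_by):
    """Count control cells in every split_by category (0 if there are none)."""
    is_control = obs[pert_key] == control
    return is_control.groupby(obs[split_by]).sum()


counts = control_cells_per_split(obs, "gene.compact", "unassigned", "guide.compact")

# Categories listed here have no control cells at all.
empty_splits = counts[counts == 0].index.tolist()
print(empty_splits)
```

If the list is non-empty for the replicate or guide.compact column but empty for batch, that would be consistent with the behaviour reported above.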