python-igraph Leiden clustering algorithm crashes on scanpy graph

Describe the bug

This is a cross-reference of an existing bug already filed with the scanpy developers: https://github.com/scverse/scanpy/issues/2969.

When I run scanpy on Windows 11 with the Leiden clustering algorithm, it hangs and repeatedly prints the following error message:

Exception ignored in: <class 'ValueError'>
Traceback (most recent call last):
    File "numpy\random\_generator.pyx", line 622, in numpy.random._generator.Generator.integers
    File "numpy\random\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

The exception is raised by the C core function GraphBase.community_leiden, but it is not clear to me whether the bug is in the C core itself, or whether scanpy or the Python igraph layer is passing incorrect arguments. I am posting it here in the hope that the igraph developers can tell which of these is the case.

To reproduce

Install scanpy on Windows 11 and run the following:

import numpy as np
import anndata as ad
import scanpy as sc

# Build a 100 x 1000 count matrix; negative draws are clipped to 0.
rng = np.random.default_rng()
counts = rng.integers(low=-1000, high=100, size=(100, 1000))
counts = np.maximum(counts, 0)
adata = ad.AnnData(counts)
sc.tl.pca(adata)
sc.pp.neighbors(adata)
# The igraph flavor of Leiden is the call that triggers the error.
sc.tl.leiden(adata, flavor='igraph', n_iterations=2)

Version information

Which version of python-igraph are you using and where did you obtain it?

I am using version 0.11.6, installed via pip install igraph.

I checked using a Windows docker image to make it as reproducible as possible.

docker run -it python:windowsservercore-1809
Python 3.12.5 (tags/v3.12.5:ff3bc82, Aug  6 2024, 20:45:27) [MSC v.1940 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> import sys
>>> def install(package):
...     subprocess.check_call([sys.executable, "-m", "pip", "install", package])
...
>>> install("scanpy")
(...output suppressed...)
Downloading scanpy-1.10.2-py3-none-any.whl (2.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 13.6 MB/s eta 0:00:00
Downloading anndata-0.10.9-py3-none-any.whl (128 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.0/129.0 kB 7.8 MB/s eta 0:00:00
(... output suppressed. ...)
>>> install("igraph")
Collecting igraph
  Downloading igraph-0.11.6-cp39-abi3-win_amd64.whl.metadata (3.9 kB)
Collecting texttable>=1.6.2 (from igraph)
  Downloading texttable-1.7.0-py2.py3-none-any.whl.metadata (9.8 kB)
Downloading igraph-0.11.6-cp39-abi3-win_amd64.whl (2.0 MB)
   ---------------------------------------- 2.0/2.0 MB 2.7 MB/s eta 0:00:00
Downloading texttable-1.7.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: texttable, igraph
Successfully installed igraph-0.11.6 texttable-1.7.0
>>> import numpy as np
>>> import anndata as ad
>>> import scanpy as sc
>>>
>>> rng = np.random.default_rng()
>>> counts = rng.integers(low=-1000, high=100, size=(100,1000))
>>> counts = np.maximum(counts, 0)
>>> adata = ad.AnnData(counts)
>>> sc.tl.pca(adata)
>>> sc.pp.neighbors(adata)
>>> sc.tl.leiden(adata,flavor='igraph',n_iterations=2)
Exception ignored in: <class 'ValueError'>
Traceback (most recent call last):
  File "numpy\\random\\mtrand.pyx", line 780, in numpy.random.mtrand.RandomState.randint
  File "numpy\\random\\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32

These last five lines repeat in a loop until the user terminates the shell with Ctrl-C.

I notice that the igraph wheel downloaded with pip has "cp39" in the filename, which is surprising as this is Python 3.12.

Sep 05 '24 18:09 patrick-nicodemus

I ran into this issue as well. The scanpy function just builds a np.random.RandomState and passes it to igraph.set_random_number_generator, so I think the issue is in igraph (more on where I think it lies at the bottom). Also, the algorithm does converge if you wait long enough for all the messages to print to the output stream, which could take a long time if you don't have a lot of compute resources.

That said, I did the following to get around it:

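import numpy as np

class RandomState(np.random.RandomState):
    def randint(self, *args, **kwargs):
        # Expects randint(low, high) with both bounds passed positionally.
        args = list(args)
        # Clamp the exclusive upper bound to 2**31 so it fits in int32.
        # Note that this halves the range that igraph asked for.
        args[1] = 2**(32-1)
        return super().randint(*args, **kwargs)

rs = RandomState(np.random.MT19937(np.random.SeedSequence(0)))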

I then passed rs as the random_state argument of scanpy's leiden function, which scanpy passes on to igraph.set_random_number_generator.

Basically, this changes the max argument for the random number generator to the largest bound a signed 32-bit int allows. I think numpy takes its default integer width from the platform's C long, which I also found is 32 bits on Windows and 64 bits on Linux. I think a newer NumPy release changes this default, but according to another comment on the related scanpy issue, that does not appear to fix the problem here.

I noticed a few other things along the way which may help the developers. First, RNG_BITS is defined as 32 here, and in this line the comment indicates that they are passing randint(0, 2 ^ RNG_BITS-1). I am wondering whether this should be randint(0, 2 ^ (RNG_BITS-1)), since the default int is a signed 32-bit type in NumPy on Windows. I don't know C, so I can't tell whether only the comment is misleading or the code is affected too. That said, this would also explain why it works on other OSs: there the random generator's default data type is int64 rather than int32.
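The arithmetic can be checked on any platform. This is a minimal sketch, assuming RNG_BITS is 32 as stated above. Passing dtype=np.int32 explicitly is only a stand-in for the default integer type described for Windows; it is not what igraph or scanpy actually does.

```python
import numpy as np

RNG_BITS = 32  # value quoted in the comment above

upper = 2**RNG_BITS - 1              # 4294967295, the bound mentioned in the igraph comment
int32_max = np.iinfo(np.int32).max   # 2147483647

rs = np.random.RandomState(0)

try:
    # dtype=np.int32 emulates a 32-bit default integer (assumption, see lead-in).
    rs.randint(0, upper, dtype=np.int32)
    error = None
except ValueError as exc:
    error = str(exc)

# The clamped bound from the workaround above fits, because the bound is exclusive.
clamped_draw = int(rs.randint(0, 2**(RNG_BITS - 1), dtype=np.int32))

print("upper exceeds int32 max:", upper > int32_max)
print("error:", error)
```

The requested bound is roughly twice the largest int32 value, so the call is rejected before any number is drawn.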

Oct 12 '24 22:10 beng1290

> I notice that the igraph wheel downloaded with pip has "cp39" in the filename, which is surprising as this is Python 3.12.

That's not a problem -- the igraph wheel is compiled to be compliant with Python's internal ABI3 spec from Python 3.9 upwards, so any Python version from 3.9 upwards should be able to use the same wheel.
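As a rough illustration of how the filename encodes this, the sketch below splits the wheel name into its compatibility tags. The helper names are hypothetical and the logic is deliberately simplified; real installers use more complete tag matching.

```python
import sys

def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename.removesuffix(".whl")
    # The last three dash-separated fields are the compatibility tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag

def abi3_min_version(python_tag: str) -> tuple[int, int]:
    """Turn a tag such as 'cp39' into (3, 9); assumes a single-digit major version."""
    digits = python_tag.removeprefix("cp")
    return int(digits[0]), int(digits[1:])

py_tag, abi_tag, plat_tag = parse_wheel_tags("igraph-0.11.6-cp39-abi3-win_amd64.whl")
minimum = abi3_min_version(py_tag)

# With the abi3 tag, cp39 is a lower bound rather than an exact match.
usable_here = abi_tag == "abi3" and sys.version_info[:2] >= minimum
print(py_tag, abi_tag, plat_tag, minimum, usable_here)
```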

> Noticed a few other things on the way to this which may help the developers, first RNG_BITS is defined as 32 here and in this line the comment indicates that they are passing randint(0, 2 ^ RNG_BITS-1), which I am wondering if this should be randint(0, 2 ^ (RNG_BITS-1)) since int is signed 32bit in windows numpy?

The comment is parenthesized incorrectly; it should be (2^RNG_BITS) - 1 to avoid ambiguity, but the behaviour of the code is otherwise correct. igraph's C random number generator interface requires a "getter" function that generates exactly RNG_BITS random bits. The way this is translated into Python's random.Random object and objects having an identical interface to random.Random is as follows:

- If the object provides a getrandbits() method, we use that. Python's standard Random class and the random module provide this function, so it is used if you keep the default RNG setup with igraph.
- If the object does not provide a getrandbits() method, we call randint(0, (2 ** RNG_BITS) - 1) instead. Again, Python's standard Random class and the random module have no problems if you call random.randint(0, (2 ** 32) - 1): it returns a random integer as expected.
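The two cases above can be sketched in Python as follows. This is only a rendering of the described behaviour, not igraph's actual C code; draw_bits and RandintOnly are hypothetical names, and RNG_BITS = 32 is taken from the discussion above.

```python
import random

RNG_BITS = 32  # assumed to match the constant discussed above

def draw_bits(rng, bits: int = RNG_BITS) -> int:
    """Return `bits` random bits from an object shaped like random.Random."""
    getrandbits = getattr(rng, "getrandbits", None)
    if callable(getrandbits):
        # Preferred path: ask for exactly the number of bits needed.
        return getrandbits(bits)
    # Fallback: relies on randint() treating the upper bound as inclusive.
    return rng.randint(0, (2**bits) - 1)

class RandintOnly:
    """Hypothetical RNG without getrandbits(), used to exercise the fallback."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def randint(self, a: int, b: int) -> int:
        return self._rng.randint(a, b)

from_getrandbits = draw_bits(random.Random(0))
from_randint = draw_bits(RandintOnly(0))
```

An object whose randint() rejects an upper bound of (2 ** 32) - 1 fails on the fallback path, which matches the error reported in this issue.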

The problem is that ScanPy is replacing the RNG with a NumPy-based one, under the assumption that it behaves identically to Python's Random instance (which igraph also assumes). Apparently this is not the case on Windows with NumPy. But that's not the only problem with using a numpy.random.RandomState directly, because numpy.random.RandomState.randint(a, b) treats the upper bound as exclusive while Python's random.Random.randint() treats it as inclusive:

>>> from random import Random
>>> rng = Random()
>>> max(rng.randint(0, 1) for _ in range(1000))
1
>>> from numpy.random import RandomState
>>> rng = RandomState()
>>> max(rng.randint(0, 1) for _ in range(1000))
0

So, all in all, I think that a numpy.random.RandomState object should not be used directly with igraph.set_random_number_generator(), because its behaviour differs from that of Python's random.Random object, and we assume the behaviour of random.Random when calling the methods of the supplied RNG object.

Oct 28 '24 13:10 ntamas

I think a temporary workaround that does not skew the randomness might be this, assuming that the upper bound of tomaxint() (the old sys.maxint, i.e. np.iinfo(np.int_).max in NumPy terms) is at least 2**32 - 1:

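import numpy as np

class RandomState(np.random.RandomState):
    def getrandbits(self, k: int) -> int:
        # tomaxint() draws a non-negative integer; keep only its k lowest bits.
        return super().tomaxint() & ((1 << k) - 1)

    def randint(self, lo: int, hi: int) -> int:
        # random.Random.randint includes hi, RandomState.randint excludes it.
        return super().randint(lo, hi + 1)

rs = RandomState(np.random.MT19937(np.random.SeedSequence(0)))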

A better solution would be to start supporting NumPy random generators directly in python-igraph, in igraphmodule_set_random_generator, based on some isinstance() checks, directing the calls to a different, NumPy-specific implementation if we detect that the RNG being passed in is a NumPy-specific one, but I don't have the resources for implementing this at the moment. I can review a PR if someone is willing to tackle this.

Oct 28 '24 13:10 ntamas

After reading the NumPy docs for numpy.random in more recent versions, it looks like a better solution will probably be to use NumPy's low-level bit generators directly. One would need to provide a wrapper class that wraps a NumPy BitGenerator into an object whose interface and behaviour are identical to those of Python's random.Random.
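A minimal sketch of such a wrapper is shown below. The class name NumpyRandomAdapter is hypothetical, the wrapper goes through np.random.Generator rather than talking to the BitGenerator directly, and it has not been tested against igraph itself; it only illustrates the inclusive upper bound and the getrandbits() method discussed above.

```python
import numpy as np

class NumpyRandomAdapter:
    """Hypothetical adapter exposing a random.Random-like interface."""

    def __init__(self, bit_generator: np.random.BitGenerator) -> None:
        self._gen = np.random.Generator(bit_generator)

    def random(self) -> float:
        return float(self._gen.random())

    def randint(self, a: int, b: int) -> int:
        # endpoint=True makes the upper bound inclusive, like random.randint.
        # An explicit int64 dtype keeps the bound check platform independent.
        return int(self._gen.integers(a, b, endpoint=True, dtype=np.int64))

    def gauss(self, mu: float = 0.0, sigma: float = 1.0) -> float:
        return float(self._gen.normal(mu, sigma))

    def getrandbits(self, k: int) -> int:
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        n_bytes = (k + 7) // 8
        value = int.from_bytes(self._gen.bytes(n_bytes), "little")
        # Drop the surplus high bits so that exactly k bits remain.
        return value >> (n_bytes * 8 - k)

rng = NumpyRandomAdapter(np.random.PCG64(0))
largest_of_two = max(rng.randint(0, 1) for _ in range(1000))
wide_draw = rng.randint(0, 2**32 - 1)
bits = rng.getrandbits(32)
```

Building the bits from raw bytes avoids any dependence on the platform's default integer width.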

Oct 28 '24 13:10 ntamas

The set_random_number_generator docs say:

> the generator to be used. It must be a Python object with at least three attributes: random, randint and gauss. Each of them must be callable and their signature and behaviour must be identical to random.random, random.randint and random.gauss. Optionally, the object can provide a function named getrandbits with a signature identical to random.getrandbits that provides a given number of random bits on demand. By default, igraph uses the random module for random number generation, but you can supply your alternative implementation here. If the given generator is None, igraph reverts to the default PCG32 generator implemented in the C layer, which might be slightly faster than calling back to Python for random numbers, but you cannot set its seed or save its state.

I think that the “identical” in this phrase is a tall order:

> their signature and behaviour must be identical to random.random, random.randint and random.gauss

If it’s really supposed to be identical, this function could only ever be called with the random module as argument. Nothing else will behave identically to it.

So I agree, @ntamas: I think this function should accept np.random.Generator instances and np.random.RandomState instances, since they are so commonly used. But if not, we (scanpy) will happily pass a shim to igraph, once you have defined some concrete, stable requirements that are actually actionable (as said, “identical behavior” is so specific that it has to be wrong, otherwise the API would be useless).
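One way such requirements could be written down is a typing.Protocol. The sketch below is purely illustrative: RandomLike is a hypothetical name and not an igraph type, and its method list is taken from the documentation quoted above.

```python
import random
from typing import Protocol, runtime_checkable

@runtime_checkable
class RandomLike(Protocol):
    """Hypothetical minimal interface; not an actual igraph type."""

    def random(self) -> float:
        """Return a float in [0.0, 1.0)."""
        ...

    def randint(self, a: int, b: int) -> int:
        """Return an integer N with a <= N <= b (both ends inclusive)."""
        ...

    def gauss(self, mu: float, sigma: float) -> float:
        """Return a normally distributed float."""
        ...

stdlib_ok = isinstance(random.Random(), RandomLike)
plain_object_ok = isinstance(object(), RandomLike)
print(stdlib_ok, plain_object_ok)
```

A runtime isinstance() check against a protocol only confirms that the methods exist. Behavioural requirements, such as the inclusive upper bound of randint(), would still have to be stated in the documentation or covered by tests.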

Jan 23 '25 10:01 flying-sheep

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

Apr 27 '25 17:04 stale[bot]

Update from our side: Since scanpy 1.10 (https://github.com/scverse/scanpy/pull/2815) we set igraph’s rng to a compatible shim.

As said: igraph should still be updated to accept a np.random.Generator as an alternative to the underspecified “somewhat like the stdlib’s random module” API.

Apr 28 '25 09:04 flying-sheep
