'generator raised StopIteration' error when running 'randomstats' with multiple processes

Open tparket opened this issue 3 years ago • 4 comments

Hi,

First of all - thank you for your amazing work. pybedtools has been super useful for my research so far and I am very grateful.

I'm trying to run 'randomstats' with the following args:

results_dict = a.randomstats(b, iterations=1000, new=True, genome_fn=chromsizes_fn, processes=4, shuffle_kwargs={"chrom": True}, intersect_kwargs={"f": 1})

    ---------------------------------------------------------------------------
    StopIteration                             Traceback (most recent call last)
    ~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in parallel_apply(self, iterations, func, func_args, func_kwargs, processes, _orig_pool)
       2932         for it in range(iterations):
    -> 2933             yield func(*func_args, **func_kwargs)
       2934         raise StopIteration

    ~/.local/lib/python3.7/site-packages/pybedtools/stats.py in random_intersection(x, y, genome_fn, shuffle_kwargs, intersect_kwargs)
         16     result = len(zz)
    ---> 17     helpers.close_or_delete(z, zz)
         18     return result

    ~/.local/lib/python3.7/site-packages/pybedtools/helpers.py in close_or_delete(*args)
        547         if hasattr(x.fn, "throw"):
    --> 548             x.fn.throw(StopIteration)
        549

    StopIteration:

    The above exception was the direct cause of the following exception:

    RuntimeError                              Traceback (most recent call last)
    in

    ~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in randomstats(self, other, iterations, new, genome_fn, include_distribution, **kwargs)
       2846         )
       2847         distribution = self._randomintersection(
    -> 2848             other, iterations=iterations, genome_fn=genome_fn, **kwargs
       2849         )
       2850

    ~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in _randomintersection(self, other, iterations, genome_fn, intersect_kwargs, _orig_pool, shuffle_kwargs, processes)
       3038             ),
       3039             processes=processes,
    -> 3040             _orig_pool=_orig_pool,
       3041         )
       3042     )

    RuntimeError: generator raised StopIteration

When I remove the 'processes' argument, 'randomstats' works just fine, but every time I run it with 'processes' (even with a value of 1), I get the aforementioned error.

Other relevant data:

  • 'a' and 'b' are both BedTool objects generated from a df. A regular a.intersect(b, f=1) works perfectly.
  • 'chromsizes_fn' is the name of a genome file generated from a dict with: chromsizes_fn = pybedtools.chromsizes_to_file(chromsizes_dic, fn=temp_genome.name). I tried both fn=False and fn=temp_genome.name.
  • I tried running it both with and without new=True. It crashed both times.

I would really appreciate your help. I'm planning to run 'randomstats' on a large number of files, with at least 1000 iterations each, and multiprocessing is what would make that feasible.

tparket, Nov 20 '22 04:11

Great to hear you find pybedtools useful.

Can you provide an example of the files you're using for a and b so I can test locally?

daler, Nov 20 '22 19:11

Thanks for getting back to me so soon. Please find the files* attached.

Archive.zip

*These are not the original files, but randomly generated intervals. Nevertheless, I get the same errors with them.

tparket, Nov 21 '22 10:11

I'm getting the same error. Have there been any updates to fix this issue?

igoronzy, Aug 08 '23 01:08

Bumping this. Might be a Python versioning issue.

Prior to Python 3.7, a StopIteration escaping from the generator (parallel_apply()) would just have signaled the end of the iteration. Starting in Python 3.7 (PEP 479), a StopIteration raised inside a generator is converted into a RuntimeError: see https://docs.python.org/3/library/exceptions.html#StopIteration
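To make the mechanism concrete, here is a minimal sketch that reproduces the same RuntimeError without pybedtools, assuming Python 3.7 or later. All function names are hypothetical stand-ins: `leaky` mimics a generator whose body calls something that lets StopIteration escape, and `explicit_raise` mimics the old `raise StopIteration` idiom at the end of a generator.

```python
def helper_that_raises():
    # Stand-in for any call made inside a generator body that lets
    # StopIteration escape (hypothetical, not pybedtools code).
    raise StopIteration


def leaky(iterations):
    # StopIteration propagates out of the generator frame, so Python 3.7+
    # converts it into RuntimeError.
    for _ in range(iterations):
        yield helper_that_raises()


def explicit_raise(iterations):
    for i in range(iterations):
        yield i
    raise StopIteration  # pre-3.7 idiom; now also converted to RuntimeError


def plain_return(iterations):
    for i in range(iterations):
        yield i
    return  # the supported way to end a generator


def describe(gen):
    # Consume the generator and report either its items or the error.
    try:
        return list(gen)
    except RuntimeError as exc:
        cause = type(exc.__cause__).__name__
        return f"{type(exc).__name__}: {exc} (cause: {cause})"


print(describe(leaky(3)))
print(describe(explicit_raise(3)))
print(describe(plain_return(3)))
```

The first two calls report the same "generator raised StopIteration" message as the traceback above, with the original StopIteration attached as the cause; only the version that ends with `return` yields its items normally.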

A workaround that seems to work for now is to:

  1. Comment out these 2 lines in the close_or_delete() function in helpers.py:

    if hasattr(x.fn, "throw"):
        x.fn.throw(StopIteration)
    
  2. Replace the 2 instances of raise StopIteration in BedTool.parallel_apply() with a plain return.
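The shape of the two changes above can be sketched as follows. This is not the pybedtools source: `apply_repeatedly`, `count_items`, and `five_items` are hypothetical, simplified stand-ins for BedTool.parallel_apply(), random_intersection(), and a streaming BedTool. The call to `close()` is an assumption of this sketch rather than something proposed in the thread; it is the standard generator method for cleanup and does not raise on an exhausted generator.

```python
def apply_repeatedly(iterations, func, func_args=(), func_kwargs=None):
    # Hypothetical stand-in for the serial path of parallel_apply().
    func_kwargs = func_kwargs or {}
    for _ in range(iterations):
        yield func(*func_args, **func_kwargs)
    return  # step 2: was `raise StopIteration`


def five_items():
    # Hypothetical stand-in for a streaming result backed by a generator.
    yield from range(5)


def count_items(make_stream):
    # Hypothetical stand-in for random_intersection(): consume a stream,
    # count its items, then clean up.
    stream = make_stream()
    result = sum(1 for _ in stream)
    # Step 1: no stream.throw(StopIteration) here. close() is used instead
    # (an assumption of this sketch, not taken from the thread).
    if hasattr(stream, "close"):
        stream.close()
    return result


results = list(apply_repeatedly(3, count_items, func_args=(five_items,)))
print(results)  # [5, 5, 5]
```

With both changes in place, no StopIteration crosses the generator boundary, so the loop finishes normally on Python 3.7 and later.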

I'm happy to submit a pull request, but this may be part of a larger issue of dealing with Python versions in pybedtools.

bentyeh, Aug 09 '23 03:08