
randomstats not cleaning up

Open brentp opened this issue 13 years ago • 9 comments

I don't know why this is happening: a look at the code shows that it is calling close_or_delete, but randomstats is leaving a ton of pybedtools.tmp* files in my tmp dir, and calling cleanup() does not remove them. Perhaps what's getting sent to close_or_delete is a filehandle? I've tried calling randomstats with an object and with object.fn, and it never cleans up the files.

My call looks like this:

        res = bed.randomstats(loh.fn, 100, processes=25)

brentp avatar Nov 30 '12 18:11 brentp

A while ago I did a major overhaul on the randomization stuff, implementing a new method (BedTool._randomintersection rather than BedTool.randomintersection) that fixed this.

Looks like I never made this method the default for BedTool.randomstats().

To use the new method, you can specify new=True and provide a genome_fn to BedTool.randomstats. To see the difference (both in syntax and cluttering of the temp dir), check out test/prevent_open_file_regression.

So for your example, this should do the trick:

gfn = pybedtools.chromsizes_to_file(pybedtools.chromsizes('hg19'))
res = bed.randomstats(loh.fn, 100, processes=25, new=True, genome_fn=gfn)

(side note: If you take a look at the leftover temp files, I think they should all be genome files)

daler avatar Nov 30 '12 21:11 daler

That does the trick. Can genome_fn be a required argument to avoid this?

brentp avatar Dec 04 '12 14:12 brentp

Yeah, that's probably best. I still need to do a little more cleaning up and "officially" deprecate the old randomstats method; when that happens, genome_fn will be required.

daler avatar Dec 04 '12 14:12 daler

Got it.

Would you consider adding an _orig_pool kwarg to random_op? It'd be nice to be able to keep re-using a pool if I'm running this across multiple pairs of bed files.

brentp avatar Dec 04 '12 14:12 brentp

Sure.

Implementation-wise, would you rather create your own pool and use it for various parallel calls like

import multiprocessing

mypool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=mypool, *args, **kwargs)
bt.random_op(_orig_pool=mypool, *args, **kwargs)
bt.random_jaccard(_orig_pool=mypool, *args, **kwargs)

or have a BedTool._pool instance variable that, if None, will initialize with n processes, but subsequent calls (when _orig_pool=True) re-use that auto-created one?

# initializes a pool, BedTool._pool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# subsequent calls re-use BedTool._pool
bt.randomstats(_orig_pool=True, processes=25, *args, **kwargs)

# set to None to re-initialize w/ different nprocs
bt._pool = None
bt.randomstats(_orig_pool=True, processes=500, *args, **kwargs)
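A rough sketch of how that second pattern might look. Everything here is invented for illustration (PooledTool, _shuffle_once), not pybedtools' actual API, and a ThreadPool stands in for multiprocessing.Pool so the snippet is self-contained:

```python
from multiprocessing.pool import ThreadPool

def _shuffle_once(x):
    # stand-in for a single randomization; real code would shuffle intervals
    return x * 2

class PooledTool:
    """Toy sketch of the lazily-initialized-pool pattern (hypothetical,
    not pybedtools): the pool lives on the instance and is re-used."""

    def __init__(self):
        self._pool = None

    def randomstats(self, data, processes=4, _orig_pool=False):
        if _orig_pool:
            if self._pool is None:
                # first call creates the pool; later calls re-use it
                # (set self._pool = None to re-initialize with a new size)
                self._pool = ThreadPool(processes)
            return self._pool.map(_shuffle_once, data)
        # without _orig_pool, a throwaway pool is created and torn down
        with ThreadPool(processes) as pool:
            return pool.map(_shuffle_once, data)
```

The trade-off is the one sketched above: the instance-variable version hides the pool's lifetime inside the object, while passing an explicit pool keeps ownership with the caller.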

daler avatar Dec 04 '12 15:12 daler

I much prefer the former.

brentp avatar Dec 04 '12 15:12 brentp

Sorry for putting this in this thread, but it's another open-file error. If I stream, it must be leaving the process open?

from pybedtools import BedTool

a = BedTool('chr1 1 2', from_string=True)
b = BedTool('chr1 1 2', from_string=True)

for i in range(10000):
    print(i)
    c = a.intersect(b, stream=True)

Is that expected to leak?

brentp avatar Dec 04 '12 23:12 brentp

In this case, I think the answer is yes:

The way streaming BedTools are closed is by hitting a StopIteration (see cbedtools.IntervalIterator). Since c in this example is never iterated over, it never gets a chance to raise StopIteration and close the stream.
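That close-on-exhaustion behavior can be illustrated with a plain generator wrapping a subprocess. This is a stdlib sketch of the pattern, not pybedtools' actual internals:

```python
import subprocess

def stream_lines(cmd):
    """Yield a subprocess's stdout lines, closing the pipe only when the
    output is exhausted (or the generator is closed explicitly)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        # reached on exhaustion, or on generator .close() / collection
        proc.stdout.close()
        proc.wait()

# fully consuming the stream releases the pipe
lines = list(stream_lines(["echo", "chr1 1 2"]))

# a partially consumed stream keeps the pipe open until closed
g = stream_lines(["echo", "chr1 1 2"])
next(g)    # starts the subprocess; the pipe is now open
g.close()  # raises GeneratorExit at the yield, running the finally block
```

An un-iterated streaming object is exactly the `g` case before `close()`: the cleanup code exists but never runs.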

But it would be nice if the garbage collector saw that the streaming BedTool from iteration i-1 no longer has any references and cleaned it up (would a __del__ method be called then?). But this starts to get into the reference-counting part of Python & Cython that I don't have a handle on yet. Any ideas?
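For plain Python objects the answer is yes: under CPython's reference counting, __del__ runs as soon as the last reference disappears. A toy demonstration (CPython-specific behavior; other implementations such as PyPy defer collection, and as the next comment notes, a __del__ on an object wrapping a live C-level stream may still not suffice):

```python
class Stream:
    """Toy stand-in for a streaming object; counts finalizations."""
    closed = 0

    def __del__(self):
        Stream.closed += 1

for i in range(3):
    # rebinding c drops the only reference to the previous iteration's
    # object, so under CPython its __del__ runs immediately
    c = Stream()

del c  # finalize the last one too
```

So the wrapper objects themselves are collected promptly; the leak here must come from whatever the __del__ would need to tear down, not from the collection timing.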

daler avatar Dec 05 '12 14:12 daler

I tried a number of things, including __del__, but can't get it to work; it doesn't collect them until the program terminates. Iterating over the results does prevent the error in this case. I'm getting another open-file-handles error that I haven't been able to create a small test case for.

brentp avatar Dec 05 '12 17:12 brentp