
Can I use ncls to calculate the intersection/union number between ranges?

Open Runsheng opened this issue 7 years ago • 4 comments

Is there any method in ncls to return the intersection and union of two ranges? For instance, range(1,10) and range(5,15) would return (5,10) and (1,15).

Or simply return the lengths of the intersection and union, like bedtools jaccard? [https://bedtools.readthedocs.io/en/latest/content/tools/jaccard.html]
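For reference, the result being asked for can be sketched in plain Python, without ncls. This is only an illustration under stated assumptions: ranges are half-open `(start, end)` tuples, the helper names `intersect`, `span_union` and `jaccard` are made up for this example and are not part of the ncls API, and the ratio is the textbook Jaccard index, which may not match the bedtools output column for column.

```python
def intersect(a, b):
    """Overlap of two half-open (start, end) ranges, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def span_union(a, b):
    """Union as a single range; only defined when the ranges overlap or touch."""
    if max(a[0], b[0]) > min(a[1], b[1]):
        return None  # disjoint ranges cannot be expressed as one range
    return (min(a[0], b[0]), max(a[1], b[1]))


def jaccard(a, b):
    """Intersection length divided by union length (textbook Jaccard index)."""
    inter = intersect(a, b)
    inter_len = inter[1] - inter[0] if inter else 0
    union_len = (a[1] - a[0]) + (b[1] - b[0]) - inter_len
    return inter_len / union_len if union_len else 0.0


a, b = (1, 10), (5, 15)
print(intersect(a, b))   # (5, 10)
print(span_union(a, b))  # (1, 15)
print(jaccard(a, b))     # 5 / 14
```

The union length is computed by inclusion-exclusion, so `jaccard` also works for disjoint ranges, where it returns 0.0.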

Runsheng avatar Aug 18 '18 10:08 Runsheng

I’ll reply in more depth on Monday, when I’m back at work :) Pyranges should be able to do this. The repo is on my GitHub :)

endrebak avatar Aug 18 '18 20:08 endrebak

Thank you very much! I am now using pybedtools to calculate the intersection between ranges. However, computing the intersection matrix between 300 mRNA tracks (each containing around 15 ranges) takes about 400 seconds on a 32-core server. I will try Pyranges first and give you some feedback.
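For tracks this small, a pairwise comparison can also be sketched in pure Python. The following is a minimal sketch, not how pybedtools, Pyranges or ncls work internally: it assumes each track is a list of half-open `(start, end)` tuples on a single chromosome and strand, and the function names are hypothetical. Each track is sorted and merged once, after which the shared length of two tracks is found with a linear two-pointer sweep.

```python
def merge(intervals):
    """Sort intervals and fuse overlapping or touching ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(merged):
    """Total number of positions covered by merged intervals."""
    return sum(end - start for start, end in merged)


def intersection_length(a, b):
    """Shared length of two merged, sorted interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        start, end = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if start < end:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def track_jaccard(x, y):
    """Intersection length over union length for two tracks."""
    a, b = merge(x), merge(y)
    inter = intersection_length(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 0.0


track_a = [(1, 10), (20, 30)]
track_b = [(5, 15), (25, 40)]
print(track_jaccard(track_a, track_b))  # 10 / 34
```

For an all-against-all matrix it would make sense to call `merge` once per track and reuse the result, rather than re-merging inside every pairwise call as `track_jaccard` does here.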

Runsheng avatar Aug 19 '18 10:08 Runsheng

Pyranges is still largely unused. Its unit tests pass, but it might still have bugs or not work.

I would also look into this potential error in bedtools jaccard: https://github.com/arq5x/bedtools2/issues/645. Whether it is a bug, and whether it matters, I don't know :)

endrebak avatar Aug 20 '18 06:08 endrebak

Also, if you use pybedtools, it is advisable to presort the data first. It is much faster that way.

endrebak avatar Aug 20 '18 06:08 endrebak