bedtools
ENH: implement new tool bedtools sample
This would be useful for testing the significance of overlap (along with shuffle).
It would have a command-line interface like:
bedtools sample -i stdin -n 500 > out.500.bed
This could be done with sort -R, nl, or something similar, but those tools are not available on most systems that I work with.
This is very easy to implement with reservoir sampling; here's a Python version mostly taken from Wikipedia:
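    import random
    import sys

    n = 500          # number of lines to sample (the -n argument)
    bed = sys.stdin  # input BED stream (the -i argument)

    lines = []  # the reservoir
    for i, line in enumerate(bed):
        if i < n:
            # fill the reservoir with the first n lines
            lines.append(line)
        else:
            # keep line i with probability n / (i + 1)
            replace_idx = random.randint(0, i)
            if replace_idx < n:
                lines[replace_idx] = line
    sys.stdout.write("".join(lines))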
The following may belong in a completely separate tool, but it would be a useful addition to a suite like this in bedtools.
It may also be nice to shuffle labels, e.g.
bedtools label_shuffle -col 4 -i stdin
That way, instead of shuffling positions, we shuffle the labels in the given column. Again, this can be useful for significance testing.
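To make the label-shuffling idea concrete, here is a rough Python sketch. It is only an illustration of the proposed behaviour, not existing bedtools code: the function name `shuffle_labels` and the `seed` parameter are made up for this example, `col` is 1-based to match the `-col 4` usage above, and the input is assumed to be tab-delimited with no header or track lines.

```python
import random


def shuffle_labels(bed_lines, col=4, seed=None):
    """Permute the values of 1-based column `col` across all records.

    Positions and all other columns stay where they are; only the
    labels are reassigned. Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)  # seed is optional, for reproducible runs
    # Split each non-empty line into tab-delimited fields.
    rows = [line.rstrip("\n").split("\t") for line in bed_lines if line.strip()]
    # Pull out the label column and permute it.
    labels = [row[col - 1] for row in rows]
    rng.shuffle(labels)
    # Write the permuted labels back into the same column.
    for row, label in zip(rows, labels):
        row[col - 1] = label
    return ["\t".join(row) + "\n" for row in rows]
```

Unlike the sampling case, this needs the whole label column in memory, since a uniform permutation cannot be produced in one streaming pass. Reading from stdin would look like `sys.stdout.writelines(shuffle_labels(sys.stdin, col=4))`.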