epic
Bam support
I think it is a bad idea for the below reasons. Feel free to suggest solutions:
You will probably rerun the analyses many times. Having to run a time-consuming conversion step (the most time-consuming one in the algorithm) each time would be silly. It is also IO-intensive, so parallel execution would not help much.
I am not just writing epic but also a lot of helper scripts for ChIP-Seq and differential ChIP-Seq. Adding a bam-to-bed conversion step to every one of these before running them would be a waste.
Also, where should I store the temporary bed files? Overflowing /tmp/ dirs is an eternal issue.
If I were to stream the data to bed using pipes, epic would not be fast anymore. I get a massive speedup from multiple cores if I use text files, presumably because the system already has the file cached in memory. This is not the case if I start the pipe with bamToBed blabla | ...
There are many things that can go wrong when converting bam to bed, due to wonky bam files. I would get a bunch of github issues about "epic not being able to use my bam files" if I were to silently convert to bed within my programs.
I guess the best way of adding bam support would be to do the conversion before running the script, with a warning that I think using bams instead of beds is suboptimal. If the conversion fails, I'll throw an exception informing the user that the onus is on them to convert their wonky bam files to bed.
My solution: if the input files are called path/to/file.bam, create a file path/to/file.bed. Do not delete it afterwards.
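A rough sketch of how that could look, not what epic actually does. The function names (`bed_path_for`, `run_bamtobed`, `ensure_bed`) are made up for illustration, the conversion is assumed to go through `bedtools bamtobed -i`, and reusing an already existing bed file on reruns is my assumption based on not deleting it afterwards.

```python
import subprocess
import warnings
from pathlib import Path


def bed_path_for(bam_path):
    """Map path/to/file.bam to path/to/file.bed (same directory, not /tmp)."""
    return Path(bam_path).with_suffix(".bed")


def run_bamtobed(bam_path, bed_path):
    """Assumed converter: write the output of `bedtools bamtobed -i` to bed_path."""
    with open(bed_path, "w") as out:
        subprocess.run(
            ["bedtools", "bamtobed", "-i", str(bam_path)],
            stdout=out,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )


def ensure_bed(path, convert=run_bamtobed):
    """Return a bed path for the input, converting bam files once up front."""
    path = Path(path)
    if path.suffix.lower() != ".bam":
        return path  # not a bam file: use it as-is

    bed_path = bed_path_for(path)
    if bed_path.exists():
        return bed_path  # assumption: reuse the bed file kept from an earlier run

    warnings.warn(
        f"Converting {path} to {bed_path}. Using bam instead of bed input is "
        "suboptimal; the bed file is kept for later runs."
    )
    try:
        convert(path, bed_path)
    except Exception as exc:
        bed_path.unlink(missing_ok=True)  # do not leave a truncated bed file behind
        raise RuntimeError(
            f"Could not convert {path} to bed. Please convert the file to bed "
            "yourself and rerun with the bed file."
        ) from exc
    return bed_path
```

Every script would then call `ensure_bed` on its input paths before doing any work, so the slow conversion runs only the first time and later runs read the plain text bed file directly.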