kraken
Fixed human genome downloading and added auto-masking feature using dustmasker
Hi Derrick,
I've been using Kraken lately and decided to contribute some fixes. First, I incorporated Silas Kieser's suggested fix for downloading the human genome and simplified it to use a single wget call:
https://groups.google.com/d/msg/kraken-users/wMNMSPo8Xtw/osYcrx90DgAJ
Next, I built Adam Rivers' suggestion for masking low-complexity regions with dustmasker into kraken-build, and added a '--no-mask' option to turn masking off if desired for reproducibility. kraken-build falls back to no masking if dustmasker is not found on the PATH.
https://groups.google.com/d/msg/kraken-users/jjRe21-qyvw/Kq8DXY45CQAJ
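To make the fallback behavior concrete, here is a rough Python sketch of the logic. It is not the code in this pull request (kraken-build is not written in Python), and the names `hard_mask` and `mask_low_complexity` are made up for illustration. It assumes dustmasker writes soft-masked (lowercase) FASTA to stdout when called with `-outfmt fasta`, and that the lowercase bases are then replaced with N; the replacement character is an assumption.

```python
import shutil
import subprocess


def hard_mask(fasta_text: str) -> str:
    """Replace lowercase (soft-masked) bases with N; header lines are left alone."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            out.append(line)  # never touch sequence headers
        else:
            out.append("".join("N" if c.islower() else c for c in line))
    trailing = "\n" if fasta_text.endswith("\n") else ""
    return "\n".join(out) + trailing


def mask_low_complexity(fasta_path: str, no_mask: bool = False) -> str:
    """Return the FASTA text for fasta_path, masked when masking is possible.

    Falls back to the unmasked sequence when no_mask is set (hypothetical
    stand-in for --no-mask) or when dustmasker is not on the PATH.
    """
    if no_mask or shutil.which("dustmasker") is None:
        with open(fasta_path) as handle:
            return handle.read()
    # Assumption: dustmasker prints soft-masked FASTA to stdout by default.
    result = subprocess.run(
        ["dustmasker", "-in", fasta_path, "-outfmt", "fasta"],
        check=True,
        capture_output=True,
        text=True,
    )
    return hard_mask(result.stdout)
```

Calling `mask_low_complexity("library.fna", no_mask=True)` (the file name is a placeholder) returns the file unchanged, which mirrors what '--no-mask' is meant to do.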
I also updated the documentation to reflect the new soft dependency on dustmasker, and documented the --no-mask option.
Please let me know what you think!
Cheers,
~Tomer
Nice! Thanks Tomer - we use dustmasker ourselves in our pipeline, but we hadn't built it into Kraken as an option. I highly recommend 'dust'-ing any genomes before running Kraken (or any competing program) because of the confusion caused by low-complexity sequences.
Thanks for your feedback, Dr. Salzberg! As I've communicated to Derrick, Kraken was instrumental in my finishing my PhD, so I'm happy to be able to contribute back. Please let me know if you think that one of the Kraken devs will accept this pull request.
I'm curious about your thoughts on masking before or after mapping. I see that Heng Li advises masking after read mapping, and discarding reads that land in masked regions. Is this what you advise with Bowtie2, or do you mask DNA before building a Bowtie2 database? Thanks in advance!
https://www.biostars.org/p/170435/#170450
@SheaML Right you are! I did have a commit locally that I failed to push. Resolved. Thanks for catching that and reporting it!