
Support / checking for long lines in reference fasta

Open · roblanf opened this issue 7 years ago · 1 comment

Love the software. But it took noticing a few odd things in my results before I went and read all the docs carefully, and then I found this listed (very clearly) under open issues in the GitHub README:

The length of a line in a input FASTA file must not exceed 4096 bp.

It would be great if you could fix this, so that it reads input lines of any length from a reference.fa file. Failing that, checking whether the reference will be truncated and exiting with an error should presumably take just a couple of lines of code.
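For illustration, a user-side version of that check could look like the sketch below. It is a pre-flight script, not NextGenMap code: the function name `find_long_lines` is hypothetical, the 4096 limit is taken from the README sentence quoted above, and it assumes that only sequence lines (not `>` header lines) count toward the limit.

```python
MAX_LINE_BP = 4096  # limit quoted from the README; assumed to apply to sequence lines


def find_long_lines(lines, limit=MAX_LINE_BP):
    """Return (line_number, length) for each sequence line longer than `limit`.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    offenders = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            continue  # assumption: header lines are not subject to the limit
        length = len(line.rstrip("\r\n"))
        if length > limit:
            offenders.append((number, length))
    return offenders
```

Running it as `find_long_lines(open("reference.fa"))` before mapping and stopping when the result is non-empty would turn the silent truncation into an explicit failure.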

Right now, I find the behaviour a little troubling: NextGenMap ran perfectly well on my data, and it wasn't until I was looking at the output that I realised something must be up (a lot of the genome had 0 mapping quality). To me, this has the potential to cause inferential issues for users who aren't aware the problem exists (admittedly, users who don't look carefully at their output... but we know they exist). Either a simple error and quit, or just fixing the issue (even by reformatting the reference.fa to the format you need), should be pretty simple and might help users avoid trouble.
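As a workaround in the meantime, the reference can be re-wrapped before mapping. The following is a minimal sketch, not part of NextGenMap: the name `rewrap_fasta` and the default width of 60 are arbitrary choices, and it buffers one record at a time in memory, which is assumed to be acceptable for the reference at hand.

```python
def rewrap_fasta(lines, width=60):
    """Yield FASTA lines with each record's sequence re-wrapped to `width` columns."""
    buffer = []  # sequence chunks of the record currently being read

    def flush():
        # Join the buffered sequence and emit it in fixed-width pieces.
        sequence = "".join(buffer)
        buffer.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield from flush()  # finish the previous record first
            yield line
        elif line:
            buffer.append(line)
    yield from flush()  # emit the last record
```

Writing each yielded line plus a newline to a new file gives a reference whose sequence lines all stay well under the 4096 bp limit.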

roblanf, Mar 03 '17 03:03

Hi Rob!

Thank you very much for the feedback! Yes, you are right: truncating the reference without a warning or an error is definitely not the right way to handle this. We will address this in the next release!

Best, Philipp

philres, Mar 03 '17 18:03