FASTK
Misnaming a gzipped input causes a segfault
Granted, this is a stupid mistake on my part. I created a file named sorang.telomeric.fasta, but it was in fact a gzipped file. The first two bytes of the file are the gzip magic number: 1f 8b.
Running FastK -v -k40 -t1 data/sorang.telomeric.fasta
produced this output:
Partitioning 1 .fasta file into 4 parts
Determining minimizer scheme & partition for sorang.telomeric
Estimate 155.896M 40-mers
Handling data in a single block
Segmentation fault
Clearly, the fault is mine. But it did take me a while to discover the error of my ways.
I wonder if some simple test could be done early on to validate that what the user claims is FASTA really is FASTA. The fact that the first byte (1f) is outside the printable ASCII range is strong evidence that it is not. Looking at the code (but not completely understanding it), it looks like such a test could be added to the FK1 case in fast_nearest without (probably) adding much runtime overhead.
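To make the suggestion concrete, here is a minimal sketch of the kind of early check I have in mind. It is not taken from FastK: the names sniff_t, sniff_bytes and sniff_file are hypothetical, and I am assuming the check would run on the first couple of bytes of each input before any partitioning starts. It flags the gzip magic number explicitly and otherwise rejects a first byte outside printable ASCII (allowing leading whitespace).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical result codes; none of these names exist in FastK. */
typedef enum {
  SNIFF_OK,          /* first byte is plausible for a text FASTA/FASTQ file */
  SNIFF_GZIP,        /* starts with the gzip magic number 1f 8b */
  SNIFF_NOT_TEXT,    /* first byte is outside printable ASCII */
  SNIFF_EMPTY,       /* no bytes available */
  SNIFF_UNREADABLE   /* file could not be opened */
} sniff_t;

/* Classify the leading bytes of an input that is claimed to be plain text. */
sniff_t sniff_bytes(const unsigned char *buf, size_t n)
{
  unsigned char c;

  if (n == 0)
    return SNIFF_EMPTY;
  if (n >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
    return SNIFF_GZIP;

  c = buf[0];
  /* Accept printable ASCII (0x20..0x7e) plus tab, newline, carriage return. */
  if ((c >= 0x20 && c <= 0x7e) || c == '\t' || c == '\n' || c == '\r')
    return SNIFF_OK;
  return SNIFF_NOT_TEXT;
}

/* Read the first two bytes of a file and classify them. */
sniff_t sniff_file(const char *path)
{
  unsigned char buf[2];
  size_t n;
  FILE *f = fopen(path, "rb");

  if (f == NULL)
    return SNIFF_UNREADABLE;
  n = fread(buf, 1, sizeof buf, f);
  fclose(f);
  return sniff_bytes(buf, n);
}
```

With something like this, a caller could report "data/sorang.telomeric.fasta looks gzip-compressed" and exit when sniff_file returns SNIFF_GZIP, rather than running into the segfault later. The cost is one extra open and a two-byte read per input file.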