Misnaming a gzipped input causes a segfault

Open rsharris opened this issue 2 years ago • 0 comments

Granted, this is a stupid mistake on my part. I created a file named sorang.telomeric.fasta, but it was in fact a gzipped file. The first two bytes of the file are the gzip magic number: 1f 8b.

Running FastK -v -k40 -t1 data/sorang.telomeric.fasta produced this output:

Partitioning 1 .fasta file into 4 parts
Determining minimizer scheme & partition for sorang.telomeric
  Estimate 155.896M 40-mers
  Handling data in a single block
Segmentation fault

Clearly, the fault is mine. But it did take me a while to discover the error of my ways.

I wonder if some simple test could be done early on to validate that what the user claims is fasta really is fasta. The fact that the first byte (1f) is outside the printable ASCII range is strong evidence that it is not. Looking at the code (but not completely understanding it), it looks like such a test could be added to the FK1 case in fast_nearest, probably without adding much runtime overhead.
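To make the idea concrete, here is a minimal sketch of that kind of early check, written as a stand-alone Python pre-flight script rather than as a patch to FastK. The function name `sniff_sequence_file` and its return labels are hypothetical, invented for illustration. The only facts it relies on are the gzip magic number (1f 8b) and the usual first characters of fasta (`>`) and fastq (`@`) records. It reads just the first two bytes, so the cost is negligible.

```python
import gzip
import os
import tempfile

# First two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def sniff_sequence_file(path):
    """Guess the content type from the first bytes, ignoring the file name.

    Hypothetical helper for illustration only; not part of FastK.
    Returns one of: "empty", "gzip", "fasta", "fastq", "unknown".
    """
    with open(path, "rb") as handle:
        head = handle.read(2)
    if not head:
        return "empty"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head[:1] == b">":   # fasta header lines start with '>'
        return "fasta"
    if head[:1] == b"@":   # fastq header lines start with '@'
        return "fastq"
    return "unknown"


if __name__ == "__main__":
    # Reproduce the mistake: gzipped data saved under a .fasta name.
    workdir = tempfile.mkdtemp()
    misnamed = os.path.join(workdir, "misnamed.fasta")
    with open(misnamed, "wb") as out:
        out.write(gzip.compress(b">seq1\nACGTACGT\n"))

    kind = sniff_sequence_file(misnamed)
    if kind != "fasta":
        print(f"{misnamed}: named .fasta but content looks like {kind}")
```

A check like this only looks at the leading bytes, so it would not catch every malformed file, but it would turn this particular mistake into a clear message instead of a segfault.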

rsharris, Jun 27 '22 19:06