maximal header length

Open frederic-mahe opened this issue 2 years ago • 0 comments

Header length used to be limited to 2,048 characters. There is no specific limit imposed on header lengths anymore. Yet, a long header can trigger a segmentation fault on my machine (SIGSEGV or signal 11 on Linux):

cd /tmp/

export LC_ALL=C

# make a fasta file (one entry) with a very long header
# (the second assignment overrides the first; comment it out to test the passing case)
LENGTH=16777213  # success  35,687,712 kB of RAM
LENGTH=16777214  # failure  Command terminated by signal 11  2,133,464 kB of RAM

FASTA="tmp_${LENGTH}.fas"
(
    printf ">"
    yes A | head -n "${LENGTH}" | tr -d "\n"
    printf "_1\nA\n"
) > "${FASTA}"

/usr/bin/time swarm --output /dev/null "${FASTA}"

rm "${FASTA}"

(Note the large amount of memory used by the successful run: 35 GB)

The maximum header length before failure is 16,777,213 characters + 1 (the leading >) + 2 (the _1 suffix) = 16,777,216 = 2^24 characters. Maybe we could add a check and a call to fatal() if the header length is greater than that?
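
Until such a check exists in swarm itself, a pre-flight test on the input file could serve as a workaround. The following is only a minimal sketch: it assumes that the 2^24 limit observed on my machine also applies elsewhere, and the names check_header_length and MAX_HEADER are made up for illustration (they are not part of swarm). Some awk implementations may themselves struggle with lines this long, so treat it as a starting point.

```shell
# Hypothetical pre-flight check (not part of swarm): report fasta header
# lines whose length, including the leading ">", exceeds a limit.
MAX_HEADER=16777216  # 2^24, assumed limit: the longest header line that worked above

check_header_length() {
    # $1: fasta file
    # $2: maximum allowed header line length (optional, defaults to MAX_HEADER)
    LC_ALL=C awk -v max="${2:-${MAX_HEADER}}" '
        /^>/ && length($0) > max {
            printf "header too long at line %d: %d characters\n", NR, length($0)
            bad = 1
        }
        END { exit (bad ? 1 : 0) }  # non-zero exit status if any header is too long
    ' "$1"
}
```

It could be used to guard the call, for example: check_header_length "${FASTA}" && swarm --output /dev/null "${FASTA}". With LC_ALL=C, awk counts bytes, which matches the way the test file above is built.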

frederic-mahe, Feb 16 '22 17:02