
Homopolymer compression is not applied if the first read file is empty

Open • maickrau opened this issue 2 years ago • 1 comment

Running meryl count compress with multiple read files does not apply homopolymer compression when the first file is empty. The following command creates an index without homopolymer compression:

meryl count compress k=21 threads=4 memory=32g empty.fa reads.fa output kmers_withempty

Putting the empty file after the read file, however, correctly creates a homopolymer-compressed index:

meryl count compress k=21 threads=4 memory=32g reads.fa empty.fa output kmers_withempty2

meryl print shows that the first index is not homopolymer compressed but the second is:

$ meryl print kmers_withempty/ | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty/
    print to (stdout)
AAAAAAAAAAAAAAAAATAAG   1
AAAAAAAAAAAAAAAACTACA   1
AAAAAAAAAAAAAAAATAAGG   1
AAAAAAAAAAAAAAACAATAC   1
AAAAAAAAAAAAAAACTACAG   1
AAAAAAAAAAAAAAATAAGGA   1
AAAAAAAAAAAAAACAATACT   1
AAAAAAAAAAAAAACTACAGA   1
AAAAAAAAAAAAAATAAGGAG   1
AAAAAAAAAAAAAAGTACTTT   1

$ meryl print kmers_withempty2 | head

Found 1 command tree.

PROCESSING TREE #1 using 1 thread.
  opLessThan
    kmers_withempty2/
    print to (stdout)
ACACACACACACACACTACTA   1
ACACACACACACACTACTACT   1
ACACACACACACATCATATAC   1
ACACACACACACTACAGACAT   1
ACACACACACACTACAGATCA   1
ACACACACACACTACTACTAC   2
ACACACACACATCATATACAG   1
ACACACACACTACAGACATCA   1
ACACACACACTACAGATCATC   1
ACACACACACTACTACTACTA   4

$ meryl --version
meryl snapshot v1.4-development +29 changes (r969 97d5923dd69ebc3efed67fc466c21ed8c5e6670b)
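
Scanning the first ten k-mers by eye works, but the same check can be run over the whole index. The sketch below assumes that meryl print writes one k-mer and its count per line on stdout, as in the listings above, and that a homopolymer-compressed k-mer never contains two identical adjacent bases. The function name count_uncompressed_kmers is made up for this example and is not part of meryl.

```shell
# Hypothetical helper (not part of meryl): reads "KMER<whitespace>COUNT" lines
# on stdin and prints how many k-mers contain two identical adjacent bases.
# A homopolymer-compressed index should give 0.
count_uncompressed_kmers() {
    # Only consider lines whose first field is a pure ACGT string, so that
    # any status lines that end up in the stream are ignored.
    awk '$1 ~ /^[ACGT]+$/ && $1 ~ /AA|CC|GG|TT/ { n++ } END { print n + 0 }'
}
```

For example, meryl print kmers_withempty | count_uncompressed_kmers should report a non-zero count for the index built with the empty file first, given the k-mers shown above.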

maickrau, Feb 20 '23 10:02

Thanks, Mikko. It's not just an empty first file that causes trouble. The 'compress' flag is reset after EACH file. The workaround is simple but annoying: add 'compress' before each input file.
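
Applied to the command from the report, the workaround might look like the sketch below. It only builds and prints the command line rather than running it. The helper name meryl_count_compress_cmd is invented for this example, and the placement of k=, threads= and memory= ahead of the first 'compress' is an assumption; the only thing taken from the comment above is that 'compress' is repeated before every input file.

```shell
# Hypothetical helper (not part of meryl): prints a meryl count command line
# with 'compress' repeated before each input file.
# Usage: meryl_count_compress_cmd OUTPUT_INDEX INPUT_FILE...
meryl_count_compress_cmd() {
    index_dir=$1
    shift
    # Options copied from the original report; their position is an assumption.
    printf 'meryl count k=21 threads=4 memory=32g'
    for input_file in "$@"; do
        printf ' compress %s' "$input_file"
    done
    printf ' output %s\n' "$index_dir"
}
```

Calling meryl_count_compress_cmd kmers_withempty empty.fa reads.fa prints:

meryl count k=21 threads=4 memory=32g compress empty.fa compress reads.fa output kmers_withempty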

I remember debating if this flag should be reset or not. I'm a little embarrassed I left it in.

brianwalenz, Feb 21 '23 05:02