Thoughts about compressing unitigs?

Open rchikhi opened this issue 1 year ago • 2 comments

Hi Sebastian, Agnieszka, Heng,

AGC looks great. I wanted to see if it would also work on badly assembled sequences, e.g. unitigs, and didn't get good compression ratios. Would you say the approach fundamentally wouldn't work for unitigs, or did I miss some parameter tweaks?

I tried to compress the unitigs of 2 human samples (NA06986 & NA06991) using CHM13v2 as the reference. The resulting AGC file is 3.6 GB, which is more than the concatenation of the raw gzipped unitigs (2 x 1.7 GB). Cmdline: \time ~/tools/agc/agc create -t 10 chm13v2.0.oneline.fa NA06986.unitigs.fa.gz NA06991.unitigs.fa.gz > NA06986_NA06991.agc. Testing with the parameter -s 200 didn't substantially change the results.
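
A minimal sketch of the size comparison above, in case someone wants to reproduce it. The helper name archive_vs_inputs is hypothetical, and the file names are simply the ones from the command line, assumed to sit in the current directory.

```python
import os


def archive_vs_inputs(archive_bytes, input_bytes):
    """Return (total input size, archive size divided by total input size).

    A ratio above 1.0 means the archive is larger than the inputs combined.
    """
    total = sum(input_bytes)
    return total, archive_bytes / total


if __name__ == "__main__":
    # File names taken from the command line above; adjust paths as needed.
    archive = "NA06986_NA06991.agc"
    inputs = ["NA06986.unitigs.fa.gz", "NA06991.unitigs.fa.gz"]
    if all(os.path.exists(p) for p in [archive] + inputs):
        total, ratio = archive_vs_inputs(
            os.path.getsize(archive),
            [os.path.getsize(p) for p in inputs],
        )
        print(f"inputs: {total / 1e9:.1f} GB, archive/inputs ratio: {ratio:.2f}")
    else:
        print("archive or input files not found; nothing to compare")
```

With the sizes reported here (3.6 GB against 2 x 1.7 GB), the ratio comes out slightly above 1.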

Thanks in advance for any feedback,
Rayan

rchikhi, Nov 04 '22 09:11