japsa

XM data corruption

Status: Open. KirillKryukov opened this issue 5 years ago (0 comments).

The XM compressor still has a data corruption issue: compressing certain inputs and then decompressing them produces corrupted output, i.e., the decompressed data differs from the original file.

Test data size: 30,244 bytes
Test data link: http://kirill.med.u-tokai.ac.jp/data/temp/xm-repro-4-input.zip

Commands to reproduce:

Compress: jsa.xm.compress --hashSize=11 --context=15 --limit=200 --threshold=0.15 --chance=20 --real=archive.xm original.fasta

Decompress: jsa.xm.compress --hashSize=11 --context=15 --limit=200 --threshold=0.15 --chance=20 --decode=archive.xm --output=decompressed.fasta
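To rerun the reproduction repeatedly (for example, while bisecting a fix), the two commands above can be scripted. The following is a minimal Python sketch that assumes `jsa.xm.compress` is on PATH. It reuses exactly the flags from this report and passes the same model parameters to both runs. The helper names `xm_commands` and `round_trip` are hypothetical and are not part of japsa.

```python
import subprocess
from typing import List, Tuple

# Model parameters copied verbatim from this issue; the same set is passed to both runs.
XM_PARAMS = ["--hashSize=11", "--context=15", "--limit=200",
             "--threshold=0.15", "--chance=20"]


def xm_commands(original: str, archive: str, decompressed: str,
                tool: str = "jsa.xm.compress") -> Tuple[List[str], List[str]]:
    """Build the compress and decompress argument lists (hypothetical helper)."""
    compress = [tool, *XM_PARAMS, f"--real={archive}", original]
    decompress = [tool, *XM_PARAMS, f"--decode={archive}", f"--output={decompressed}"]
    return compress, decompress


def round_trip(original: str, archive: str, decompressed: str,
               tool: str = "jsa.xm.compress") -> bool:
    """Compress, decompress, and return True only if the output matches the input byte for byte."""
    compress, decompress = xm_commands(original, archive, decompressed, tool)
    subprocess.run(compress, check=True)    # raises CalledProcessError on a non-zero exit
    subprocess.run(decompress, check=True)
    with open(original, "rb") as f_in, open(decompressed, "rb") as f_out:
        return f_in.read() == f_out.read()
```

For the files in this report, `round_trip("original.fasta", "archive.xm", "decompressed.fasta")` would be expected to return False while the bug is present.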

Compare: cmp original.fasta decompressed.fasta

Produces: original.fasta decompressed.fasta differ: byte 27512, line 274
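`cmp` reports the first mismatch as a 1-based byte offset together with a 1-based line number (one plus the number of newlines before that byte). The following is a minimal Python sketch of the same check, useful inside a test harness. Because the corrupted output here has the correct size, it also counts how many bytes differ. The function names are hypothetical. When one input is a strict prefix of the other, the sketch returns the first position past the shorter input, whereas real `cmp` prints an EOF message instead.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte, line) of the first mismatch, both 1-based as in cmp; None if equal."""
    line = 1
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1, line
        if x == 0x0A:  # '\n' ends a line
            line += 1
    if len(a) == len(b):
        return None
    # One input is a strict prefix of the other: report the first byte past the shorter one.
    return min(len(a), len(b)) + 1, line


def count_mismatches(a: bytes, b: bytes) -> int:
    """Count differing bytes between two equal-length inputs."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(x != y for x, y in zip(a, b))


def compare_files(path_a: str, path_b: str) -> str:
    """Produce a cmp-like report, plus a mismatch count when the sizes match."""
    a, b = Path(path_a).read_bytes(), Path(path_b).read_bytes()
    diff = first_difference(a, b)
    if diff is None:
        return "identical"
    byte, line = diff
    report = f"{path_a} {path_b} differ: byte {byte}, line {line}"
    if len(a) == len(b):
        report += f" ({count_mismatches(a, b)} differing bytes, same size)"
    return report
```

For example, `first_difference(b"AC\nGT\n", b"AC\nGA\n")` returns `(5, 2)`: the fifth byte differs, and it sits on the second line.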

The decompressed file has the correct size but contains corrupted sequence data. The issue was found during testing for the Sequence Compression Benchmark.

Let me know if you need any additional information or help.

KirillKryukov, Jun 19 '19 06:06