eXdupe progressively gets worse at deduping
I ran a long test with 748 daily backups of my Home Assistant DB, and eXdupe gets progressively worse at deduplicating them.
zpaq_stats.txt exdupe_stats.txt
my script that did the job:
#!/bin/bash
ARCHIVE="/home/cartman/mysqlbackups.zpaq"
RAM="/mnt/zpaq/tmp"
OUT="/home/cartman/mysqlbackups.exd"
echo "starting at $(date)"

for i in $(seq 1 748); do
    echo "=== Version $i ==="
    rm -rf "$RAM"
    mkdir -p "$RAM"
    # Extract only this version
    zpaqfranz x "$ARCHIVE" -to "$RAM" -range $i
    ls -lAR "$RAM"
    # Create full (first) or diff (rest)
    if [ $i -eq 1 ]; then
        /home/cartman/exdupe -g8 -x1 -k "$RAM" "$OUT"
    else
        /home/cartman/exdupe -D -x1 -k "$RAM" "$OUT"
    fi
    du -s "$OUT"
    echo "ending compression version $i at $(date)"
done
rm -rf "$RAM"
echo "ending all at $(date)"
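To see at which version the archive starts growing faster, the cumulative `du -s "$OUT"` figures printed by the loop can be turned into per-version increments. This is only a minimal sketch, assuming the `du -s` lines have been collected one per line into a file; the `size_deltas` helper and the `sizes.txt` name are hypothetical and not part of the original script or of either tool.

```shell
#!/bin/bash
# size_deltas: read cumulative archive sizes, one per line (for example the
# raw output of `du -s`, whose first column is the size), and print
# "version size delta" for each line. The first delta equals the first size.
# Hypothetical helper for illustration only.
size_deltas() {
    awk '{ print NR, $1, $1 - prev; prev = $1 }' "$@"
}
```

In the loop this could be fed by replacing `du -s "$OUT"` with `du -s "$OUT" | tee -a sizes.txt` and running `size_deltas sizes.txt` afterwards.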
Thanks for the files! I think I know the cause. If you have the time, I can make a test version one of these days that may fix it.
Yes of course, I have everything setup so I can test it easily
Please try this binary: https://github.com/rrrlasse/eXdupe/releases/download/temp/exdupe_grow_test
(Source at https://github.com/rrrlasse/eXdupe/tree/f/fix_grow)
exdupe_stats_fix.txt
It definitely got better, but it is still 104 GB vs. 27 GB for zpaqfranz. I will test with better compression options as well.
Thanks! Can you add the -k option to exdupe? Just for the next benchmark, you don't need to repeat this one.
edit: Sorry, what I meant is to provide the "exdupe_redact.log" that showed the console output.
I am sorry, I made a mistake with GNU screen and I don't have that log. I can always repeat the test if you want.
It would be helpful. What I'm interested in is the "Hashtable fillratio" that -k reports. The compression level (which -x flag is used) isn't very important.
Ok I am running it now with logging enabled
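Since the console output is what carries the -k information, one way to avoid losing it inside a screen session is to have the run write its own log file. The following is only a sketch under assumptions: `run_logged`, `fillratio_lines`, and the script and log file names are hypothetical, and the grep pattern is just the "Hashtable fillratio" label quoted above, without assuming anything about the rest of the log line format.

```shell
#!/bin/bash
# run_logged LOGFILE CMD [ARGS...]: run CMD, showing its stdout and stderr
# on the terminal while also appending both to LOGFILE.
# Note: the return status is that of tee, not of CMD.
run_logged() {
    _log="$1"
    shift
    "$@" 2>&1 | tee -a "$_log"
}

# fillratio_lines LOGFILE: print the log lines that mention the hashtable
# fill ratio (case-insensitive match on the label only).
fillratio_lines() {
    grep -i 'hashtable fillratio' "$1"
}
```

For example, if the loop is saved as `backup_test.sh`, then `run_logged exdupe_run.log ./backup_test.sh` followed by `fillratio_lines exdupe_run.log` would keep the whole console output and pull out just the fill ratio lines.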
Please tell me if I have removed too much (there were many blank lines etc.).
The log is perfect, thanks!
If you still have the test setup, you could try https://github.com/rrrlasse/eXdupe/releases/download/temp/exdupe_grow_test_2
It should reduce the differential size a lot.
Thanks, just started testing. Will report when done.
Ok, I think I will use the current solution and do further optimizations in a later version. Thanks for all your time :)
Ok. Whenever you want me to retest, I have kept my scripts :)