eXdupe progressively gets worse at deduping

Open kskarlatos opened this issue 2 months ago • 15 comments

I ran a long test with 748 daily backups of my Home Assistant DB, and eXdupe gets progressively worse at deduping.

zpaq_stats.txt exdupe_stats.txt

exdupe_redact.log

My script that did the job:

#!/bin/bash
ARCHIVE="/home/cartman/mysqlbackups.zpaq"
RAM="/mnt/zpaq/tmp"
OUT="/home/cartman/mysqlbackups.exd"
echo "starting at $(date)"

for i in $(seq 1 748); do
    echo "=== Version $i ==="
    rm -rf "$RAM"
    mkdir -p "$RAM"

    # Extract only this version
    zpaqfranz x "$ARCHIVE" -to "$RAM" -range $i
    ls -lAR "$RAM"

    # Create full (first) or diff (rest)
    if [ $i -eq 1 ]; then
        /home/cartman/exdupe -g8 -x1 -k "$RAM" "$OUT"
    else
        /home/cartman/exdupe -D -x1 -k "$RAM" "$OUT"
    fi
    du -s "$OUT"
    echo "ending compression version $i at $(date)"
done

rm -rf "$RAM"
echo "ending all at $(date)"

kskarlatos, Nov 04 '25 15:11
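To see where the growth happens, the console log of the script above can be reduced to one line per version: the archive size after that version and how much it grew compared with the previous one. This is a minimal sketch, assuming GNU `du` (sizes in 1 KiB blocks by default), an archive path without spaces, and a log that contains the `=== Version N ===` and `du -s "$OUT"` lines; the function name `growth_per_version` is hypothetical and not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original benchmark script).
# Prints "<version><TAB><archive size><TAB><growth since previous version>",
# with sizes in whatever units `du -s` reported (1 KiB blocks for GNU du).
growth_per_version() {
    local log="$1" out="$2"
    awk -v out="$out" '
        # Remember the current version from lines like "=== Version 12 ==="
        /^=== Version [0-9]+ ===$/ { ver = $3 }
        # du -s prints "<size><TAB><path>"; keep only lines for our archive
        $2 == out && $1 ~ /^[0-9]+$/ {
            printf "%d\t%d\t%d\n", ver, $1, $1 - prev
            prev = $1
        }
    ' "$log"
}
```

Usage (the log file name is hypothetical): `growth_per_version bench.log /home/cartman/mysqlbackups.exd`. If the third column keeps rising across versions whose input changed by a similar amount, that is the "progressively worse" pattern in numbers.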

Thanks for the files! I think I know the cause. If you have the time, I can make a test version one of these days that may fix it.

rrrlasse, Nov 04 '25 20:11

Yes, of course. I have everything set up, so I can test it easily.

kskarlatos, Nov 04 '25 21:11

Please try this binary: https://github.com/rrrlasse/eXdupe/releases/download/temp/exdupe_grow_test

(Source at https://github.com/rrrlasse/eXdupe/tree/f/fix_grow)

rrrlasse, Nov 05 '25 15:11

exdupe_stats_fix.txt

It definitely got better, but it is still 104 GB vs 27 GB for zpaqfranz. I will test with better compression options as well.

kskarlatos, Nov 07 '25 10:11

Thanks! Can you add the -k option to exdupe? Just for the next benchmark; you don't need to repeat this one.

Edit: Sorry, what I meant was to provide the "exdupe_redact.log" that showed the console output.

rrrlasse, Nov 07 '25 10:11

I am sorry, I made a mistake with GNU Screen and I don't have that log. I can always repeat the test if you want.

kskarlatos, Nov 07 '25 11:11

It would be helpful. What I'm interested in is the "Hashtable fillratio" that -k prints. The compression level (which -x flag is used) isn't very important.

rrrlasse, Nov 07 '25 11:11
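The thread does not spell out what "Hashtable fillratio" measures inside eXdupe. As a loose illustration of why a fill ratio matters for deduplication, here is a toy model in bash, assuming a fixed-size, direct-mapped table that overwrites a slot on collision. `SLOTS`, `insert_chunk`, `lookup_chunk` and `fill_ratio` are hypothetical names; eXdupe's real table layout and eviction policy are not described here.

```bash
#!/bin/bash
# Toy model only (NOT eXdupe's implementation): a direct-mapped hash table
# with a fixed number of slots. A chunk hash goes to slot = hash % SLOTS and
# overwrites whatever was there, so an older chunk in that slot is evicted
# and can no longer be matched by later data.
SLOTS=8
declare -a table=()

insert_chunk() {            # $1 = chunk hash (non-negative integer)
    table[$(( $1 % SLOTS ))]=$1
}

lookup_chunk() {            # succeeds only if this exact chunk is still stored
    [[ "${table[$(( $1 % SLOTS ))]}" == "$1" ]]
}

fill_ratio() {              # occupied slots / total slots, in percent (rounded down)
    echo $(( ${#table[@]} * 100 / SLOTS ))
}
```

In this model, the fuller the table gets, the more likely each new chunk is to land on an occupied slot and evict an older chunk, so later versions find fewer matches. Whether eXdupe degrades in this particular way is for its source code to confirm.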

OK, I am running it now with logging enabled.

kskarlatos, Nov 07 '25 12:11

exdupe_fix_clean.log

Please tell me if I have removed too much (there were many blank lines, etc.).

kskarlatos, Nov 08 '25 11:11

The log is perfect, thanks!

rrrlasse, Nov 08 '25 18:11

If you still have the test setup, you could try https://github.com/rrrlasse/eXdupe/releases/download/temp/exdupe_grow_test_2

It should reduce the differential size a lot.

rrrlasse, Nov 16 '25 08:11

Thanks, I just started testing. I will report back when done.

kskarlatos, Nov 16 '25 11:11

exdupe_fix_clean2.log

Still the same size.

exdupe_fix2_stats.log

kskarlatos, Nov 17 '25 13:11

OK, I think I will use the current solution and do further optimizations in a later version. Thanks for all your time :)

rrrlasse, Nov 17 '25 18:11

OK. Whenever you want me to retest, just tell me; I have kept my scripts :)

kskarlatos, Nov 18 '25 14:11