
Add md5 hash function

Open dgubitosi opened this issue 7 years ago • 4 comments

The new function outputs the md5 hash that a trimmed file would have as a fully padded rom image, so it can be compared against the Scene and No-Intro databases without padding the file first.

dgubitosi avatar Oct 01 '18 23:10 dgubitosi

Output looks like this now:

python3 ~/src/XCI_Trimmer/XCI_Trimmer.py --md5 "0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci"
c790d56952275fb8945dbee7eefedb99  0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci // trim size: 2604558848
5e3ffde24f0982a7c4e1549942eee861  0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci // cart size: 3992977408
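
For readers following along, here is a minimal sketch of how two hashes like the ones above could be produced in a single pass. It is not the code from the pull request. It assumes the padding removed by trimming is all 0xFF bytes and that the caller already knows the cart size (the thread does not show how the script obtains it). The name md5_trimmed_and_padded and its parameters are hypothetical.

```python
import hashlib

# Assumption: the padding removed by trimming consists of 0xFF bytes.
PAD_BYTE = b"\xff"


def md5_trimmed_and_padded(path, cart_size, block_size=65536):
    """Return (trimmed_md5, padded_md5) hex digests for the file at path.

    cart_size is the full cart size in bytes, supplied by the caller
    (hypothetical parameter; how the real script finds it is not shown).
    """
    md5 = hashlib.md5()
    trim_size = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            md5.update(block)
            trim_size += len(block)

    # Hash of the file exactly as it is on disk.
    trimmed_digest = md5.hexdigest()

    if cart_size < trim_size:
        raise ValueError("cart size is smaller than the file on disk")

    # Continue from the same hash state and feed in the missing padding,
    # so the file is read only once and never rewritten.
    padded = md5.copy()
    remaining = cart_size - trim_size
    pad_block = PAD_BYTE * block_size
    while remaining > 0:
        n = min(remaining, block_size)
        padded.update(pad_block[:n])
        remaining -= n

    return trimmed_digest, padded.hexdigest()
```

The second digest comes from continuing the first hash state rather than rehashing, which is why the trimmed file never needs to be padded on disk.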

dgubitosi avatar Oct 01 '18 23:10 dgubitosi

Sure, it may prove useful to some people who download pre-trimmed roms. I see you're using a block size of 1MiB. The original script uses a block size of 100MiB, so if you want you could increase it without worry to make the process faster.

AnalogMan151 avatar Oct 01 '18 23:10 AnalogMan151

I did this because I was comparing pre-trimmed roms and didn't want to pad them in order to hash them.

The hash update() function's performance actually drops with the larger block size. I tested a few values starting with 100MB and finally settled on 1MB. I just did some online research which indicates the optimal block size for feeding data into the hash update() function is only 65536 bytes. I will do some more tests to compare with that value.

What do you think of the format of the output? I can change that as well.

And sorry about the closed pull request. I should have just updated it.

** edit ** I did a bunch of tests just now. 65536 is the best of the values I tried. This block size holds a very linear 30 seconds per gigabyte. I tested 1GB, 2GB, 4GB, 5GB, 7GB, and 15GB files and the rate remains consistent.

1MB is very close but the difference starts to show with larger files. The 15GB file takes about 10 seconds longer.

100MB has the worst performance: the 15GB file takes 60 seconds longer than it does with 65536.
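
For anyone who wants to repeat this kind of comparison, a rough sketch of a timing harness follows. It is not the test setup used for the numbers above, which the thread does not show. The function names md5_file and time_block_sizes are hypothetical.

```python
import hashlib
import time


def md5_file(path, block_size):
    """Hash the file at path, reading it block_size bytes at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: f.read(block_size), b""):
            md5.update(block)
    return md5.hexdigest()


def time_block_sizes(path, block_sizes):
    """Return {block_size: (seconds, digest)} for each block size tried."""
    results = {}
    for size in block_sizes:
        start = time.perf_counter()
        digest = md5_file(path, size)
        results[size] = (time.perf_counter() - start, digest)
    return results


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python3 bench.py some_file.xci
    sizes = [65536, 1024 * 1024, 100 * 1024 * 1024]
    for size, (seconds, digest) in time_block_sizes(sys.argv[1], sizes).items():
        print(f"{size:>11} bytes  {seconds:8.2f} s  {digest}")
```

The digest should be identical for every block size; only the time changes. Timings depend heavily on the disk and on whether the file is already in the operating system's cache, so it is worth running each size more than once. At roughly 30 seconds per gigabyte, a 15GB file would take about 450 seconds, so the 10 and 60 second differences reported above are small to moderate in relative terms.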

dgubitosi avatar Oct 02 '18 01:10 dgubitosi

This new code supports file globbing.
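
As an illustration of what glob support can look like in Python, here is a small sketch. It is not the code from the pull request, and expand_patterns is a hypothetical name. One detail worth noting is that file names like the one in the example output contain square brackets, which the glob module treats as a character class, so this sketch checks for an existing literal path before trying to expand the argument as a pattern.

```python
import glob
import os


def expand_patterns(patterns):
    """Expand command-line arguments into a list of file paths.

    An argument that names an existing file is kept as-is, so names
    containing glob metacharacters such as "[Trimmed]" still work.
    Anything else is expanded with glob; arguments that match nothing
    are passed through unchanged so the caller can report the error.
    """
    paths = []
    for pattern in patterns:
        if os.path.exists(pattern):
            paths.append(pattern)
            continue
        matches = sorted(glob.glob(pattern))
        paths.extend(matches if matches else [pattern])
    return paths
```

On most Unix shells the pattern is already expanded before the script sees it, so handling it inside the script mainly matters on Windows or when the pattern is quoted.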

dgubitosi avatar Oct 06 '18 12:10 dgubitosi