XCI_Trimmer: Add md5 hash function

The new function will output the md5 hash of a trimmed file that matches the fully padded rom image so it can be compared to the Scene and No-Intro databases.
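The core idea can be sketched as follows. The sketch assumes that the trimmed region was originally filled with 0xFF bytes and that the full cart size in bytes is already known. How the script reads the cart size from the XCI header isn't shown here, so it is passed in as a parameter. `md5_padded` and its parameters are hypothetical names, not the script's actual API. The function hashes the bytes on disk, then feeds the hash the padding the file would have contained, without writing anything to disk.

```python
import hashlib
import os


def md5_padded(path, cart_size, pad_byte=0xFF, block_size=65536):
    """MD5 of `path` as if it were padded with `pad_byte` up to `cart_size` bytes.

    Assumptions: the padding byte is 0xFF and `cart_size` is supplied by the caller.
    """
    file_size = os.path.getsize(path)
    if file_size > cart_size:
        raise ValueError("file is larger than the cart size")

    h = hashlib.md5()
    # Hash the real (trimmed) bytes that exist on disk.
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)

    # Hash the missing padding in block-sized pieces instead of creating a padded copy.
    remaining = cart_size - file_size
    pad_block = bytes([pad_byte]) * block_size
    while remaining > 0:
        n = min(remaining, block_size)
        h.update(pad_block[:n])
        remaining -= n
    return h.hexdigest()
```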

Oct 01 '18 23:10 dgubitosi

Output looks like this now:

python3 ~/src/XCI_Trimmer/XCI_Trimmer.py --md5 "0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci"
c790d56952275fb8945dbee7eefedb99  0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci // trim size: 2604558848
5e3ffde24f0982a7c4e1549942eee861  0027 - LEGO Worlds (World) (En,Ja,Fr,De,Es,It,Nl,Pt,Ru,Ko,Zh) [Trimmed].xci // cart size: 3992977408
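Both lines of this output can come from a single read of the file. In Python's hashlib, calling `hexdigest()` returns the digest of the data fed so far without finalizing the object, so the trimmed hash can be taken mid-stream before the padding is added. The following is a sketch under the same assumptions as above (0xFF padding, caller-supplied cart size). `md5_trim_and_cart` and `report` are hypothetical helpers that reproduce the output format shown here.

```python
import hashlib
import os


def md5_trim_and_cart(path, cart_size, pad_byte=0xFF, block_size=65536):
    """Return (trimmed_md5, trim_size, cart_md5) using one pass over the file.

    Assumptions: 0xFF padding and a caller-supplied cart_size.
    """
    trim_size = os.path.getsize(path)
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)

    # hexdigest() does not finalize the object, so hashing can continue afterwards.
    trimmed = h.hexdigest()

    remaining = cart_size - trim_size
    pad_block = bytes([pad_byte]) * block_size
    while remaining > 0:
        n = min(remaining, block_size)
        h.update(pad_block[:n])
        remaining -= n
    return trimmed, trim_size, h.hexdigest()


def report(path, cart_size):
    """Format both hashes like the example output above."""
    trimmed, trim_size, cart = md5_trim_and_cart(path, cart_size)
    return [
        f"{trimmed}  {path} // trim size: {trim_size}",
        f"{cart}  {path} // cart size: {cart_size}",
    ]
```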

Oct 01 '18 23:10 dgubitosi

Sure, it may prove useful to people who download pre-trimmed roms. I see you're using a block size of 1MiB. The original script uses a block size of 100MiB, so you could safely increase it to make the process faster.

Oct 01 '18 23:10 AnalogMan151

I did this because I was comparing pre-trimmed roms and didn't want to pad them just to hash them.

The performance of the hash update() function actually drops with a larger block size. I tested a few values, starting with 100MB, and finally settled on 1MB. Some online research I just did indicates that the optimal block size to feed into update() is only 65536 bytes (64KiB). I will run more tests to compare against that value.

What do you think of the format of the output? I can change that as well.

And sorry about the closed pull request. I should have just updated it.

** edit ** I just ran a bunch of tests: 65536 is optimal. At this block size the hashing time scales linearly at about 30 seconds per gigabyte. I tested 1GB, 2GB, 4GB, 5GB, 7GB, and 15GB files, and the rate stayed consistent.

1MB is very close, but the difference starts to show with larger files: the 15GB file takes about 10 seconds longer.

100MB has the worst performance: the 15GB file takes 60 seconds longer than with 65536.
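A comparison like this can be reproduced with a rough timing harness. This is a sketch, not the measurement used in this thread. `md5_file` and `benchmark` are hypothetical names, and "1MB"/"100MB" are read here as 1 MiB and 100 MiB. The block size should only change speed, never the digest, so the harness also checks that every run produces the same hash. The OS caches file contents after the first read, so later runs can look faster. On large files, repeat each size for a fairer comparison.

```python
import hashlib
import time


def md5_file(path, block_size):
    """MD5 of a file, fed to update() in chunks of `block_size` bytes."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def benchmark(path, block_sizes=(65536, 1 << 20, 100 << 20)):
    """Return {block_size: seconds}; all block sizes must yield the same digest."""
    timings = {}
    digests = set()
    for bs in block_sizes:
        start = time.perf_counter()
        digests.add(md5_file(path, bs))
        timings[bs] = time.perf_counter() - start
    if len(digests) != 1:
        raise RuntimeError("digest depends on block size; this should never happen")
    return timings
```

For example, `benchmark("game.xci")` (a hypothetical file name) returns a dict that maps each block size to the elapsed seconds.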

Oct 02 '18 01:10 dgubitosi

This new code supports file globbing.
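The globbing code itself isn't shown in this thread. The sketch below shows one common approach and assumes the arguments are expanded with Python's `glob` module, which helps on Windows, where cmd.exe does not expand wildcards the way Unix shells do. `expand_patterns` is a hypothetical name. Names like "... [Trimmed].xci" need care because `glob` treats `[...]` as a character class. The sketch therefore uses an existing path literally before trying to expand it.

```python
import glob
import os


def expand_patterns(args):
    """Expand wildcard arguments (e.g. "*.xci") into a list of paths.

    - An argument that already names an existing file is used as-is, so
      brackets in names like "[Trimmed]" are not read as glob classes.
    - A pattern with no matches is kept, so the caller can report it as missing.
    """
    paths = []
    for arg in args:
        if os.path.exists(arg):
            paths.append(arg)
            continue
        matches = sorted(glob.glob(arg))
        paths.extend(matches or [arg])
    return paths
```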

Oct 06 '18 12:10 dgubitosi