scraper
Update hashes for all systems
Need to update the dataset with all the new games in the GDB.
Finished GG, GB, GBC, GBA
I found 24 more GG games that I somehow missed from thegamesdb. I've now added them to the website, and this spreadsheet has the GameDB IDs for them:
https://dl.dropboxusercontent.com/u/48342677/NEW%20GameGear%20IDs.csv
Thanks. I added them all to the dataset and found some duplicate issues in the dataset and corrected those.
Thanks !!
Updated NES.
finished SNES
Hi @sselph, a few Genesis games are currently missing (based on the ROMs I scraped):
https://dl.dropboxusercontent.com/u/48342677/Missing%20genesis.csv
edit
Also, if I can be a little bit picky, my Genesis 'Earthworm Jim (USA)' ROM links to ID 2894, which is the Mega Drive version (they're identical except for the image), rather than the Genesis ID of 4353.
My missing Master System ROMs
https://dl.dropboxusercontent.com/u/48342677/Missing%20MasterSystem.csv
Thanks, I've added your entries.
Hey. These two games result in duplicate entries (ID 611) - one should be ID 611 (vs. Kingpin), one should be 2620 (animated series).
Hi @sselph , here are all of the GBA games I had that were missing an entry in TGDB. There are now IDs for all of them:
https://db.tt/tEW7B4X8
Can you let me know when your database has been updated? :)
Sorry for the delay, I added your csv.
I didn't realize there is a thread for that before I made mine. Support for Nintendo DS would be really appreciated. I have little knowledge in coding, but if it's a matter of time, I have time to spare. Thanks.
Hi, I added this game to TGDB: http://thegamesdb.net/search/?string=Super+Soukoban&function=Search but the scraper can't find it.
Of course, the game is in No-Intro set: http://datomatic.no-intro.org/?page=show_record&s=49&n=2865
@Pacolo I added this game.
@sselph New GBA games to add to your list :) https://drive.google.com/file/d/0ByWwZdQX1FQmVEE2RGJsaUdxbTQ/view?usp=sharing
Could you let me know when they're added so I can re-scrape? Many thanks!
@robertybob Added.
Hi @sselph , just 5 Turbografx games missing from your scraper
https://www.dropbox.com/s/13lf004v2fiwiv5/Turbografx.csv?dl=0
Done
Hi again @sselph . Thank you for adding those TG16 games. I've now started importing PS1 games into my Pi. I just scraped 63 games, 35 downloaded images and details entered into the gamelist, yet running your reporting tool gives a figure of just over 50. Not sure what's going on there.
Either way, here's my hashes and TGDB IDs for inclusion onto your database :)
https://www.dropbox.com/s/3pavet9jjc4vyrq/PS1%20Missing.csv?dl=0
Thanks again! :+1:
The script checks the cue first, then the bins, so it is possible the cues matched (they are just text files) while the bins were slightly different. The reporter tool doesn't look at cue files, so it isn't printing those. I can add these bins in.
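To illustrate the cue-then-bin order described above, here is a minimal Python sketch that lists the SHA-1 of a .cue file and of every file it references. It is not the scraper's actual code: the function names are hypothetical, and it assumes a plain SHA-1 over the whole file is what gets matched for disc images.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List

# Matches cue sheet lines such as: FILE "Game (USA) (Track 1).bin" BINARY
FILE_LINE = re.compile(r'^\s*FILE\s+"?(.+?)"?\s+\w+\s*$', re.IGNORECASE)


def cue_referenced_files(cue_text: str) -> List[str]:
    """Return the file names listed in FILE entries of a cue sheet."""
    names = []
    for line in cue_text.splitlines():
        match = FILE_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bins do not need to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_for_cue(cue_path: str) -> Dict[str, str]:
    """Hash the cue itself first, then each referenced bin that exists on disk."""
    cue = Path(cue_path)
    result = {cue.name: sha1_of_file(cue)}
    for name in cue_referenced_files(cue.read_text(errors="replace")):
        bin_path = cue.parent / name
        if bin_path.exists():
            result[name] = sha1_of_file(bin_path)
    return result
```

Calling `hashes_for_cue("Game.cue")` (a hypothetical file name) would give one hash per file, which makes it easier to see whether the cue or the bin is the one that fails to match.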
Ah ok, that's where I'm going wrong- it's not finding the games because I haven't got any .cue files set up yet (?)
Also, I've got a few .img files, these aren't supported by your scraper yet are they IIRC?
So basically without .bin support within the reporting tool, there's no way for me to gain the hashes of the games definitely not being picked up by the Scraper?
Do you take unlicensed games?
6cf18228cfb66d48b3642069979d4a5103cb8528,26500,7,Somari
This scraper uses data from thegamesdb.net - if a game is on that site then this scraper should pick it up.
I should note, however, that unlicensed games and hacks are frowned upon on TGDB.net ..if you add that game it may well be deleted.
> This scraper uses data from thegamesdb.net - if a game is on that site then this scraper should pick it up.
The game I added already has an entry on thegamesdb: http://thegamesdb.net/game/26500
It looks like quite a few Unlicensed games already exist in the csv:
~/.sselph-scraper$ grep \(Unl\) hash.csv | wc -l
601
> I should note, however, that unlicensed games and hacks are frowned upon on TGDB.net ..if you add that game it may well be deleted.
I didn't add, someone else added it.
I don't care if it is unlicensed, but since the system is NES, the hash from just the regular shasum probably isn't correct. Do you mind using a version I just created that uses the same hashing algorithms that the scraper uses?
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi2.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_amd64.zip
It would be used similarly to shasum:
shasum file.nes
> Do you mind using a version I just created that uses the same hashing algorithms that the scraper uses?
Ah, I didn't realise that you used your own hash. Sorry about that.
How's this:
bb618e17cd21eaa0185de3a3bf0028dcbba6a0c3,26500,7,Somari (Unl)
Thanks, I've added it. I use a SHA-1 hash, but I parse the header of .nes files to find the ROM data.
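For anyone wondering why a plain shasum gives a different value for .nes files, here is a rough Python sketch of header-aware hashing. It assumes the standard iNES layout (a 16-byte header starting with `NES\x1a`, plus an optional 512-byte trainer flagged by bit 2 of byte 6) and hashes whatever follows. Whether the scraper treats the trainer this way is an assumption; the function name is hypothetical.

```python
import hashlib

INES_MAGIC = b"NES\x1a"   # first four bytes of an iNES header
HEADER_SIZE = 16          # iNES header length in bytes
TRAINER_SIZE = 512        # optional trainer block that precedes the ROM data


def nes_rom_sha1(data: bytes) -> str:
    """Return the SHA-1 of an NES image with the iNES header (and trainer) skipped.

    Images without the iNES magic are hashed as-is.
    """
    if len(data) >= HEADER_SIZE and data[:4] == INES_MAGIC:
        offset = HEADER_SIZE
        # Bit 2 of flags byte 6 marks a 512-byte trainer before the ROM data.
        if data[6] & 0x04:
            offset += TRAINER_SIZE
        data = data[offset:]
    return hashlib.sha1(data).hexdigest()
```

With this approach, a headered and an unheadered dump of the same game produce the same hash, which would explain why the two Somari hashes above differ.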
Three more:
0c4c9aa8002aece613bf74c2bcee12da79fe5ddc,26490,7,Mortal Kombat II Special (Unl)
9d8366640849c3b22aa0d97770b58572a04ce442,26482,7,Mortal Kombat II (Unl)
232e54da6faf7bac4a29769fa0379570d83ec32e,26491,7,Mortal Kombat III (Unl)
Sorry for the delay, done.
snes http://pastebin.com/dWjyD0NT
@Alexholly Out of interest, are these games on thegamesdb.net?
Yes, the IDs are from thegamesdb.net. Is there something wrong?
Hi @sselph :) Got some missing PSX games:
FIFA Football 2003 (E) : TGDB ID = 23566
FIFA Football 2004 (Europe) : TGDB ID = 11705
PaRappa the Rapper - The Hip Hop Hero (E) : TGDB ID = 779
@robertybob @AlexHolly Wow, I'm extremely sorry this took so long. I must have read the email on my phone and then forgot to actually log in later to work on this. The hashes have been added.
@sselph no worries! hope you didn't mind me starting the wiki by the way :)
Here are some missing Lynx games. This is my first contribution, so I apologize if it isn't correct: https://www.dropbox.com/s/h4o7ge2gifmtl10/file?dl=0
Thanks @MetalManTN. I prefer it if you do all the work for me :) Open the file as a CSV in your favorite spreadsheet program, add a column, and give me the thegamesdb ID for each game. I went ahead and looked these up, and it seems someone has already added them to thegamesdb, so they are all there now.
@robertybob I noticed that. Thanks!
@sselph My apologies. I would prefer that I do all the work for you too (I am well aware that you are incredibly busy with all the requests), I just didn't realize I needed to do anything with that output file before posting it. Any and all future contributions from me will be in the correct format with IDs. Thank you for all that you've done for us. I love the scraper.
Got one for NES
f663d004bea0fe0518fb8b2e3a9070e1ef1d39f4,27281,7,Space Invaders (Japan)
This one is already in the list, but it has the wrong game ID
ce7580059e8b41cb4a1e734c9b35ce3774bf777a,9245,22,Combat - Tank-Plus
It should be 4887
@Jcw87 sorry for the delay but I made the changes. Also sorry about the messed up Combat match. I must have had a bad copy/paste or something.
This is my first submission of hashes mapped to TheGamesDB for your scraper. Please let me know if I need to change my CSV in any way to make it work better for you. Here are all the N64 hashes and their associated TGDB IDs:
https://drive.google.com/open?id=1E8anjd2FFlNsRGIAauOhSrKmUaEPj6m-TEoN2JL3hDM
Here is a CSV I made for all NES Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1SaZXgRIdtKK7dRB1zbaQVtsVsSblOvT5E1sHsyBAbx8
Here is a CSV I made for all SNES Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1GBMKFuKTHmhfoJjz5dJtB4o872FmiHoPJsiZz79vLrE
Here is a CSV I made for all Sega Master System Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1RI6igRam-nR1emets0wsDnwqGdiP_rHLY0ynUyscUxo
Here is a CSV I made for all Sega Game Gear Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1zjycsS4WRReu7db7AKUYnYONJP7AabdmNAExVgCBmac
Here is a CSV I made for all Sega Genesis Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1nQbn4QdVX47hMurFUGde7-9rqE6iBG0zmY7b_d2zBQ4
Here is a CSV I made for all Nintendo Game Boy Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1TE_j4a9TcV1HNrcdvFeOtf7iNrNds71GMgtv_g5PshY
Here is a CSV I made for all Sega 32X Hashes which are not currently in the scraper:
https://drive.google.com/open?id=12NmsgYz1w6b4upG6vbrlztqMylSMKCWx9xA4ny5sMKs
Notes: Can you delete all of the other Sega 32X CDs associated under the system 33 designation? Most of them are linked to the Sega CD TGDB ID and not to the Sega 32X CD TGDB ID, specifically Supreme Warrior and Corpse Killer. Both are linked to Sega CD.
We also should find a way to distinguish Disc 1, Disc 2, Disc 3, etc. for all discs after scraping. It would be nice to be able to tell each disc apart through the front end.
@stevetb re: Disc titles, maybe this flag is the answer?
-use_filename
If true, use the filename minus the extension as the game title in xml.
@sselph I would like to start pushing you a lot of the hashes that do not exist currently. I have put a lot on here already over this past week, but I would like to give back to the community and scrape even more. Please let me know the fastest method to get these to you so we can quickly add them to the scraper database. This will immediately help users. I will work on my end to do most of the work for you.
I would also like to have quite a few more platforms added to your scraper. If you can add these platforms I will link the unknown hashes to the TGDB database.
Here are the Platforms of interest:
- NeoGeo
- NeoGeo Pocket
- NeoGeo Pocket Color
- Intellivision
- Colecovision
- Virtual Boy
- Sega SG-1000
Please contact me so we can work out a way to add more roms & platforms with me doing as much heavy lifting as possible.
@stevetb Dreamcast is already supported (?) https://github.com/sselph/scraper/issues/34
@robertybob Sorry I meant, Sega SG-1000. I have corrected above in my comment.
@stevetb Thanks for all the hash data. I'll start taking a look at these. I need to investigate the systems you listed, i.e. what the extensions are, whether there is any special encoding, etc.
Added in all your csv files except the 32X. I want to spend some time sorting out the 32X CD from Sega CD and it is getting late.
@sselph I will be out of town for a week. When I return I will start the process of scraping all the roms & variations of roms I can find. Redump will be a guide as well to ensure I capture everything. Thank you again for your help and your excellent scraper!!!!!
Here is a CSV I made for all Sega SG-1000 Hashes which are not currently in the scraper:
https://drive.google.com/open?id=1h2ljwPz7F7_ucF8TQ5KeLeoe3nMHab_Iqj5Q1hLS0j8
Note: Only No-Intro Romset Hashes
Here is a CSV I made for all SNES Hashes from the No-Intro Romset which are not currently in the scraper:
https://drive.google.com/open?id=1_CfJxrSNASApzhXwUQ2QTQBu6KgFS7ohlJCCZngiX9M
Note: I could not find the hashes for the two below games. They are incorrectly matched against TGDB
- NFL Quarterback Club (USA)
- Star Fox - Super Weekend (USA)
Updated SNES.csv file with the two missing hashes:
https://drive.google.com/open?id=1bAlg3rX94vdgqWdY3XPh3dMZMQ2sL8ZlADZnb903ono
@sselph
Here is the complete No-Intro Romset for Neo Geo Pocket:
https://drive.google.com/open?id=1PVq4BAkwDM_Qet-ErjOt_oQKJn8_ymFvouwY6Csz7PY
Here are some TurboGrafx CD updates. These are based on CUE files for the compressed audio (OGG) sets out there. There is also a ROM or two from SNES and Genesis updated in there.
https://www.dropbox.com/s/a5vpcl9uqx8a62r/HASH%20Updates%205-19-2016.csv?dl=0
Here are some more ScummVM games. Mostly Sierra games plus a couple others. The hashes are all identical, but the filenames are there.
https://www.dropbox.com/s/pkreabdakw1q1cs/Missing%20ScummVM.csv?dl=0
The US version of the SNES Final Fantasy III improperly grabs the data for the Japanese version (Final Fantasy VI).
MD5: 544311e104805e926083acf29ec664da
SHA-1: 057ada1c641e3e0b3ca34e6e4f4eb1b05a87143a
final fantasy iii (usa) (rev a).sfc
http://thegamesdb.net/game/83/
@sselph Please update the following when you have time. All SNES for No-Intro should be complete once these last 11 are added.
https://drive.google.com/open?id=1bAlg3rX94vdgqWdY3XPh3dMZMQ2sL8ZlADZnb903ono
Note: Star Fox - Super Weekend (USA) and NFL Quarterback Club (USA) are both updates to incorrect TGDB pointers
Lots of missing data for various handheld platforms. Lots of dumb kid games, but it will make things more complete. It also has NeoGeo Pocket Color, which I think someone else recently sent data for, but I have the US set in there too.
https://www.dropbox.com/s/0efcbze21afn3a8/handheld.csv?dl=0
@BenWlson @stevetb I think I have added all the information y'all provided.
@sselph so close, hahaha. I still need these two titles updated. Here are the correct hashes and TGDB links, which both need to be updated in your hash.csv:
https://drive.google.com/open?id=1hl9CBD1nmbGknm57E9KMGX6YIeqfNPymtFbC_NhMjFU
Something else must be going on. Those hashes appear in the csv.
cat hash.csv.gz | zgrep 078c3f6ae65c243fe3e330c699a75df536a5c20a
078c3f6ae65c243fe3e330c699a75df536a5c20a,5707,6,NFL Quarterback Club (USA)
cat hash.csv.gz | zgrep 2ba5f446dcb56d1164e28b337ae7b4833278b6d9
2ba5f446dcb56d1164e28b337ae7b4833278b6d9,26300,6,Star Fox - Super Weekend (USA)
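The same lookup can be done in Python, which also makes it easy to ignore upper/lower case in the hash. This is only a sketch: the function names are hypothetical, and the column order (hash, TGDB ID, platform ID, name) is inferred from the rows shown in this thread.

```python
import csv
import gzip
from typing import Iterable, List, Optional


def find_hash(lines: Iterable[str], sha1: str) -> Optional[List[str]]:
    """Return the first CSV row whose hash column matches, ignoring case."""
    wanted = sha1.strip().lower()
    for row in csv.reader(lines):
        if row and row[0].lower() == wanted:
            return row
    return None


def find_hash_in_gz(path: str, sha1: str) -> Optional[List[str]]:
    """Search a gzip-compressed hash CSV (for example hash.csv.gz)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return find_hash(handle, sha1)
```

If `find_hash_in_gz("hash.csv.gz", some_hash)` returns a row but the scraped result is still wrong, stale images or gamelists are a more likely cause than a missing hash.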
@sselph I cleared out my downloaded images and gamelists and ran it again. This fixed both of the above. Thank you and sorry to pester you on these......especially when it is my error, doh!
Thank you sselph!
https://www.dropbox.com/s/1fz78rs78estn2l/More%20Hashes%205-23-16.csv?dl=0
Here's a few that seem to be missed from what I submitted earlier.
@BenWlson Sorry about that. It should be fixed now.
https://www.dropbox.com/s/w72wbd0ykcq06gq/atari2600.csv?dl=0
Some Atari 2600 updates.
@BenWlson We need to clean up the rom names for that Atari 2600 set. Basically take out anything in between brackets and remove the file extension. I'd also remove the Extra and Error columns. For the TGDB column, I would remove the http address and instead just put down the ID (Example = 33333). Lastly, you would make another column for Platform so that sselph does not have to look that up and add it in.
Those things are easy enough for me to fix, and if the name isn't a No-Intro name I just leave that field blank and it picks up the name from thegamesdb.
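For contributors who want to do the cleanup described above before posting, here is a small Python sketch. It assumes the reporter output has the columns `Game,Error,Hash,Extra,thegamesdbID` (as in the CSV snippets posted in this thread), treats "brackets" as square brackets only, and writes rows in the `hash,id,platform,name` order seen in hash.csv. The function names are hypothetical.

```python
import csv
import io
import re
from typing import List

BRACKETED = re.compile(r"\s*\[[^\]]*\]")      # tags such as [!] or [Hack by ...]
EXTENSION = re.compile(r"\.[A-Za-z0-9]+$")    # trailing file extension
TRAILING_ID = re.compile(r"(\d+)/?\s*$")      # bare ID or ID at the end of a URL


def clean_name(filename: str) -> str:
    """Drop the file extension and anything inside square brackets."""
    name = EXTENSION.sub("", filename)
    return BRACKETED.sub("", name).strip()


def extract_id(value: str) -> str:
    """Accept either a bare ID or a thegamesdb URL and return just the ID."""
    match = TRAILING_ID.search(value.strip())
    if not match:
        raise ValueError("no game ID found in %r" % value)
    return match.group(1)


def to_hash_rows(report_csv: str, platform_id: int) -> List[List[str]]:
    """Convert reporter rows into [hash, id, platform, name] rows."""
    rows = []
    for rec in csv.DictReader(io.StringIO(report_csv)):
        rows.append([
            rec["Hash"].lower(),
            extract_id(rec["thegamesdbID"]),
            str(platform_id),
            clean_name(rec["Game"]),
        ])
    return rows
```

The platform ID has to be supplied by hand; in the rows quoted in this thread, 6 appears for SNES and 7 for NES.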
Forgot to say these have been uploaded.
Just finished the No-Intro set on my end for Sega Genesis and Sega Master System
- Sega Genesis is perfect! Good job!
- Sega Master System: I just need to add one Australian ROM. Here is the CSV:
https://drive.google.com/open?id=1GZeScDgMLsijU7mRrFz029cVHDKwHfh6I8t4tBOaOrA
Thank you and great job @sselph !
Done
The US version Alisia Dragoon Sega Genesis rom currently pulls the wrong version:
Alisia Dragoon (USA).md
SHA-1: 15B6244385DB4B449B7C189C13DB7B9C1427C688
should be:
http://thegamesdb.net/game/4243
Done
@sselph Two titles (Proto) which I need to complete USA N64 No-Intro Set.
https://drive.google.com/open?id=18F35J2RV1iIZZT4RnwJcDCK7L3UgtD7rjxDs47B_M58
You've done a great job with N64, very complete. Thank you!
Done
When I scrape my Genesis USA romset, the majority of the roms end up with US Genesis images, but a small handful come back with MegaDrive images.
Here is the list of the wrong region Genesis roms with hashes and links to the US gamesdb version: https://www.dropbox.com/s/9xikmeu6ar06xvc/Wrong%20Region.csv?dl=0
There are a couple roms that appear to be USA/Europe or World versions, so I guess they could technically go either way. Whether or not to change them would be up to you, @sselph.
Here are some updates for Sega CD. They are based on CUE files for a compressed audio (OGG) romset that's out there. I have most of the US romset with the exception of the FMV and shooter games. I may do those later.
https://www.dropbox.com/s/c0voyk8vcaan2h9/segacd.csv?dl=0
@sselph Here are the last No-Intro Roms I needed to complete NES.
https://drive.google.com/open?id=1jYvG6mYSZBIIaqC2Ibv0y4YKaDNGHK3WMeLXn8GCTDU
Thank you again!
@BenWlson Thanks, updated the MD entries. I have chosen to use the US version for USA/Europe since USA accounts for over half of the worldwide sales of genesis/megadrive platforms. Also added the sega CD entries.
@stevetb Added/Updated the NES entries.
@sselph Thank you!
007 - The World Is Not Enough (USA, Europe).gbc
for GameBoy Color incorrectly pulls from the N64 007.
Here's the correct one:
SHA-1: 43552FD2F4464F42D9A5AA7CDF79A012C2BD9DC4
http://thegamesdb.net/game/20734
Here should be the last titles for the Sega Game Gear for No-Intro.
https://drive.google.com/open?id=1acEedM2Mx_ePLLvaYaefGSnebdfKYjljNVXR_Ubdq_s
@BenWlson I couldn't find that hash in my DB and the 007 for the GBC is already linked to that ID. The only things that link to http://thegamesdb.net/game/238 are the 2 versions of 007 for the N64. Maybe this was fixed recently and the image that was downloaded is stale?
@stevetb Thanks! Added the missing GG.
Somehow I gave the wrong hash for the improper GBC 007 game. I just double-checked it, and it's getting an N64 image.
71D84B4065CF2C36B5E337BC2C56D8384418529F
http://thegamesdb.net/game/20734
another missing hash:
EBF766B37CE893579E76CC9367711DF8479269CD
http://thegamesdb.net/game/25769/
@BenWlson I'm not seeing that hash either. It is also upper case, so it seems to come from an external tool. GBC should be a standard shasum, but could you try using my shasum utility?
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_linux_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_rpi2.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_windows_amd64.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_386.zip
https://storage.googleapis.com/stevenselph.appspot.com/shasum_mac_amd64.zip
Basically, use it like:
shasum filename.gbc
Game Boy:
Probotector (EU) should link to 26550
Probotector 2 (EU) should link to 26552
Currently they pull the info from the US versions (Operation C and Contra: The Alien Wars)
thanks @robertybob. Sorry for the delay but fixed these.
@sselph Tonight I went through the USA and European romsets for Game Boy. I found 10 titles which were not previously added. Here are all of the missing ones I found:
https://drive.google.com/open?id=1iDPBZoaJfruEIyhF3WXceH6NhiDGoJjiYrWlQDEEfcc
SNES:
Game - Hash - GDB ID
Apocalypse II.smc - 95640a8ecff7a5380c71fd2a4915f22341870769 - 37466
Congo - The Movie - The Secret of Zinj.smc - 32efc8993c8add3c6647fa87712c634b71158787 - 28570
NCAA Football.smc - f8786b52ebbfd72a2fff236a5d1cf09262e7d048 - 5702
Network Q Rally.smc - 94169aaeb1557a25b1e03a181a28b4b29ece6f5f - 37467
Alpha Beam with Ernie.bin
for Atari 2600 pulls incorrect metadata from PS3
Here is the correction:
Game - Hash - GDB ID
Alpha Beam with Ernie.bin - a1f660827ce291f19719a5672f2c5d277d903b03 - 31122
A bunch of missing entries:
Game,Error,Hash,Extra,thegamesdbID
Bust-A-Move.smc,hash not found,0a34f76c5684bfc6a867476546dad55ddfef5d76,"",2040
Final Fantasy V (Japan).smc,hash not found,a9a77b07cd6c1b98a0186e676c0e3724ba61a94b,"",1762
Final Fantasy VI (Japan) [En by RPGOne v1.2b] [All Bug Fixes].sfc,hash not found,2773801e44947f78e444705aaa9d301e2be6ba36,"",34358
Chrono Trigger (USA) [Hack by Kajar Laboratories Demo 2] (~Chrono Trigger - Crimson Echos).sfc,hash not found,e9a3c2bfa44f864a386ef1dd85cfff909a95181b,"",1255
Secret of Evermore (USA) [Hack by FuSoYa v1.02] (2 Player Edition).sfc,hash not found,69675970540ac9b21a38975b010df5abeba510e3,"",1311
Super Famicom Wars (Japan) (NP).sfc,hash not found,24279ca4b598f4caa0cf4d7fa0a423f9e51bb6f7,hash found but no GDB ID,26347
Seiken Densetsu 3 (Japan) [En by LNF+Neill Corlett+SoM2Freak v1.01].sfc,hash not found,4a8d8bd431959d42e2ba4d953bfd11d042216a34,"",5827