Fix the Lahman Database Scraping

Open jmaslek opened this issue 1 year ago • 5 comments

This PR repoints the Lahman database download from a GitHub link that now returns a 404 to the Dropbox link listed on baseball1.com.

In order to extract the data, py7zr was added to the requirements.
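
For context, a minimal sketch of what the extraction step could look like, assuming the download is a .7z archive of CSV tables. The function names, the archive layout, and the lower-cased table keys are illustrative assumptions, not the code in this PR; only `py7zr.SevenZipFile` and its `extractall` method come from the py7zr library itself.

```python
from pathlib import Path


def find_csv_tables(root):
    """Map lower-cased table names (file stems) to the CSV paths found under root."""
    # Hypothetical helper: the real archive layout may differ.
    return {p.stem.lower(): p for p in sorted(Path(root).rglob("*.csv"))}


def extract_lahman_archive(archive_path, dest_dir):
    """Unpack a .7z archive into dest_dir and return the CSV tables it contained."""
    # Imported here so find_csv_tables stays usable without py7zr installed.
    import py7zr

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        # Extract every member; the CSVs are located afterwards by walking dest.
        archive.extractall(path=dest)
    return find_csv_tables(dest)
```

A caller would then do something like `tables = extract_lahman_archive("lahman.7z", "lahman_cache")`, where both paths are placeholders, and read whichever table it needs from the returned mapping.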

jmaslek avatar Jun 18 '24 15:06 jmaslek

Could this be addressed by moving the link to something in https://github.com/chadwickbureau/retrosheet?

schorrm avatar Jun 19 '24 02:06 schorrm

Could this be addressed by moving the link to something in https://github.com/chadwickbureau/retrosheet?

Looks like there may be some overlapping files, but nothing that mimics Lahman's db.

jmaslek avatar Jun 19 '24 03:06 jmaslek

What do y'all think about extracting the data and posting it to a repo on GitHub? Maybe even embedding it in pybaseball?

bdilday avatar Jun 19 '24 22:06 bdilday

What do y'all think about extracting the data and posting it to a repo on GitHub? Maybe even embedding it in pybaseball?

I have no issue with that (assuming there are no licensing issues). I'm happy to add a folder here or put the files on my own GitHub.

jmaslek avatar Jun 19 '24 23:06 jmaslek

As of today, the Lahman DB has been donated to the Society for American Baseball Research (SABR). SABR is now responsible for hosting and maintaining the dataset.

blue-shoes avatar Oct 08 '24 20:10 blue-shoes