download_lahman() failing

Open double-dose-larry opened this issue 1 year ago • 9 comments

Hi All,

I'm running pybaseball 2.2.7

I'm trying to run pybaseball.people() and getting the following stack trace:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 download_lahman()

File ~/.local/lib/python3.10/site-packages/pybaseball/lahman.py:30, in download_lahman()
     28 def download_lahman():
     29     # download entire lahman db to present working directory
---> 30     z = get_lahman_zip()
     31     if z is not None:
     32         z.extractall(cache.config.cache_directory)

File ~/.local/lib/python3.10/site-packages/pybaseball/lahman.py:25, in get_lahman_zip()
     23 elif not _handle:
     24     s = requests.get(url, stream=True)
---> 25     _handle = ZipFile(BytesIO(s.content))
     26 return _handle

File /usr/lib/python3.10/zipfile.py:1269, in ZipFile.__init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
   1267 try:
   1268     if mode == 'r':
-> 1269         self._RealGetContents()
   1270     elif mode in ('w', 'x'):
   1271         # set the modified flag so central directory gets written
   1272         # even if no files are added to the archive
   1273         self._didModify = True

File /usr/lib/python3.10/zipfile.py:1336, in ZipFile._RealGetContents(self)
   1334     raise BadZipFile("File is not a zip file")
   1335 if not endrec:
-> 1336     raise BadZipFile("File is not a zip file")
   1337 if self.debug > 1:
   1338     print(endrec)

BadZipFile: File is not a zip file

I dug around and saw that the code tries to retrieve the data from here: https://github.com/chadwickbureau/baseballdatabank/archive/master.zip

That link is now dead. Perhaps there was a change upstream.
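For anyone debugging this, here is a minimal sketch of how the failure could be surfaced more clearly: check that the downloaded bytes are a zip archive before handing them to `ZipFile`. The helper name `open_zip_from_bytes` and its `source` parameter are hypothetical and not part of pybaseball; only the standard-library `zipfile` and `io` modules are used.

```python
import zipfile
from io import BytesIO


def open_zip_from_bytes(content: bytes, source: str = "<unknown>") -> zipfile.ZipFile:
    """Return a ZipFile for `content`, or raise ValueError with a readable message.

    Hypothetical helper for illustration; not pybaseball's API.
    """
    buffer = BytesIO(content)
    # is_zipfile accepts a file-like object and looks for the zip magic number.
    if not zipfile.is_zipfile(buffer):
        preview = content[:60]
        raise ValueError(
            f"Response from {source} is not a zip archive "
            f"({len(content)} bytes, starts with {preview!r})"
        )
    buffer.seek(0)  # rewind before handing the buffer to ZipFile
    return zipfile.ZipFile(buffer)
```

With `requests`, calling `raise_for_status()` on the response before passing `response.content` to a helper like this would report the HTTP error for a dead link directly, instead of the less obvious `BadZipFile`.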

double-dose-larry avatar Nov 20 '23 14:11 double-dose-larry

I'm seeing similar issues. The code will need an update to handle the new Chadwick register location and file structure (the people table has been split into multiple files).

JSCjr avatar Nov 22 '23 21:11 JSCjr

This is a separate issue from the Chadwick register (which I believe has been handled in PR #309). The issue looks to be that the chadwickbureau/baseballdatabank repository no longer exists, at least not publicly.

blue-shoes avatar Jan 26 '24 13:01 blue-shoes

Has this issue been fixed? I dug into the code and came to the same conclusion, which is what finally got me to this page, but I don't see any follow-up or fix. I pulled the code pretty recently, so I was wondering whether anyone had fixed it or come up with a workaround.

agpolivka avatar Apr 09 '24 23:04 agpolivka

Sean Lahman just posted an updated version of the database files at his own site, so this could presumably be fixed by pointing the code at those files instead.

JSCjr avatar Apr 11 '24 12:04 JSCjr

Linking to the files on his site looks fragile to me, since it relies on the naming convention in his personal Dropbox. The file is currently called lahman_1871-2023.csv, so one assumes this is not a static file name/path.
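If the download location can't be assumed stable, one option is to let users override it without a code change. The following is only a sketch: the environment variable name `LAHMAN_ZIP_URL` and the function `resolve_lahman_url` are hypothetical and do not exist in pybaseball.

```python
import os
from typing import Mapping, Optional


def resolve_lahman_url(default_url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the download URL, preferring a user-supplied override.

    `LAHMAN_ZIP_URL` is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    override = env.get("LAHMAN_ZIP_URL", "").strip()
    # Fall back to the built-in default when no non-empty override is set.
    return override or default_url
```

This doesn't make the upstream file name any more stable, but it would let users point at a new location as soon as the file moves, rather than waiting for a release.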

blue-shoes avatar Apr 11 '24 13:04 blue-shoes

I see the same error, and it looks like the file location changed, as @JSCjr mentioned.

StuffbyYuki avatar Apr 22 '24 20:04 StuffbyYuki

I'm also seeing this error. If this isn't important functionality or a priority to maintain, it might be a good idea to just remove it instead of keeping a broken function around.

SushiInYourFace avatar Jun 26 '24 01:06 SushiInYourFace

Note that there is a proposed fix here: https://github.com/jldbc/pybaseball/pull/435

bdilday avatar Jun 26 '24 14:06 bdilday

Would love to see this fixed in the main repo.

efitton avatar Aug 01 '24 22:08 efitton