pybaseball
download_lahman() failing
Hi All,
I'm running pybaseball 2.2.7
I'm trying to run pybaseball.people() and getting the following stack trace:
---------------------------------------------------------------------------
BadZipFile Traceback (most recent call last)
Cell In[12], line 1
----> 1 download_lahman()
File ~/.local/lib/python3.10/site-packages/pybaseball/lahman.py:30, in download_lahman()
28 def download_lahman():
29 # download entire lahman db to present working directory
---> 30 z = get_lahman_zip()
31 if z is not None:
32 z.extractall(cache.config.cache_directory)
File ~/.local/lib/python3.10/site-packages/pybaseball/lahman.py:25, in get_lahman_zip()
23 elif not _handle:
24 s = requests.get(url, stream=True)
---> 25 _handle = ZipFile(BytesIO(s.content))
26 return _handle
File /usr/lib/python3.10/zipfile.py:1269, in ZipFile.__init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
1267 try:
1268 if mode == 'r':
-> 1269 self._RealGetContents()
1270 elif mode in ('w', 'x'):
1271 # set the modified flag so central directory gets written
1272 # even if no files are added to the archive
1273 self._didModify = True
File /usr/lib/python3.10/zipfile.py:1336, in ZipFile._RealGetContents(self)
1334 raise BadZipFile("File is not a zip file")
1335 if not endrec:
-> 1336 raise BadZipFile("File is not a zip file")
1337 if self.debug > 1:
1338 print(endrec)
BadZipFile: File is not a zip file
I dug around and saw that the code attempts to retrieve the data from here: https://github.com/chadwickbureau/baseballdatabank/archive/master.zip
That is a dead link. Perhaps there was a change upstream.
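For anyone hitting the same trace: requests.get does not raise on a 404 by itself, so the error page body gets handed straight to ZipFile, which is why the failure surfaces as BadZipFile rather than as a download error. Below is a minimal sketch of checking the payload first. The helper name open_zip_payload and its source parameter are hypothetical and not part of pybaseball.

```python
from io import BytesIO
from zipfile import ZipFile, is_zipfile


def open_zip_payload(content: bytes, source: str = "<unknown>") -> ZipFile:
    """Open downloaded bytes as a zip archive, or fail with a descriptive error.

    Hypothetical helper for illustration; pybaseball does not ship this.
    """
    buffer = BytesIO(content)
    # is_zipfile accepts file-like objects and returns False for an HTML error page.
    if not is_zipfile(buffer):
        preview = content[:60]
        raise ValueError(
            f"Download from {source} is not a zip archive; body starts with {preview!r}"
        )
    buffer.seek(0)
    return ZipFile(buffer)
```

In get_lahman_zip() this could wrap s.content, ideally after calling s.raise_for_status() so that an HTTP error is reported before the zip check.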
Similar issue here. The code will need an update to handle the new Chadwick register location and file structure (the people table has been split into multiple files).
This is a separate issue from the Chadwick register (which I believe has been handled in PR #309). The problem appears to be that the chadwickbureau/baseballdatabank repository no longer exists, at least not publicly.
Has this issue been fixed? I dug into the code and came to the same conclusion, which is how I ended up on this page, but I don't see any follow-up or fix. I pulled the code pretty recently, so I was wondering whether anyone had fixed it or come up with a workaround.
Sean Lahman just posted an updated version of the database files at his own site, so this could presumably be fixed by pointing the code at those files instead.
Linking to the files on his site looks fragile to me, since it relies on the naming convention in his personal Dropbox. The file is currently called lahman_1871-2023.csv, so presumably the file name and path are not static.
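One way to soften that fragility would be to let users override the download location without waiting for a new release. This is only a sketch under assumptions: the environment variable name LAHMAN_ZIP_URL and the function resolve_lahman_url are hypothetical and not defined by pybaseball, and the default URL is whatever the library would otherwise hard-code.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable name; pybaseball does not define it.
LAHMAN_URL_ENV = "LAHMAN_ZIP_URL"


def resolve_lahman_url(default_url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the override URL from the environment if set, else the default."""
    env = os.environ if env is None else env
    # Treat an unset or blank variable as "no override".
    override = env.get(LAHMAN_URL_ENV, "").strip()
    return override or default_url
```

If the upstream file name changes again, a user could then set the variable to the new location instead of patching the installed package.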
I see the same error, and it looks like the file location changed, as @JSCjr mentioned.
I'm also seeing this error. If this isn't important functionality or a priority to maintain, it might be a good idea to remove it rather than keep a broken function around.
Note that there is a proposed fix here: https://github.com/jldbc/pybaseball/pull/435
Would love to see this fixed in the main repo.