bug: zipfile.BadZipFile using pretrained BIST model
Describe the bug
nlp_architect/models/absa/train/train.py fails with "zipfile.BadZipFile: File is not a zip file" when it tries to download the pretrained model for SpacyBISTParser(). Updating spaCy to 3.0 instead results in "ImportError: cannot import name 'LEMMA_EXC'". That import error stems from a change between spaCy v2.1 and v2.2 that moved the large lookup tables out of the main library: the lemmatizer data is now stored in the separate package spacy-lookups-data, and the Lemmatizer is initialized with a Lookups object instead of the individual variables.
To Reproduce Steps to reproduce the behavior:
- pip_packages = ['nlp-architect','spacy==2.1.8','numpy==1.19.5']
Expected behavior
**Environment setup:**
- OS (Linux/Mac OS): Azure AML
- Python version: 3.6.9
- Backend:
Additional context
Log Output
You can now load the model via spacy.load('en')
Using pre-trained BIST model.
Downloading pre-trained BIST model...
Unable to determine total file size.
Downloading file to: /root/nlp-architect/cache/bist-pretrained/bist-pretrained.zip
0MB [00:00, ?MB/s]
1MB [00:00, 579.96MB/s]
Download Complete
Unzipping...
[2021-04-05T14:57:17.529886] The experiment failed. Finalizing run...
2021-04-05 14:57:17,535 INFO Exiting context: TrackUserError
2021-04-05 14:57:17,536 INFO Exiting context: RunHistory
Cleaning up all outstanding Run operations, waiting 900.0 seconds
1 items cleaning up...
Cleanup took 0.07420921325683594 seconds
2021-04-05 14:57:30,901 INFO Exiting context: ProjectPythonPath
Traceback (most recent call last):
File "train.py", line 46, in
I was able to fix the issue by changing the code in io.py:
with open(destfile, "wb") as f:
    for data in tqdm(req.iter_content(chunksz), total=nchunks, unit="MB", file=sys.stdout):
        f.write(data)
print("Download Complete")
to:
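# Needs "import urllib.request" at the top of io.py.
url = "https://d2zs9tzlek599f.cloudfront.net/models/dep_parse/bist-pretrained.zip"
# Fetch the whole archive in one request instead of streaming it in chunks.
with urllib.request.urlopen(url) as remote:
    data = remote.read()
with open(destfile, "wb") as local:
    local.write(data)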
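The log above reports "Download Complete" for a file of about 1 MB whose total size could not be determined, and the failure only appears at the unzip step. One possible reading, not confirmed in this thread, is that the server returned something other than the archive. A minimal sketch of a download helper that checks the result before extraction is shown below; the function name download_zip is hypothetical and not part of nlp-architect.

```python
import urllib.request
import zipfile


def download_zip(url, destfile):
    """Download url to destfile and fail early if the result is not a zip archive.

    Hypothetical helper for illustration; not part of nlp-architect's io.py.
    """
    # Read the whole response into memory, as in the workaround above.
    with urllib.request.urlopen(url) as remote:
        data = remote.read()
    with open(destfile, "wb") as local:
        local.write(data)
    # An error page saved under a .zip name would otherwise only surface
    # later as zipfile.BadZipFile during extraction.
    if not zipfile.is_zipfile(destfile):
        raise ValueError(
            "{} is not a zip archive ({} bytes received from {})".format(
                destfile, len(data), url
            )
        )
    return destfile
```

Called with the URL from the workaround and the cache path from the log, this would raise a ValueError that names the URL and the byte count instead of failing later inside the unzip step.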
Hi @mastreips, according to the stack trace in your issue, you are using an old version of the code (updated 6 months ago according to git blame). I have tested SpacyBist downloading and ABSA execution end-to-end and everything works fine.
@danielkorat I am running into the same issue and I believe the issue is an outdated nlp-architect package.
Ran into the same issue with pip install nlp-architect
Resolved with build from cloned repo.
Can you validate that you ran your test with the package version?
Hi @vkurpad, the pip package URLs might be outdated. @peteriz can you confirm? I installed from source, see the installation instructions here.
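Since the thread comes down to which nlp-architect build is actually installed, a quick check of the installed distribution versions may help when comparing environments. This is a minimal sketch; installed_version is a hypothetical helper, and on Python versions before 3.8 (such as the 3.6.9 reported above) it assumes the importlib_metadata backport is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on older Pythons.
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions relevant to this issue.
for name in ("nlp-architect", "spacy", "numpy"):
    print(name, installed_version(name))
```

A build from the cloned repo may report the same version string as the pip release, so this only narrows things down; it does not prove which copy of io.py is being executed.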