
bug: zipfile.BadZipFile using pretrained BIST model

Open mastreips opened this issue 4 years ago • 4 comments

Describe the bug

Running nlp_architect/models/absa/train/train.py fails with zipfile.BadZipFile: File is not a zip file when it tries to download the pretrained model for SpacyBISTParser(). Updating spaCy to 3.0 instead results in ImportError: cannot import name 'LEMMA_EXC'. That error comes from the change between spaCy v2.1 and v2.2 that moved the large lookup tables out of the main library: the lemmatizer data is now stored in the separate package spacy-lookups-data, and the Lemmatizer is initialized with a Lookups object instead of the individual variables.

To Reproduce

Steps to reproduce the behavior:

  1. pip_packages = ['nlp-architect','spacy==2.1.8','numpy==1.19.5']

Expected behavior

**Environment setup:**

  • OS (Linux/Mac OS): Azure AML
  • Python version: 3.6.9
  • Backend:

Additional context

Log Output

    You can now load the model via spacy.load('en')
    Using pre-trained BIST model.
    Downloading pre-trained BIST model...
    Unable to determine total file size.
    Downloading file to: /root/nlp-architect/cache/bist-pretrained/bist-pretrained.zip
    0MB [00:00, ?MB/s]
    1MB [00:00, 579.96MB/s]
    Download Complete
    Unzipping...

    [2021-04-05T14:57:17.529886] The experiment failed. Finalizing run...
    2021-04-05 14:57:17,535 INFO Exiting context: TrackUserError
    2021-04-05 14:57:17,536 INFO Exiting context: RunHistory
    Cleaning up all outstanding Run operations, waiting 900.0 seconds
    1 items cleaning up...
    Cleanup took 0.07420921325683594 seconds
    2021-04-05 14:57:30,901 INFO Exiting context: ProjectPythonPath
    Traceback (most recent call last):
      File "train.py", line 46, in <module>
        max_iter=args.max_iter)
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/models/absa/train/train.py", line 49, in __init__
        self.parser = SpacyBISTParser()
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/pipelines/spacy_bist.py", line 46, in __init__
        _download_pretrained_model()
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/pipelines/spacy_bist.py", line 170, in _download_pretrained_model
        uncompress_file(zip_path, outpath=str(SpacyBISTParser.dir))
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/site-packages/nlp_architect/utils/io.py", line 85, in uncompress_file
        with zipfile.ZipFile(filepath) as z:
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/zipfile.py", line 1108, in __init__
        self._RealGetContents()
      File "/azureml-envs/azureml_d664de2764d55f1b5c7b6f4fc0a2fd6b/lib/python3.6/zipfile.py", line 1175, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file

mastreips, Apr 05 '21 15:04
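The log shows the download finishing almost instantly at about 1MB, which may mean the server returned something other than the archive (for example an error page). A quick way to check is to look at what actually landed on disk before unzipping. This is a minimal sketch using only the standard library; describe_download is a hypothetical helper, not part of nlp-architect, and the path in the usage note is the one from the log above.

```python
import zipfile
from pathlib import Path


def describe_download(path):
    """Return (is_zip, size_in_bytes, first_bytes) for a downloaded file.

    Hypothetical diagnostic helper, not part of nlp-architect.
    """
    p = Path(path)
    size = p.stat().st_size
    with p.open("rb") as f:
        # A real zip starts with b"PK"; an error body often starts with "<"
        head = f.read(64)
    return zipfile.is_zipfile(str(p)), size, head
```

Calling describe_download("/root/nlp-architect/cache/bist-pretrained/bist-pretrained.zip") on the affected machine would show whether the cached file is a real archive or some other response body.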

I was able to fix the issue by changing the code in io.py:

    with open(destfile, "wb") as f:
        for data in tqdm(req.iter_content(chunksz), total=nchunks, unit="MB", file=sys.stdout):
            f.write(data)
    print("Download Complete")

to:

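    import urllib.request  # add to the imports at the top of io.py

    url = "https://d2zs9tzlek599f.cloudfront.net/models/dep_parse/bist-pretrained.zip"
    # Read the whole archive into memory, then write it to destfile
    with urllib.request.urlopen(url) as remote:
        data = remote.read()
    with open(destfile, "wb") as local:
        local.write(data)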

mastreips, Apr 05 '21 20:04
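A variant of the workaround above that streams the response to disk instead of holding it all in memory, and fails early if the result is not a zip. This is only a sketch under stated assumptions: download_zip and its chunk_size parameter are hypothetical names, not the nlp-architect API, and it does not reproduce the progress bar from the original code.

```python
import shutil
import urllib.request
import zipfile


def download_zip(url, destfile, chunk_size=1024 * 1024):
    """Stream url to destfile, then raise if the result is not a zip archive.

    Hypothetical helper for illustration, not part of nlp-architect.
    """
    with urllib.request.urlopen(url) as remote, open(destfile, "wb") as local:
        # Copy in fixed-size chunks so large archives are never fully in memory
        shutil.copyfileobj(remote, local, chunk_size)
    if not zipfile.is_zipfile(destfile):
        raise ValueError("downloaded file is not a zip archive: %s" % destfile)
    return destfile
```

Raising at download time points at the real problem (a bad response from the server) rather than surfacing later as BadZipFile inside uncompress_file.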

Hi @mastreips, according to the stack trace in your issue, you are using an old version of the code (updated 6 months ago according to git blame). I have tested SpacyBist downloading and ABSA execution end-to-end and everything works fine.

danielkorat, Apr 13 '21 07:04

@danielkorat I am running into the same issue and I believe the issue is an outdated nlp-architect package.

Ran into the same issue with pip install nlp-architect

Resolved with build from cloned repo.

Can you validate that you ran your test with the package version?

vkurpad, Apr 18 '21 04:04
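To tell a pip install apart from a source checkout, it can help to print both the installed distribution version and the file the module is imported from. This is a rough sketch, assuming Python 3.8 or newer for importlib.metadata; installed_version and module_location are hypothetical helper names.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_location(module_name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("nlp-architect version:", installed_version("nlp-architect"))
    print("nlp_architect location:", module_location("nlp_architect"))
```

A location under site-packages suggests the pip package is in use, while a path inside a cloned repository suggests the source build.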

Hi @vkurpad, the pip package URLs might be outdated. @peteriz can you confirm? I installed from source, see the installation instructions here.

danielkorat, Apr 18 '21 07:04