(PDF) With `chipper` and `partition_pdf`, `model_name='chipper'` returns an error
Describe the bug
With chipper and partition_pdf, model_name='chipper' returns an error.
To Reproduce
from unstructured.partition.pdf_image.pdf import partition_pdf
elements = partition_pdf(
    filename="doc.pdf",
    strategy="hi_res",
    infer_table_structure=True,
    model_name="chipper",
)
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
269 try:
--> 270 response.raise_for_status()
271 except HTTPError as e:
26 frames
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/unstructuredio/chipper-fast-fine-tuning/resolve/main/preprocessor_config.json
The above exception was the direct cause of the following exception:
RepositoryNotFoundError Traceback (most recent call last)
RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-657c7885-34e538603dabb6a57784877b;41e38f65-e8fd-43a1-b550-f74f28f97c9c)
Repository Not Found for url: https://huggingface.co/unstructuredio/chipper-fast-fine-tuning/resolve/main/preprocessor_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
The above exception was the direct cause of the following exception:
OSError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
449 ) from e
450 except RepositoryNotFoundError as e:
--> 451 raise EnvironmentError(
452 f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
453 "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a token "
OSError: unstructuredio/chipper-fast-fine-tuning is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
Expected behavior
The problem comes from the repository https://huggingface.co/unstructuredio/chipper-fast-fine-tuning, which is not public (or does not exist).
Environment Info
- unstructured 0.11.4
- unstructured-inference 0.7.18
Hi @piegu, the chipper model name now maps to a version that is not open to the public, but you can:
- still use the public version, v1, by specifying model_name="chipperv1"
- use a different model, likely the default "yolox"
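The two workarounds above can be combined into a fallback: try one model name, and move on to the next if loading fails. The following is only a minimal sketch, not part of the library. `partition_with_fallback` is a hypothetical helper, and catching `OSError` is an assumption based on the final exception in the traceback above; other failure modes may raise something else.

```python
from typing import Any, Callable, Sequence, Tuple


def partition_with_fallback(
    partition_fn: Callable[..., Any],
    filename: str,
    model_names: Sequence[str] = ("chipperv1", "yolox"),
    **kwargs: Any,
) -> Tuple[str, Any]:
    """Try each model name in order; return (name_used, elements).

    Hypothetical helper: partition_fn is expected to accept `filename`
    and `model_name` keyword arguments, as partition_pdf does in the
    reproduction above.
    """
    errors = []
    for name in model_names:
        try:
            return name, partition_fn(filename=filename, model_name=name, **kwargs)
        except OSError as exc:
            # Assumption: a model that cannot be downloaded surfaces as
            # OSError, as in the traceback reported in this issue.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("No layout model could be loaded:\n" + "\n".join(errors))
```

To use it, pass the `partition_pdf` function from the reproduction as `partition_fn`, together with the same keyword arguments (for example `strategy="hi_res"`). The first element of the returned tuple tells you which model name actually worked.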
Unfortunately, even with model_name="chipperv1", it does not work in my Colab notebook (my code works with the yolox model).
But anyway, I guess this old version of chipper is worse than the chipper version available via the API (and maybe even worse than the yolox model), so using chipper locally no longer makes sense in my opinion.
It is too bad that this chipper model is no longer being continued as open source.
I also get the same error
Closing: chipper is no longer supported in the open source library.