
(PDF) With `chipper` and `partition_pdf`, `model_name='chipper'` returns an error

Open piegu opened this issue 2 years ago • 3 comments

Describe the bug With chipper and partition_pdf, model_name='chipper' returns an error.

To Reproduce

from unstructured.partition.pdf_image.pdf import partition_pdf

elements = partition_pdf(
    filename="doc.pdf",
    strategy="hi_res",
    infer_table_structure=True,
    model_name="chipper",
)

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
    269     try:
--> 270         response.raise_for_status()
    271     except HTTPError as e:

26 frames
HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/unstructuredio/chipper-fast-fine-tuning/resolve/main/preprocessor_config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-657c7885-34e538603dabb6a57784877b;41e38f65-e8fd-43a1-b550-f74f28f97c9c)

Repository Not Found for url: https://huggingface.co/unstructuredio/chipper-fast-fine-tuning/resolve/main/preprocessor_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    449         ) from e
    450     except RepositoryNotFoundError as e:
--> 451         raise EnvironmentError(
    452             f"{path_or_repo_id} is not a local folder and is not a valid model identifier "
    453             "listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a token "

OSError: unstructuredio/chipper-fast-fine-tuning is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

Expected behavior The problem comes from the repository https://huggingface.co/unstructuredio/chipper-fast-fine-tuning which is not public (or it does not exist).

Environment Info

  • unstructured 0.11.4
  • unstructured-inference 0.7.18

piegu avatar Dec 15 '23 16:12 piegu

Hi @piegu, the chipper model name now maps to a version that is not open to the public, but you can:

  • still use the public version, v1, by specifying model_name="chipperv1"
  • use a different model, most likely the default "yolox"

badGarnet avatar Dec 15 '23 16:12 badGarnet
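
The two options suggested above can be combined into a fallback: try "chipperv1" first and drop back to "yolox" if the model cannot be loaded. The following is a minimal sketch, not part of the unstructured API. The helper name `partition_with_fallback` is hypothetical, and it assumes that a failed model load surfaces as an `OSError`, as in the traceback in the report; other failure modes may raise different exceptions that this sketch does not catch.

```python
from typing import Any, Callable, List, Sequence


def partition_with_fallback(
    partition_fn: Callable[..., List[Any]],
    filename: str,
    model_names: Sequence[str] = ("chipperv1", "yolox"),
    **kwargs: Any,
) -> List[Any]:
    """Hypothetical helper: try each model name in order, return the first result."""
    last_error = None
    for name in model_names:
        try:
            return partition_fn(filename=filename, model_name=name, **kwargs)
        except OSError as err:
            # Assumption: an inaccessible model repo ends in OSError, as in the
            # traceback reported in this thread. Remember it and try the next name.
            last_error = err
    raise RuntimeError(f"none of {list(model_names)} could be loaded") from last_error


# Intended usage, with partition_pdf imported as in the report:
# elements = partition_with_fallback(
#     partition_pdf, "doc.pdf", strategy="hi_res", infer_table_structure=True
# )
```

Passing the partition function in as an argument keeps the helper independent of the installed unstructured version, so it can be exercised with a stand-in function before running it on a real PDF.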

Unfortunately, even with model_name="chipperv1", it does not work in my Colab notebook (my code works with the yolox model).

But anyway, I guess this old version of chipper is worse than the chipper version available via the API (and maybe even worse than the yolox model), so using chipper locally no longer makes sense in my opinion.

Too bad that this chipper model is no longer being continued as open source.

piegu avatar Dec 15 '23 16:12 piegu

I also get the same error

hahazei avatar Apr 30 '24 07:04 hahazei

Closing: chipper is no longer supported in the open source lib.

MthwRobinson avatar Jun 12 '24 18:06 MthwRobinson