
Ingest.py failed

Open · ThamaluM opened this issue 1 year ago · 1 comment

**Describe the bug and how to reproduce it**
When I run `ingest.py`, the following error occurs:

```
Loaded 1 documents from source_documents
Split into 2489 chunks of text (max. 500 characters each)
No sentence-transformers model found with name /hms/experimenting/privateGPT/models/ggml-model-q4_0.bin. Creating a new one with MEAN pooling.
Traceback (most recent call last):
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 24: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/hms/experimenting/privateGPT/ingest.py", line 97, in <module>
    main()
  File "/hms/experimenting/privateGPT/ingest.py", line 88, in main
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
  File "/hms/env/privategpt/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 54, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 97, in __init__
    modules = self._load_auto_model(model_path)
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 806, in _load_auto_model
    transformer_model = Transformer(model_name_or_path)
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 28, in __init__
    config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 928, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '/hms/experimenting/privateGPT/models/ggml-model-q4_0.bin' is not a valid JSON file.
```
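For readers hitting the same traceback, here is a minimal standard-library sketch of the failure mode. It assumes, as the log suggests, that `embeddings_model_name` ends up pointing at the single binary file `models/ggml-model-q4_0.bin`, which `AutoConfig.from_pretrained` then tries to read as a UTF-8 JSON config. `read_json_config` is a simplified, hypothetical stand-in for that read (not the actual transformers code), `check_embeddings_model_name` is a hypothetical pre-flight guard (not part of privateGPT or LangChain), and the test bytes are placeholder content, not a real GGML header.

```python
import json
import os
import tempfile


def read_json_config(path: str) -> dict:
    """Simplified stand-in for reading a model config as JSON.

    Non-UTF-8 bytes raise UnicodeDecodeError and malformed text raises
    JSONDecodeError; both are re-raised as OSError, matching the shape of
    the error in the traceback above.
    """
    try:
        with open(path, "r", encoding="utf-8") as reader:
            text = reader.read()
        return json.loads(text)
    except (json.JSONDecodeError, UnicodeDecodeError) as err:
        raise OSError(
            f"It looks like the config file at '{path}' is not a valid JSON file."
        ) from err


def check_embeddings_model_name(model_name: str) -> None:
    """Hypothetical guard: reject a single file (e.g. LLM weights in a .bin),
    since the traceback shows such a path being parsed as a JSON config."""
    if os.path.isfile(model_name):
        raise ValueError(
            f"{model_name!r} is a single file; expected a sentence-transformers "
            "model name or a model directory"
        )


if __name__ == "__main__":
    # Placeholder binary content: 24 ASCII bytes, then 0x80, which is not a
    # valid UTF-8 start byte (this is NOT a real GGML header).
    with tempfile.NamedTemporaryFile("wb", suffix=".bin", delete=False) as f:
        f.write(b"x" * 24 + b"\x80" + b"\x00" * 8)
        path = f.name
    try:
        try:
            read_json_config(path)
        except OSError as err:
            print(err)
            print(repr(err.__cause__))  # the underlying UnicodeDecodeError
        try:
            check_embeddings_model_name(path)
        except ValueError as err:
            print(err)
    finally:
        os.remove(path)
```

The guard only rejects regular files, so a hub-style name or a local model directory still passes; whether that matches your configuration is an assumption worth checking against your own `.env`.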

**Environment (please complete the following information):**
 - OS / hardware: Ubuntu 20.04.6
 - Python version: 3.10.11

ThamaluM · May 23 '23 06:05

It happened to me when I tried an ePub; when I replaced the file with a PDF, it worked fine. It took forever, though, since it was an 18 MB file.

zaramal · May 23 '23 08:05

Same issue here; it seems to fail on ePubs.

Apocawaka · Aug 29 '23 22:08