private-gpt
Ingest.py failed
**Describe the bug and how to reproduce it**
When I run ingest.py, the following error occurs:
```
Loaded 1 documents from source_documents
Split into 2489 chunks of text (max. 500 characters each)
No sentence-transformers model found with name /hms/experimenting/privateGPT/models/ggml-model-q4_0.bin. Creating a new one with MEAN pooling.
Traceback (most recent call last):
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 24: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/hms/experimenting/privateGPT/ingest.py", line 97, in <module>
    main()
  File "/hms/experimenting/privateGPT/ingest.py", line 88, in main
    embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
  File "/hms/env/privategpt/lib/python3.10/site-packages/langchain/embeddings/huggingface.py", line 54, in __init__
    self.client = sentence_transformers.SentenceTransformer(
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 97, in __init__
    modules = self._load_auto_model(model_path)
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 806, in _load_auto_model
    transformer_model = Transformer(model_name_or_path)
  File "/hms/env/privategpt/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 28, in __init__
    config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 928, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/hms/env/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '/hms/experimenting/privateGPT/models/ggml-model-q4_0.bin' is not a valid JSON file.
```
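For reference, `HuggingFaceEmbeddings` expects a sentence-transformers model name (or a local directory containing its `config.json`), not the GGML weights used by the LLM, which is why the loader tries and fails to parse `ggml-model-q4_0.bin` as JSON. Below is a minimal sketch of a call that should load cleanly; the model name `all-MiniLM-L6-v2` is an assumption taken from privateGPT's example configuration, so adjust it to whatever `EMBEDDINGS_MODEL_NAME` is meant to hold in your `.env`:

```python
from langchain.embeddings import HuggingFaceEmbeddings

# A sentence-transformers model name (or a local directory with its config.json),
# NOT the ggml-model-q4_0.bin used for the LLM itself.
# "all-MiniLM-L6-v2" is an assumption based on privateGPT's example.env.
embeddings_model_name = "all-MiniLM-L6-v2"

embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)
vector = embeddings.embed_query("hello world")
print(len(vector))  # 384 dimensions for all-MiniLM-L6-v2
```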
**Environment (please complete the following information):**
- OS / hardware: Ubuntu 20.04.6
- Python version: 3.10.11
It happened to me when I tried an ePub; I replaced the file with a PDF and it worked fine, though it took forever since it was an 18 MB file.
Same issue here; it seems to choke on ePubs.
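To separate the ePub behaviour from the embeddings error above, it may help to load the failing .epub on its own. A minimal sketch, assuming ingest.py maps .epub files to langchain's `UnstructuredEPubLoader` (which in turn needs the `unstructured` package and `pandoc` installed); the file path is a placeholder:

```python
from langchain.document_loaders import UnstructuredEPubLoader

# Placeholder path; point this at the ePub that fails during ingestion.
loader = UnstructuredEPubLoader("source_documents/example.epub")
docs = loader.load()

print(f"Loaded {len(docs)} document(s)")
print(docs[0].page_content[:200])  # first 200 characters of extracted text
```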