private-gpt
Ingest Initialization Error
I encountered an error while running the ingest.py script. The error message indicates an issue with the sentence-transformers model configuration file. Here are the steps to reproduce the bug:
- Clone the PrivateGPT repository from GitHub.
- Set up the environment with the required dependencies as mentioned in the repository's documentation.
- Execute the ingest.py script with the following command:

  python ingest.py
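For context, the dependency setup in step 2 usually involves a .env file along these lines (a sketch based on the repository's example.env; the exact variable names and values are assumptions). Note that a GGML binary should only ever appear as MODEL_PATH; EMBEDDINGS_MODEL_NAME must name a sentence-transformers model:

```
PERSIST_DIRECTORY=db
MODEL_TYPE=LlamaCpp
MODEL_PATH=models/ggml-model-q4_0.bin
MODEL_N_CTX=1000
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
```

If EMBEDDINGS_MODEL_NAME is accidentally set to the .bin path, sentence-transformers will try to load the binary as a model directory, which matches the error below.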
Expected behavior

I expected the ingest.py script to run successfully and process the documents without any errors.

Environment

- OS / hardware: Ubuntu 20.04 LTS / Intel Core i7 / 16 GB RAM / 512 GB SSD
- Python version: 3.9.6
- Other relevant information: I have followed the installation instructions provided in the PrivateGPT repository and have the required packages installed.

Additional context
The ggml-model-q4_0.bin file is located in the models/ directory and is not corrupted. However, it is a binary GGML model, not a JSON configuration file, and the error occurs because the script attempts to parse it as a sentence-transformers JSON config during execution.
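One quick way to confirm this diagnosis: a file containing non-UTF-8 bytes cannot be parsed as JSON, which is exactly what the traceback below shows. A self-contained sketch (using a throwaway stand-in file rather than the real model):

```python
import json
import os
import tempfile

def is_json(path):
    """Return True if the file at `path` parses as a UTF-8 JSON document."""
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
        return True
    except ValueError:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
        return False

# Stand-in for a GGML binary: a file starting with a non-UTF-8 byte
# (0x80, the same byte reported in the traceback)
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(b"\x80\x01 binary model data")
tmp.close()
print(is_json(tmp.name))  # False: binary weights are not a JSON config
os.unlink(tmp.name)
```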
Error:

$ python ingest.py
No sentence-transformers model found with name models/ggml-model-q4_0.bin. Creating a new one with MEAN pooling.
Traceback (most recent call last):
  File "/home/quartet/PrivateGPT/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/quartet/PrivateGPT/privategpt/lib/python3.10/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 24: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/quartet/PrivateGPT/privategpt/privateGPT/ingest.py", line 170, in
FYI, this is a known broader issue with Python on Windows 10, where the console streams default to a non-UTF-8 encoding. My solution was to add the following to both ingest.py and privateGPT.py, just before the load_dotenv() call. HTH.
import sys
import io

# Re-wrap stdout so printed text is encoded as UTF-8 regardless of the
# console's default encoding
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
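On Python 3.7+, the same effect can be achieved without detaching the stream, via TextIOWrapper.reconfigure (an alternative sketch, not from the original post; reconfiguring stderr as well helps if tracebacks are also garbled):

```python
import sys

# Python 3.7+ alternative: reconfigure the existing streams in place.
# The hasattr guard keeps this safe if stdout has been replaced by an
# object that is not a TextIOWrapper.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")
if hasattr(sys.stderr, "reconfigure"):
    sys.stderr.reconfigure(encoding="utf-8")
```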