private-gpt
gpt_tokenize: unknown token ''
I'm trying to run privateGPT in Docker, so I created the below:
- Dockerfile:
```dockerfile
# Use the slim, Debian-based Python image as the base
FROM python:slim

# Update the package index and install the build tools the requirements need
RUN apt-get update -y
RUN apt-get install -y gcc build-essential gfortran pkg-config libssl-dev g++
RUN pip3 install --upgrade pip
RUN apt-get clean

# Set the working directory to /app
WORKDIR /app

# Copy requirements.txt into the container and install the dependencies
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt --force-reinstall

# Copy the scripts, models, and knowledge base into the container
COPY /script /app
COPY /models /app/models
COPY /knowledge /app/knowledge

EXPOSE 5000

# Keep the container alive after startup; tail -f /dev/null never exits,
# so we can attach a shell and run the scripts interactively
CMD ["tail", "-f", "/dev/null"]
```
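Since the `CMD` only keeps the container alive, you attach a shell to it to run the scripts; for example, with the service named `app` as in the compose file below:

```sh
docker-compose exec app bash
```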
- docker-compose.yml
```yaml
version: '3.8'  # Version of the docker-compose file format
services:       # The services that make up the application
  app:          # Name of the service
    build:                    # How to build the image for this service
      context: .              # Directory containing the Dockerfile
      dockerfile: Dockerfile  # Name of the Dockerfile to use
    image: my-repo/my-image-name       # Name of the image to build
    container_name: my-container-name  # Name of the container to create
```
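One thing worth checking: as written, Compose only uses a root-level `.env` for variable substitution inside the YAML itself; it does not inject those values into the container's environment. If the scripts read real environment variables rather than loading the file themselves, an `env_file:` entry (shown here as a suggested addition, not part of the original setup) would be needed:

```yaml
services:
  app:
    env_file:
      - .env
```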
- .env
```
KNOWLEDGE_PATH=/app/knowledge
PERSIST_DIRECTORY=/app/db
MODEL_TYPE=GPT4All
MODEL_PATH=/app/models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L12-v2
MODEL_N_CTX=1000
```
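For reference, the privateGPT scripts read these values with python-dotenv (which the upstream privateGPT requirements include but this requirements.txt does not pin); a minimal sketch of the container-side lookup, assuming `.env` ends up in the working directory:

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

model_type = os.environ.get("MODEL_TYPE")         # "GPT4All"
model_path = os.environ.get("MODEL_PATH")         # path to the .bin file
model_n_ctx = int(os.environ.get("MODEL_N_CTX"))  # context window size
embeddings_model_name = os.environ.get("EMBEDDINGS_MODEL_NAME")

print(model_type, model_path, model_n_ctx, embeddings_model_name)
```

Note that the Dockerfile above never copies `.env` into the image, so it has to live inside the copied `/script` directory, or be passed in via `env_file:` as suggested earlier.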
- requirements.txt
```
transformers==4.29.2
torch==2.0.1
numexpr==2.8.4
langchain==0.0.171
pygpt4all==1.1.0
chromadb==0.3.23
llama-cpp-python==0.1.50
urllib3==2.0.2
pdfminer.six==20221105
flask==2.3.2
nicegui==1.2.14
streamlit==1.22.0
streamlit-extras==0.2.7
```
- Downloaded the model (PowerShell):

```powershell
Invoke-WebRequest -Uri "https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin" -OutFile "models\ggml-gpt4all-j-v1.3-groovy.bin"
```
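A quick sanity check that the download completed, since a truncated .bin will fail to load (the size figure is approximate, from memory of this model):

```python
from pathlib import Path

model = Path("models/ggml-gpt4all-j-v1.3-groovy.bin")
print(f"{model.name}: {model.stat().st_size / 1e9:.2f} GB")
# ggml-gpt4all-j-v1.3-groovy should come out to roughly 3.8 GB
```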
- Downloaded the [state_of_the_union.txt](https://github.com/imartinez/privateGPT/blob/main/source_documents/state_of_the_union.txt) file.
- Ran the below:
```sh
docker-compose up -d --build
```
The Docker image was built successfully and the container is running. In the container's terminal I ran the below successfully:
```
# python ingest.py
# python privateGPT.py
Enter a query: exit
```
Once I tried to enter another query, I got the error: `gpt_tokenize: unknown token ''`
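As the replies below suggest, the usual culprit is non-ASCII characters (curly quotes, em dashes, and so on) in the ingested text, which the model's C tokenizer cannot map to tokens. A generic way to spot them, not part of privateGPT itself:

```python
from collections import Counter

# List every character outside printable ASCII and how often it occurs;
# these are the usual source of "gpt_tokenize: unknown token" errors.
with open("source_documents/state_of_the_union.txt", encoding="utf-8") as f:
    text = f.read()

suspects = Counter(ch for ch in text if ord(ch) > 126)
for ch, count in suspects.most_common():
    print(f"U+{ord(ch):04X} {ch!r}: {count}")
```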
See https://github.com/imartinez/privateGPT/issues/180 and https://github.com/imartinez/privateGPT/issues/214; this is a duplicate of many other issues.
I have this too. I notice that when I vary the input string, the number of unknown-token errors changes on my system, so I think it has something to do with how LangChain processes the input string.

I went back to the sample GPT4All program and used it to just read the doc. I got no token errors, but I did have to clean the doc a bit to read it into Python; the state_of_the_union text should be cleaned on the GitHub portal.

I ran this on an i7-8865U @ 1.9 GHz (4 cores, 8 logical) and it still took 5 minutes to run the sample program. Still a bit slow, but it works. We probably need to look at what chromadb shovels over to it.

```python
from gpt4all import GPT4All

with open("./source_documents/state_of_the_union.txt") as f:
    text1 = f.read()

gptj = GPT4All("ggml-gpt4all-j-v1.3-groovy", "./models/")
messages = [{"role": "user", "content": "summarize the following text: " + text1[:2000]}]
res = gptj.chat_completion(messages, streaming=False)
print(res)
```
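On the "what chromadb shovels over to it" point: with the pinned langchain 0.0.171 you can reopen the persisted Chroma store and print the exact chunks that get stuffed into the prompt. A rough sketch, assuming ingest.py wrote to the PERSIST_DIRECTORY and embeddings model from the `.env` above:

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Reopen the store that ingest.py populated
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L12-v2")
db = Chroma(persist_directory="/app/db", embedding_function=embeddings)

# Print the chunks that would be handed to the model for a given question;
# repr() exposes any odd characters hiding in them
docs = db.as_retriever().get_relevant_documents("What did the president say about Ukraine?")
for doc in docs:
    print(repr(doc.page_content[:200]))
```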
I think there are some strange characters in the default text provided by the author. I changed the content and the error disappeared.
Can you share the content you used, please, so I can check it? Thanks.
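In the absence of the exact content used, a generic clean-up along these lines should have the same effect: map the Unicode punctuation to ASCII before ingesting. The replacement table is illustrative, not exhaustive:

```python
import unicodedata

# Map the common "strange chars" (curly quotes, dashes, ellipsis) to ASCII,
# then drop anything still outside the ASCII range.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en/em dashes
    "\u2026": "...",                # ellipsis
}

def to_ascii(text: str) -> str:
    for bad, good in REPLACEMENTS.items():
        text = text.replace(bad, good)
    # NFKD splits accented letters into base letter + combining mark,
    # so the base letter survives the ASCII encode below
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")

with open("source_documents/state_of_the_union.txt", encoding="utf-8") as f:
    cleaned = to_ascii(f.read())

with open("source_documents/state_of_the_union.txt", "w", encoding="ascii") as f:
    f.write(cleaned)
```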
Fantastic Dockerfile, can you make a repo for that?