Add a Dockerfile

Open • wkoszek opened this issue 2 years ago • 7 comments

I couldn't get it working on my Linux system because of some Python dependency problems, so I made it run in Docker.

wkoszek, Jun 05 '23 05:06

Could someone test this for me? Here's what I see:

```
docker build -t lgpt .
docker run -ti lgpt
```

I get to the shell prompt, run ingest.py, and this is what I get:

```
# python3 ingest.py --device_type cpu
Loading documents from /root/SOURCE_DOCUMENTS
Loaded 0 documents from /root/SOURCE_DOCUMENTS
Split into 0 chunks of text
load INSTRUCTOR_Transformer
max_seq_length  512
Using embedded DuckDB with persistence: data will be stored in: /root/DB
Traceback (most recent call last):
  File "/root/ingest.py", line 78, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/ingest.py", line 72, in main
    db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, client_settings=CHROMA_SETTINGS)
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 413, in from_documents
    return cls.from_texts(
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 381, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 158, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
  File "/usr/local/lib/python3.10/dist-packages/langchain/embeddings/huggingface.py", line 148, in embed_documents
    embeddings = self.client.encode(instruction_pairs)
  File "/usr/local/lib/python3.10/dist-packages/InstructorEmbedding/instructor.py", line 524, in encode
    if isinstance(sentences[0],list):
IndexError: list index out of range
```

SOURCE_DOCUMENTS/ contains a single empty file, empty.txt.

wkoszek, Jun 05 '23 05:06

Is there a Docker solution?

oktaborg, Jun 05 '23 07:06

@oktaborg What do you mean?

wkoszek, Jun 07 '23 05:06

Is there a Dockerfile I can try?

oktaborg, Jun 07 '23 06:06

@oktaborg Yes, this branch provides the file. Fetch my branch, or open the "Files changed" tab and copy the file from there.

wkoszek, Jun 07 '23 06:06

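The error occurs because the file is empty: ingest.py loads 0 documents and splits them into 0 chunks, so the embedding step receives an empty list and fails on `sentences[0]`. You'll probably need to test with a non-empty file.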
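A minimal sketch of the failure and one way to guard against it, assuming the inputs are plain `.txt` files. `first_item_is_list` and `load_nonempty_texts` are hypothetical names, not part of localGPT or InstructorEmbedding; the first only mirrors the failing check from the traceback, and the second skips empty files and fails early with a readable message instead of an `IndexError` deep inside the embedding library.

```python
from pathlib import Path


def first_item_is_list(sentences):
    # Hypothetical stand-in for the failing line in the traceback
    # (`if isinstance(sentences[0], list):`). With 0 chunks of text,
    # `sentences` is empty and indexing it raises IndexError.
    return isinstance(sentences[0], list)


def load_nonempty_texts(source_dir):
    """Hypothetical helper: read every .txt file in source_dir, skip files
    that are empty or whitespace-only, and raise a clear error if nothing
    usable is left (instead of passing an empty list to the embedder)."""
    texts = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        content = path.read_text(encoding="utf-8")
        if content.strip():
            texts.append(content)
    if not texts:
        raise ValueError(f"No non-empty .txt documents found in {source_dir}")
    return texts


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Reproduce the reported setup: a single empty file.
        (Path(tmp) / "empty.txt").write_text("")
        try:
            first_item_is_list([])
        except IndexError as exc:
            print("IndexError:", exc)  # IndexError: list index out of range
        try:
            load_nonempty_texts(tmp)
        except ValueError as exc:
            print("ValueError:", exc)
        # Adding a file with real content makes the guard pass.
        (Path(tmp) / "notes.txt").write_text("Some real content.")
        print(load_nonempty_texts(tmp))  # ['Some real content.']
```

In the reported setup, a check like this would run before `Chroma.from_documents(...)` is called, so an empty SOURCE_DOCUMENTS/ produces an explicit message rather than a traceback.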

PromtEngineer, Jun 10 '23 04:06

Could you please update the README with instructions?

PromtEngineer, Jun 10 '23 04:06