localGPT
Add a Dockerfile
I couldn't get it to work on my Linux system because of Python dependency problems, so I made it run in a Docker container.
I wonder if someone could test this ^^ for me. What I see:
docker build -t lgpt .
docker run -ti lgpt
I get to the prompt and try to run ingest.py, and this is what I get:
# python3 ingest.py --device_type cpu
Loading documents from /root/SOURCE_DOCUMENTS
Loaded 0 documents from /root/SOURCE_DOCUMENTS
Split into 0 chunks of text
load INSTRUCTOR_Transformer
max_seq_length 512
Using embedded DuckDB with persistence: data will be stored in: /root/DB
Traceback (most recent call last):
File "/root/ingest.py", line 78, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/root/ingest.py", line 72, in main
db = Chroma.from_documents(texts, embeddings, persist_directory=PERSIST_DIRECTORY, client_settings=CHROMA_SETTINGS)
File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 413, in from_documents
return cls.from_texts(
File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 381, in from_texts
chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
File "/usr/local/lib/python3.10/dist-packages/langchain/vectorstores/chroma.py", line 158, in add_texts
embeddings = self._embedding_function.embed_documents(list(texts))
File "/usr/local/lib/python3.10/dist-packages/langchain/embeddings/huggingface.py", line 148, in embed_documents
embeddings = self.client.encode(instruction_pairs)
File "/usr/local/lib/python3.10/dist-packages/InstructorEmbedding/instructor.py", line 524, in encode
if isinstance(sentences[0],list):
IndexError: list index out of range
SOURCE_DOCUMENTS/ contains one empty file, empty.txt.
Is there a docker solution?
@oktaborg What do you mean?
Is there a Dockerfile that I can try?
@oktaborg Yes. This branch provides this file. Just fetch my branch or go to "Files changed" tab and copy & paste the file from there.
The error occurs because the file is empty, so no documents are loaded and there is nothing to embed. It will probably need to be tested with a non-empty file.
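To illustrate the diagnosis, here is a minimal sketch of a guard that skips the embedding step when there is nothing to embed. The names `non_empty_chunks`, `embed_if_any`, and `fake_embed` are hypothetical and not part of localGPT; `fake_embed` only imitates the `sentences[0]` access shown in the traceback, so this is a sketch under those assumptions rather than the project's actual fix.

```python
from typing import Callable, List, Optional, Sequence


def non_empty_chunks(chunks: Sequence[str]) -> List[str]:
    """Drop chunks that are empty or contain only whitespace."""
    return [chunk for chunk in chunks if chunk.strip()]


def embed_if_any(
    chunks: Sequence[str],
    embed: Callable[[List[str]], List[List[float]]],
) -> Optional[List[List[float]]]:
    """Call `embed` only when at least one usable chunk exists.

    Returns None instead of passing an empty list to the encoder.
    """
    usable = non_empty_chunks(chunks)
    if not usable:
        return None
    return embed(usable)


def fake_embed(sentences: List[str]) -> List[List[float]]:
    # Hypothetical stand-in for the real encoder: like the code in the
    # traceback, it reads sentences[0], so an empty list raises IndexError.
    _ = sentences[0]
    return [[float(len(s))] for s in sentences]


if __name__ == "__main__":
    # An empty SOURCE_DOCUMENTS yields zero chunks: the guard returns None.
    print(embed_if_any([], fake_embed))
    # With real text, the encoder is called as usual.
    print(embed_if_any(["hello", ""], fake_embed))
```

In ingest.py, a comparable check would presumably sit before the `Chroma.from_documents(...)` call, printing a message and exiting when no text was loaded.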
Can you please update the README with instructions?