
OperationalError: too many SQL variables when the number of documents is large

Zephyruswind opened this issue 1 year ago • 3 comments

When the number of documents is large, the following error occurs:

results = cur.execute(sql, params).fetchall()
sqlite3.OperationalError: too many SQL variables

Has anyone else encountered this issue?

LOGS:

(localGPT) PS D:\projects_llm\lgpt> python ingest.py
2023-09-28 00:45:26,368 - INFO - ingest.py:123 - Loading documents from D:\projects_llm\lgpt/SOURCE_DOCUMENTS
2023-09-28 00:45:55,285 - INFO - ingest.py:132 - Loaded 9395 documents from D:\projects_llm\lgpt/SOURCE_DOCUMENTS
2023-09-28 00:45:55,285 - INFO - ingest.py:133 - Split into 674372 chunks of text
2023-09-28 00:45:56,737 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
max_seq_length  512
Traceback (most recent call last):
  File "D:\projects_llm\lgpt\ingest.py", line 161, in <module>
    main()
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\click\core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\click\core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\click\core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\click\core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "D:\projects_llm\lgpt\ingest.py", line 147, in main
    db = Chroma.from_documents(
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 577, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\langchain\vectorstores\chroma.py", line 209, in add_texts
    self._collection.upsert(
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\chromadb\api\models\Collection.py", line 298, in upsert
    self._client._upsert(
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\chromadb\api\segment.py", line 290, in _upsert
    self._producer.submit_embeddings(coll["topic"], records_to_submit)
  File "C:\Users\Heaven\.conda\envs\localGPT\lib\site-packages\chromadb\db\mixins\embeddings_queue.py", line 145, in submit_embeddings
    results = cur.execute(sql, params).fetchall()
sqlite3.OperationalError: too many SQL variables

Zephyruswind • Sep 28 '23
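For context: SQLite caps the number of bound parameters in a single statement (SQLITE_MAX_VARIABLE_NUMBER, historically 999, and 32766 in builds since SQLite 3.32). Judging from the traceback, Chroma's embeddings queue binds parameters per record in one statement, so upserting all 674372 chunks at once exceeds the cap. A minimal standalone repro of the limit, independent of Chroma and localGPT:

import sqlite3

# SQLite limits the number of "?" placeholders per statement
# (SQLITE_MAX_VARIABLE_NUMBER: 999 in older builds, 32766 since 3.32).
# Binding more parameters than that raises the same error seen in the logs above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

params = list(range(100_000))  # far more variables than any default build allows
sql = "INSERT INTO t VALUES " + ",".join("(?)" for _ in params)
conn.execute(sql, params)  # sqlite3.OperationalError: too many SQL variables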

I got that too, but I used ChatGPT to fix it, and it has worked flawlessly since, even with large ingestions. Here is the ChatGPT chat; you may need to increase the batch size from 1000 to, for example, 20000, depending on your GPU. If your GPU is powerful, a larger batch size will ingest faster: https://chat.openai.com/share/93ed3d48-2e8e-41da-8397-17bcc9b4672c

hakemz91 • Sep 29 '23

I cannot open the link; can you please paste the solution here? How do you decide the batch size based on GPU memory? Is there a calculation for it?

bp020108 • Feb 07 '24
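The linked chat aside, the general idea is to ingest in batches so that no single Chroma upsert exceeds SQLite's per-statement variable limit. Below is a minimal sketch of that approach against the Chroma.from_documents call shown in the traceback. The names texts, embeddings, PERSIST_DIRECTORY, and CHROMA_SETTINGS are assumptions about localGPT's ingest.py, and BATCH_SIZE is an illustrative value, not a GPU-derived formula.

from langchain.vectorstores import Chroma  # import path matching the traceback

# Hypothetical batched replacement for the single Chroma.from_documents call
# in ingest.py. Splitting the chunks keeps each underlying SQLite statement
# under the variable limit; any BATCH_SIZE below SQLite's cap avoids the
# error, and larger values mainly trade memory for fewer round trips.
BATCH_SIZE = 5000  # illustrative; tune for your machine

db = None
for start in range(0, len(texts), BATCH_SIZE):
    batch = texts[start:start + BATCH_SIZE]
    if db is None:
        # The first batch creates the persisted collection.
        db = Chroma.from_documents(
            batch,
            embeddings,
            persist_directory=PERSIST_DIRECTORY,
            client_settings=CHROMA_SETTINGS,
        )
    else:
        # Later batches are appended to the same collection.
        db.add_documents(batch)

As for sizing by GPU memory: this particular error comes from a SQLite limit, not from the GPU, so there is no exact calculation; the embedding model typically batches its own forward passes internally, and the batch size here mostly affects memory use and ingestion speed.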

I did not succeed with the proposed solution above. https://github.com/PromtEngineer/localGPT/issues/679 gave a precise solution that worked instead.

spicedreams • Jun 15 '24