OperationalError: too many SQL variables when the quantity of documents is large
When the quantity of documents is large, the following error occurs:

results = cur.execute(sql, params).fetchall()
sqlite3.OperationalError: too many SQL variables
Has anyone else encountered this issue?
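For context, SQLite caps the number of bound parameters in a single statement (SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 in builds before SQLite 3.32 and 32766 since). Here is a minimal sketch of how the error arises, independent of localGPT (the table name and parameter count are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER)")

# Bind far more parameters than SQLITE_MAX_VARIABLE_NUMBER allows.
params = list(range(100_000))
placeholders = ",".join("?" for _ in params)
sql = f"SELECT id FROM chunks WHERE id IN ({placeholders})"
conn.execute(sql, params).fetchall()
# sqlite3.OperationalError: too many SQL variables
```

With 674372 chunks, the ingestion ends up issuing a statement with more bound variables than that limit.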
LOGS:
(localGPT) PS D:\projects_llm\lgpt> python ingest.py
2023-09-28 00:45:26,368 - INFO - ingest.py:123 - Loading documents from D:\projects_llm\lgpt/SOURCE_DOCUMENTS
2023-09-28 00:45:55,285 - INFO - ingest.py:132 - Loaded 9395 documents from D:\projects_llm\lgpt/SOURCE_DOCUMENTS
2023-09-28 00:45:55,285 - INFO - ingest.py:133 - Split into 674372 chunks of text
2023-09-28 00:45:56,737 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
max_seq_length 512
Traceback (most recent call last):
File "D:\projects_llm\lgpt\ingest.py", line 161, in
I got that too, but I used ChatGPT to fix it, and it worked flawlessly after that, even with a large ingestion. Here is the ChatGPT chat; you may want to increase the batch size from 1000 to, say, 20000, depending on your GPU. If your GPU is powerful, a larger batch size will ingest faster: https://chat.openai.com/share/93ed3d48-2e8e-41da-8397-17bcc9b4672c
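Since the link may not open for everyone: the workaround is to add the chunks to the vector store in batches rather than all at once, so no single SQLite statement binds too many parameters. A minimal sketch, assuming the langchain Chroma wrapper that ingest.py uses (the helper name and BATCH_SIZE value are illustrative, not the exact code from the chat):

```python
from langchain.vectorstores import Chroma

BATCH_SIZE = 5000  # illustrative; raise it (e.g. to 20000) if your GPU has headroom

def ingest_in_batches(texts, embeddings, persist_directory):
    # Build the store from the first batch, then append the remaining
    # chunks batch by batch, so each underlying SQLite INSERT stays
    # under the bound-parameter limit.
    db = Chroma.from_documents(
        texts[:BATCH_SIZE], embeddings, persist_directory=persist_directory
    )
    for i in range(BATCH_SIZE, len(texts), BATCH_SIZE):
        db.add_documents(texts[i : i + BATCH_SIZE])
    db.persist()
    return db
```

Larger batches mean fewer embedding calls and faster ingestion, but the safe upper bound depends on your GPU memory and the SQLite parameter limit.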
I cannot open the link. Can you please paste the solution here? How do you decide the batch size based on GPU memory? Is there any calculation for it?
I did not succeed with the proposed solution above. https://github.com/PromtEngineer/localGPT/issues/679 gave a precise solution that worked instead.