SupabaseVectorStore.from_documents is not working
System Info
langchain==0.0.300 supabase==1.1.1
Who can help?
@hwaking @eyurtsev @agola11 @eyurtsev @hwchase17 @agola11
Information
- [X] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
Creation of Supabase client
supabase_url: str = os.environ.get("SUPABASE_URL")  # type: ignore
supabase_key: str = os.environ.get("SUPABASE_SERVICE_KEY")  # type: ignore
supabase_client = create_client(supabase_url, supabase_key)
Text Splitter creation
text_splitter = CharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=0,
)
Embeddings
embeddings = OpenAIEmbeddings()
Loading the document
loader = PyPDFLoader("Alice_in_wonderland2.pdf")
pages = loader.load_and_split()
docs = text_splitter.split_documents(pages)
Save values to Supabase
vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client)
Error encountered
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\VSCode\Python\langchain project\supabase-try\test.py", line 34, in
return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 147, in from_texts
cls._add_vectors(client, table_name, embeddings, docs, ids)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 323, in add_vectors
result = client.from(table_name).upsert(chunk).execute() # type: ignore
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\postgrest_sync\request_builder.py", line 57, in execute
r = self.session.request(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_client.py", line 814, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_client.py", line 901, in send
response = self._send_handling_auth(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_client.py", line 929, in _send_handling_auth
response = self._send_handling_redirects(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_client.py", line 966, in _send_handling_redirects
response = self._send_single_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_client.py", line 1002, in _send_single_request
response = transport.handle_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_transports\default.py", line 218, in handle_request
resp = self._pool.handle_request(req)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in exit
self.gen.throw(type, value, traceback)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx_transports\default.py", line 77, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
I tried changing the code according to the LangChain docs as follows:
vector_store = SupabaseVectorStore.from_documents(
    docs,
    embeddings,
    client=supabase_client,
    table_name="documents",
    query_name="match_documents",
)
Then I encountered the following error:
2023-10-05 10:33:29,879:INFO - HTTP Request: POST https://scptrclvtrvcwjdunlrn.supabase.co/rest/v1/documents "HTTP/1.1 404 Not Found"
Traceback (most recent call last):
File "D:\VSCode\Python\langchain project\supabase-try\test.py", line 34, in
I didn't create the documents table in Supabase manually, as I need it to be created automatically by the code. And if I do need to create it manually, I need to know the steps for creating it and how to integrate it as well. Please help me immediately.
Expected behavior
SupabaseVectorStore.from_documents works fine and stores all the embeddings in the vector store.
🤖
Hello,
Thank you for providing detailed information about the issue you're facing. Based on the context provided, there are a few potential solutions that might help resolve the issue.
Firstly, it seems like you're using an older version of LangChain (langchain==0.0.300). The latest version is v0.0.308. It's possible that the issue you're facing has been resolved in a newer version. I would recommend updating LangChain to the latest version and see if the issue persists.
Secondly, the error message httpx.WriteTimeout: The write operation timed out suggests that the data you're trying to write is too large, causing the operation to time out. You could try reducing the size of the data you're writing to the database. This can be done by modifying the chunk size in the _add_vectors method of the SupabaseVectorStore class. Here's how you can do it:
# Excerpt from SupabaseVectorStore._add_vectors: `rows` holds the records
# to upsert into `table_name` through the Supabase `client`.
# According to the SupabaseVectorStore JS implementation, the best chunk size
# is 500
chunk_size = 500  # You can modify this value to a smaller number
id_list: List[str] = []
for i in range(0, len(rows), chunk_size):
    chunk = rows[i : i + chunk_size]
    result = client.from_(table_name).upsert(chunk).execute()  # type: ignore
    if len(result.data) == 0:
        raise Exception("Error inserting: No rows added")
    # VectorStore.add_vectors returns ids as strings
    ids = [str(i.get("id")) for i in result.data if i.get("id")]
    id_list.extend(ids)
By reducing the chunk_size value, you can decrease the amount of data processed at once, which might help avoid the timeout error. However, please note that this will also increase the number of requests made to the Supabase server, which might affect performance. This solution was suggested in a similar issue #7823.
Lastly, the error message HTTP/1.1 404 Not Found suggests that the table you're trying to write to does not exist in your Supabase database. Please ensure that the table name you're trying to write to exists in your Supabase database. If the table doesn't exist, it could lead to a 404 Not Found error. In your case, the table name should be "documents".
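A quick way to verify this before writing, sketched here as an assumption against the supabase-py v1.x client API (not something from the thread):
# Hedged sanity check: probe the target table with a tiny select.
# If the table is missing, PostgREST responds with an error (the 404 above).
try:
    supabase_client.table("documents").select("id").limit(1).execute()
    print("documents table is reachable")
except Exception as exc:  # postgrest-py raises an APIError for missing tables
    print(f"documents table not found or not accessible: {exc}")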
I hope this helps! If you're still facing issues, please provide more details about your implementation and the exact error message you're receiving.
Sources
- Error creating Supabase vector store when running self-query example code
- DOC: SupabaseVectorStore.from_documents read operation timed out.
- libs/langchain/langchain/vectorstores/supabase.py
- docs/extras/modules/data_connection/retrievers/self_query/supabase_self_query.ipynb
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
Actually, this bot's answer works for me, along with the document it recommended. The code inside the document works well, but when I try to store text-chunk embeddings of a PDF, it keeps on giving me the issue httpx.WriteTimeout: The write operation timed out.
I updated my langchain version to v0.0.308, and my code is as follows after some corrections.
Supabase table creation
create table
documents (
id uuid primary key,
content text,
metadata jsonb,
embedding vector (1536)
);
create function match_documents (
query_embedding vector (1536),
filter jsonb default '{}'
) returns table (
id uuid,
content text,
metadata jsonb,
similarity float
) language plpgsql as $$
#variable_conflict use_column
begin
return query
select
id,
content,
metadata,
1 - (documents.embedding <=> query_embedding) as similarity
from documents
where metadata @> filter
order by documents.embedding <=> query_embedding;
end;
$$;
client
supabase_url: str = os.environ.get("SUPABASE_URL") # type: ignore
supabase_key: str = os.environ.get("SUPABASE_SERVICE_KEY") # type: ignore
supabase_client = create_client(supabase_url, supabase_key)
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=0
)
loader = PyPDFLoader('Alice_in_wonderland2.pdf')
data = loader.load_and_split()
docs = text_splitter.split_documents(data)
vectorstore = SupabaseVectorStore.from_documents(docs, embeddings, client=supabase_client, table_name="documents", query_name="match_documents")
But I encountered the same problem as before, because it is not letting me upload all the embeddings made from the PDF. I tried reducing the chunk size as well, like this:
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=300,
chunk_overlap=0
)
But still I got the same issue, like this:
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_exceptions.py", line 10, in map_exceptions
yield
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_backends\sync.py", line 38, in write
n = self._sock.send(buffer)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\ssl.py", line 1173, in send
return self._sslobj.write(data)
socket.timeout: The write operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 60, in map_httpcore_exceptions
yield
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 218, in handle_request
resp = self._pool.handle_request(req)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection_pool.py", line 262, in handle_request
raise exc
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection_pool.py", line 245, in handle_request
response = connection.handle_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\connection.py", line 96, in handle_request
return self._connection.handle_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 121, in handle_request
raise exc
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 90, in handle_request
self._send_request_body(**kwargs)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 144, in _send_request_body
self._send_event(event, timeout=timeout)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_sync\http11.py", line 153, in _send_event
self._network_stream.write(bytes_to_send, timeout=timeout)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_backends\sync.py", line 39, in write
buffer = buffer[n:]
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\VSCode\Python\langchain project\supabase-try\test2.py", line 82, in <module>
vectorstore = SupabaseVectorStore.from_documents(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\schema\vectorstore.py", line 422, in from_documents
return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 147, in from_texts
cls._add_vectors(client, table_name, embeddings, docs, ids)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\langchain\vectorstores\supabase.py", line 323, in _add_vectors
result = client.from_(table_name).upsert(chunk).execute() # type: ignore
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\postgrest\_sync\request_builder.py", line 57, in execute
r = self.session.request(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 814, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 901, in send
response = self._send_handling_auth(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 929, in _send_handling_auth
response = self._send_handling_redirects(
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 966, in _send_handling_redirects
response = self._send_single_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_client.py", line 1002, in _send_single_request
response = transport.handle_request(request)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 218, in handle_request
resp = self._pool.handle_request(req)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\httpx\_transports\default.py", line 77, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
line 82 is:
vectorstore = SupabaseVectorStore.from_documents(docs, embeddings, client=supabase_client, table_name="documents", query_name="match_documents")
@hwaking @eyurtsev @agola11 @eyurtsev @hwchase17 @agola11 Please help me solve this problem. Please give me a proper solution for uploading all the PDF text as embeddings to the Supabase vector store.
I have the same problem
I was seeing the same issue with all sorts of different doc lengths, and after coming across this post I updated everything to the below versions and now it is working: python=3.11.6, supabase=1.2.0, langchain=0.0.314
> I was seeing the same issue with all sorts of different doc lengths, and after coming across this post I updated everything to the below versions and now it is working: python=3.11.6, supabase=1.2.0, langchain=0.0.314
Thanks for replying! I've tried with these versions as well, but I haven't made any progress yet!
> I updated everything to the below versions and now it is working: python=3.11.6, supabase=1.2.0, langchain=0.0.314
Interestingly enough, I came across one just now that failed with the same error again. But not all of them do!
> I updated everything to the below versions and now it is working: python=3.11.6, supabase=1.2.0, langchain=0.0.314
> Interestingly enough, I came across one just now that failed with the same error again. But not all of them do!
Hello @j1philli, I am trying to reproduce this bug. Did you happen to find any patterns in the one that failed versus the ones that did not fail?
The one that is failing is massive. I run into rate limiting from OpenAI on the one that fails vs. the ones that don't.
I am taking a massive JSON file, making documents out of each object (29,000-30,000), then running:
Let me know how else I can help!
Hi, I followed the instructions in https://python.langchain.com/docs/integrations/vectorstores/supabase and did not encounter the error. To create the documents table, you need to take the SQL queries the docs provide and run them in the SQL Editor on https://supabase.com/ if you have already built the database and connected to it. You also need to enable the pgvector extension in the query.
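For completeness (an addition, not part of the comment above): enabling pgvector on Supabase is a single statement in the SQL Editor, `create extension if not exists vector;`, run before creating the documents table.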
I have also followed these steps and it works on other tables set up the same way.
I encountered the same write timeout exception as initially reported by the OP. In my case, I narrowed the problem down to the write calls to Supabase vs. the embedding API: I cached the embeddings and directly invoked add_vectors to isolate it. I noticed there's a comment in the code indicating that internally add_vectors chunks by 500 items (based on what worked for JS).
What worked in my case was to send even smaller chunks of data, small being relative. Not an accurate estimate, but sending around 1.5 MB of total data worked fine; going past 2 MB fails with a write timeout exception. I hit the write timeout at the ~2 MB mark when I chunk my docs 200 items at a time, with 200 vectors + 200 (smallish) docs being shipped in one call to Supabase.
So my workaround for now is to ship only 100 chunks and 100 vectors in each call to add_vectors.
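A minimal sketch of that batching workaround, assuming an already-initialized vectorstore and using the public add_documents API rather than the internal _add_vectors (so each batch is embedded and upserted separately):
# Hedged sketch: upsert documents in small batches so each request to
# Supabase stays below the payload size that triggers WriteTimeout.
batch_size = 100  # ~100 docs + 100 vectors per call, per the workaround above

for start in range(0, len(docs), batch_size):
    batch = docs[start : start + batch_size]
    vectorstore.add_documents(batch)  # embeds and upserts only this batch
    print(f"upserted documents {start} to {start + len(batch) - 1}")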
@danielchalef any ideas regarding what we are seeing?
> So my workaround for now is to ship only 100 chunks and 100 vectors in each call to add_vectors.
Put up a PR! I don't know how to use a local Python package in my code to test it out, if someone can point me in the right direction.
I no longer use Supabase and, unfortunately, don't have time to dig in to optimize this. The VectorStore can likely be simplified and improved significantly by refactoring to use the Supabase vec package:
https://supabase.com/vector
> I no longer use Supabase and, unfortunately, don't have time to dig in to optimize this. The VectorStore can likely be simplified and improved significantly by refactoring to use the Supabase vec package: https://supabase.com/vector
No worries, just thought a tag was worth a shot. I'll see if I can comprehend this and integrate vec.
Setting the chunk size smaller worked on my large dataset. Working on a PR.
PR is up! Made it so chunk_size is configurable in the SupabaseVectorStore call! Hopefully it gets in soon so folks can tinker to get their databases back up and running!
vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client)
Like this: vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client, chunk_size=100)
@venurawarnasooriya @Salama1429 @vasanth-asokan the PR is in the latest langchain version! @venurawarnasooriya make sure to test it out and close the issue if yours is resolved. For my large dataset that was failing, I set chunk size to 100.
Thank you so much @j1philli!!
> PR is up! Made it so chunk_size is configurable in the SupabaseVectorStore call! Hopefully it gets in soon so folks can tinker to get their databases back up and running!
> vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client)
> Like this: vector_store = SupabaseVectorStore.from_documents(documents=docs, embedding=embeddings, client=supabase_client, chunk_size=100)
In the JS SDK, this option is called upsertBatchSize if I'm not mistaken. Could we change one or the other to make the naming consistent?
> In the JS SDK, this option is called upsertBatchSize if I'm not mistaken. Could we change one or the other to make the naming consistent?
That is a good call. Didn't think to check if this was already possible in JS. Probably close enough to release to just straight change it. Will do that today.
@pmespresso do you have a link to where it is mentioned in JS SDK?
@j1philli sure thing :)
/**
* Creates a new SupabaseVectorStore instance from an array of documents.
* @param docs The documents to create the instance from.
* @param embeddings The embeddings to use.
* @param dbConfig The configuration for the Supabase database.
* @returns A promise that resolves with a new SupabaseVectorStore instance when the instance has been created.
*/
static fromDocuments(docs: Document[], embeddings: Embeddings, dbConfig: SupabaseLibArgs): Promise<SupabaseVectorStore>;
https://github.com/langchain-ai/langchainjs/blob/893a45057f2949f5a8ed9da80b4f5cacaf63023e/langchain/src/vectorstores/supabase.ts#L37
/**
* Interface for the arguments required to initialize a Supabase library.
*/
export interface SupabaseLibArgs {
client: SupabaseClient;
tableName?: string;
queryName?: string;
filter?: SupabaseMetadata | SupabaseFilterRPCCall;
upsertBatchSize?: number;
}
the docs: https://js.langchain.com/docs/api/vectorstores_supabase/classes/SupabaseVectorStore
By the way, another thing I just noticed is that the QARetrieval-type operations kept timing out on read operations even with the chunk size change (setting a smaller chunk size did help with writes).
In my case, what I'd forgotten to do was actually create an index on my pgvector table. That fixed it instantly (duh). Hope this helps the OP, or more likely myself in the future when I inevitably forget the same thing.
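For reference (an assumption based on the pgvector README, not something spelled out in this thread): an index matching the cosine operator used by match_documents above would be created with `create index on documents using ivfflat (embedding vector_cosine_ops) with (lists = 100);`, where the lists value should be tuned to the table's row count.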
Followed up in the PR with the LangChain folks to get their opinion on the rename.
I encountered the same WriteTimeout issue with SupabaseVectorStore.from_documents. Reducing the chunk size didn't help. However, setting postgrest_client_timeout to None in ClientOptions while creating the Supabase client resolved it. Here's the snippet:
from supabase.lib.client_options import ClientOptions
client_options = ClientOptions(postgrest_client_timeout=None)
supabase = create_client(supabase_url, supabase_key, options=client_options)
Hope it helps
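A note on the trade-off, as an assumption rather than something stated above: postgrest_client_timeout is handed to the underlying httpx client, so None removes the client-side timeout entirely; a large finite value (say, 120 seconds) keeps slow bulk upserts alive while still failing eventually on a dead connection.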
> I encountered the same WriteTimeout issue with SupabaseVectorStore.from_documents. Reducing the chunk size didn't help. However, setting postgrest_client_timeout to None in ClientOptions while creating the Supabase client resolved it. Here's the snippet:
> from supabase.lib.client_options import ClientOptions
> client_options = ClientOptions(postgrest_client_timeout=None)
> supabase = create_client(supabase_url, supabase_key, options=client_options)
> Hope it helps
This worked for me, thanks!!
> I encountered the same WriteTimeout issue with SupabaseVectorStore.from_documents. Reducing the chunk size didn't help. However, setting postgrest_client_timeout to None in ClientOptions while creating the Supabase client resolved it. Here's the snippet:
> from supabase.lib.client_options import ClientOptions
> client_options = ClientOptions(postgrest_client_timeout=None)
> supabase = create_client(supabase_url, supabase_key, options=client_options)
> Hope it helps
Solved my issue.