pg_net
Webhooks fail when multiple inserts are done
Bug report
Describe the bug
I have a webhook configured to watch a table for inserts. On each insert, a POST request should be fired to an API hosted on Vercel. This works when a single insert operation is carried out. If multiple rows are inserted in one call, e.g. .insert([{message: 1}, {message: 2}, {message: 3}]), then the webhooks aren't fired at all. If we instead perform multiple inserts by calling the Supabase API multiple times, the webhook is fired for only some of the inserts.
To Reproduce
Steps to reproduce the behavior, please provide code snippets or a repository:
- Configure a webhook on a table to watch for insert operations and call an external API
- Use the Supabase JS library to perform multiple inserts in one call (see the SQL sketch below this list), like:
  let { data, error } = await supabase
    .from("example_table")
    .insert([{message: 1}, {message: 2}, {message: 3}, {message: 4}, {message: 5}, {message: 6}, etc ])
    .select("message");
- Check the external API's logs to confirm it didn't receive any requests
- See error
Something to note: I was inserting 32 rows at once.
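For reference, a supabase-js .insert() with an array should arrive at Postgres as a single multi-row INSERT statement, so a FOR EACH ROW webhook trigger fires once per row inside one transaction. A minimal SQL sketch of the same reproduction, reusing the example_table and message names from the snippet above and assuming message is a numeric column:

-- One statement, one transaction, six trigger firings, six webhook calls
insert into example_table (message)
values (1), (2), (3), (4), (5), (6);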
Expected behavior
For each inserted row, I would expect the external API to be called. It would also help if there were a log in Supabase showing what happens when the webhook runs.
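The closest thing to such a log is the net._http_response table, where pg_net records the outcome of each request it sends. A minimal query to inspect recent results (error_msg and timed_out are the columns referenced later in this thread; the exact set of columns may vary by pg_net version):

select id, status_code, error_msg, timed_out, created
from net._http_response
order by id desc
limit 20;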
System information
- OS: N/A
- Browser (if applies): N/A
- Version of supabase-js: 2.0.0-rc.8
- Version of Node.js: v16.17.0 on desktop, 18.x on Vercel (per their docs)
Thank you for posting this. I'm receiving a similar error.
Surprisingly, when I create 10 records at the same time it fires the function for each of the 10 records...
INSERT INTO public.profiles (id, first_name, last_name, stripe_customer_id, email)
SELECT u.id, NULL, NULL, NULL, u.email
FROM auth.users u
LEFT JOIN public.profiles p ON u.id = p.id
WHERE p.id IS NULL
LIMIT 10;
However, if I fire the function for 100 records...
INSERT INTO public.profiles (id, first_name, last_name, stripe_customer_id, email)
SELECT u.id, NULL, NULL, NULL, u.email
FROM auth.users u
LEFT JOIN public.profiles p ON u.id = p.id
WHERE p.id IS NULL
LIMIT 100;
It fails.
Yup, basically the webhook behaviour is flaky. I haven't tried it recently. I wanted to use this to build an event-driven architecture but can't without it being reliable. I ended up using QStash as a replacement for the webhook. I would love Supabase to have some kind of full-featured event queue like SQS/EventBridge/Service Bus.
We've tested pg_net sending 1K inserts per second, though that was to a local nginx server.
@devdras @danrasmuson Which webhook service are you using? Likely it's rate limiting the requests.
I'm using Vercel with serverless functions.
We're experiencing the same problem. When inserting records in sequence, all the webhooks are received. But if we insert 30 rows in parallel, none of the webhooks send and the net._http_response table is full of "Timeout was reached" errors.
@matt-aitken Are you using vercel serverless functions too?
@steve-chavez We're able to reliably reproduce it based on the number of new entries that get bulk added to the table—have been in touch with your support team about this as well, but unable to resolve it.
- The webhook trigger is calling an AWS Lambda function
- The Lambda function does not record any invocations if the number of records inserted is larger than ~150
- This is an all-or-nothing behaviour: if we insert fewer records, all invocations happen; if we insert more records, none of the invocations happen
- It's not the case that only some of the invocations don't come through
- It's not the case that the invocation reaches the Lambda function and the webhook times out before the function responds
- In Supabase, the _http_response table only logs "Timeout was reached", nothing else
- We have tested this with different webhooks, including webhook.site. No invocations of the webhook are received
Happy to prepare a screen recording or do a walkthrough if helpful.
> The Lambda function does not record any invocations if the number of records inserted is larger than ~150
@lohrmann That might be related to Lambda rate limiting: https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html#api-requests
> We're able to reliably reproduce it based on the number of new entries that get bulk added to the table
By default there are 200 requests per iteration: https://github.com/supabase/pg_net/blob/master/src/worker.c#L479-L483
I'll make this parameter user-configurable.
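Once that lands, adjusting it would presumably look something like the following. The GUC name pg_net.batch_size is an assumption here (the parameter is not yet exposed at the time of this comment); net.worker_restart() is pg_net's helper for restarting the background worker so it picks up new settings:

-- Hypothetical: assumes the per-iteration request count is exposed as a
-- GUC named pg_net.batch_size
alter role postgres set pg_net.batch_size to 100;
-- restart the background worker so it reads the new setting
select net.worker_restart();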
Curl has a request-rate option:
curl --rate 2/s -O https://example.com/[1-100].jpg
https://everything.curl.dev/usingcurl/transfers/request-rate#examples
I think that should be the final solution here.
Note: the above is not available in libcurl.
Sorry for the slow response on this. No, we're receiving webhooks on AWS EC2 (a long-running server). There's nothing in our server logs showing requests received from Supabase (when inserting many rows), and the net._http_response table is full of "Timeout was reached" errors.
We (Trigger.dev) make it easy for our customers to subscribe to these Supabase database webhooks and quite a few people are hitting this problem now.
I have the same issue. I'm using Google Cloud Functions v2 (Cloud Run). The table is full of "Timeout was reached" errors, but there are no corresponding logs in Google Cloud Monitoring.
I'm running into this issue as well. I spun up a clean environment to reproduce it and did some testing (Supabase project qutskxfebtlgwyzsonvd). The setup is:
- A users table
- A queue table
- A cron job that inserts one item into the queue for each user
- A database webhook that sends the queue insert to an HTTPS endpoint, an AWS API Gateway
- AWS API Gateway configured for asynchronous invocation, which always returns a 200 response (typically in <80ms)
- An AWS Lambda function that is just sleep(5)
I ran a handful of tests (spreadsheet here, screenshot below). It looks like a pretty linear relationship between the number of requests (in my case, the number of users for which the cron job inserts items into the queue) and the webhook timeout configuration. By my math, the webhook needs about 33.33ms per item to process. If only 20ms per item is available within the webhook timeout configuration, most or all of the events fail to be sent to the API Gateway.
Unlike the note from @lohrmann above, I did see some partial successes.
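As a rough sanity check of those numbers (the ~33.33ms per item is @reedshea's estimate from the spreadsheet above, not a documented pg_net constant), the timeout a queue of N items would need works out to roughly N times that figure:

-- e.g. 150 queued items at ~33.33 ms each needs a timeout of about 5 seconds
select 150 * 33.33 as required_timeout_ms;  -- 4999.5 ms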
@reedshea Thanks for the detailed information.
@TheOtherBrian1 shared a reproducible example: https://gist.github.com/TheOtherBrian1/e479426f0070d5f0879483cf59f85dfb
The setup is basically:
CREATE TABLE webhook_test (
  id SERIAL,
  sample text
);

CREATE OR REPLACE FUNCTION debugging_example()
RETURNS TRIGGER
AS $$
DECLARE
  -- Store payload data
  row_data_var JSONB;
  row_id INT;
BEGIN
  -- Retrieve row data and convert to JSON
  SELECT TO_JSONB(NEW.*) INTO row_data_var;
  -- Initiate HTTP POST request to URL
  SELECT
    net.http_post(
      url := 'https://postman-echo.com/post',
      -- Use row data as payload
      body := row_data_var
    ) INTO row_id;
  RAISE LOG
    'LOGGING WEBHOOK |
    The Webhook was called after % |
    ID = % |
    Logging an entire row as JSON (%)', (SELECT NOW()), row_id, row_data_var;
  -- Return the row data to inspect
  RETURN NEW;
-- Handle exceptions here if needed
EXCEPTION
  WHEN others THEN
    RAISE EXCEPTION 'LOGGING WEBHOOK FAILED: An error occurred: %', SQLERRM;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER webhook
AFTER INSERT ON webhook_test
FOR EACH ROW
EXECUTE FUNCTION debugging_example();

CREATE OR REPLACE FUNCTION insert_test_data(num_rows INT)
RETURNS VOID
AS $$
DECLARE
  i INT;
BEGIN
  FOR i IN 1..num_rows LOOP
    INSERT INTO webhook_test(sample) VALUES (i || 'Call');
  END LOOP;
END;
$$ LANGUAGE plpgsql;
Then execute a single request:
INSERT INTO webhook_test (sample)
VALUES ('test value');
Then a batch of requests:
SELECT insert_test_data(100);
Locally, all requests succeed fast:
select count(*) from net._http_response where error_msg is null;
-[ RECORD 1 ]
count | 101
select count(*) from net._http_response where error_msg is not null;
-[ RECORD 1 ]
count | 0
But, for some reason, on Supabase cloud only the single request succeeds and the batched ones fail:
select count(*) from net._http_response where error_msg is null;

| count |
| ----- |
| 1     |

select count(*) from net._http_response where error_msg is not null;

| count |
| ----- |
| 100   |
Will debug this further.
I should add that Supabase is on 0.7.1, the latest version is 0.7.3. I've tried 0.7.1 locally and the result is the same.
By increasing the default timeout (2 seconds) to 4 seconds, all of the above requests succeed on Supabase cloud:
net.http_post(
  url := 'https://postman-echo.com/post',
  -- Use row data as payload
  body := row_data_var,
  timeout_milliseconds := 4000
) INTO row_id;
SELECT insert_test_data(100);
select count(*) from net._http_response where error_msg is null;

| count |
| ----- |
| 100   |
Will increase the default to 5 seconds.
Is that setting the timeout for the whole batch? It seems, based on @reedshea's graph, that the whole batch needs enough time per call. Or is he setting some other timeout?
I just ran into this with my staging environment. I updated a table, which caused 121 events to need to be published, all in one transaction (a single UPDATE statement). All 121 http_post requests fail with a timeout error after ~2 seconds. I bumped timeout_milliseconds to 4000 but that wasn't sufficient; I had to bump it to 6000 to allow all of them to succeed. I'm submitting task requests to Inngest, which responds really fast.
I think the right thing is to multiply the timeout by the number of requests per batch if it is not possible to make it an actual per-request timeout. Passing the timeout on a per-request basis gives the developer the expectation that it applies to that one request, not the entire batch in which that request happens to be included by the net extension's batching algorithm.
What happens if different requests get passed different timeouts? Which one is used for that batch?
@vickkhera I wasn't actually aware of the cause of the bug - I was only aware of how to reproduce it - but you seem to have noticed that the timeout occurs when a batch (default 500) does not complete within the set timeout period. A later batch would actually succeed if its queue were smaller or the timeout were higher.
I assumed that the timeout setting applied to an individual cURL request and not the entire batch. @steve-chavez, is this the intended behavior?
The timeouts are per-request. Will add a test to confirm soon.
There's no way my requests were taking 2s each. The whole batch timed out in ~2-3 seconds.
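A sketch of the kind of test that could settle this, assuming an external delay service (httpbin.org/delay/1, which answers after roughly one second, is just an example and not part of pg_net): queue a large batch of requests whose individual duration is comfortably below the timeout, then see whether any of them still report "Timeout was reached".

-- Each request hits an endpoint that responds after ~1 s, with a nominal 2 s
-- per-request timeout; widespread timeouts across a large batch would point
-- at batch-level rather than per-request behaviour
select net.http_post(
  url := 'https://httpbin.org/delay/1',
  body := '{}'::jsonb,
  timeout_milliseconds := 2000
)
from generate_series(1, 200);

-- once the queue drains, check how the batch fared
select error_msg, timed_out, count(*)
from net._http_response
group by 1, 2;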
We're running into this same issue. The net._http_response table shows the "Timeout was reached" error, yet the timed_out column remains null. This is happening with batches of 8 or fewer, and it seemed like after the initial failure, subsequent inserts into the table with the trigger failed even with a batch of 1.
I'm having this problem as well. I can reproduce it based on the number of new entries that get bulk added to the table. Around 10 work; more than that gets a significant % of errors.
- The webhook trigger is calling an AWS Lambda function (via API Gateway).
- The Lambda function does not record any invocations if too many records are inserted at once
- It's not the case that only some of the invocations don't come through
- It's not the case that the invocation reaches the Lambda function and the webhook times out before the function responds
- Other times it doesn't time out and just returns a 500 with "Internal Server Error"
- In Supabase, the _http_response table logs everything
All using pg_net and Supabase webhooks. Would love to know if there's some solution. I'm using the Lambda to then load SQS; thinking of skipping this step now...
I am doing something similar, except API Gateway populates SQS, which then calls a Lambda function. It worked fine on small data sets, but when I had to make over 100 calls I saw similar behavior to what you (and others) describe. I set the timeout to 5 seconds and it is working now for me, although that feels very fragile. Who knows if it will work when I have a big burst of calls to make. The UI says 5000 ms is the max timeout, but I believe you can set it higher than that if you just create the function via the SQL editor, though I'm not sure if that has any side effects.
Yeah, I have it set to 5000ms and I'm still having the problem. Agreed, it seems fragile regardless. I'd love to know whether creating the function via the SQL Editor solves it.
Otherwise I'm going to skip the webhook altogether, as I can't risk it in production.
@waylon999 You can set the timeout to greater than 5000ms. That's a limitation imposed by the UI. You can write the webhooks directly in pg_net or use the supabase_functions.http_request helper function.
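For anyone going that route, the shape of the trigger is roughly the following. The endpoint URL and table name are placeholders, and the argument order (url, method, headers, params, timeout in milliseconds) mirrors what the dashboard generates for its webhooks, so compare it against a dashboard-created trigger before relying on it:

create trigger my_webhook
after insert on public.my_table
for each row
execute function supabase_functions.http_request(
  'https://example.com/hook',  -- placeholder endpoint
  'POST',
  '{"Content-Type":"application/json"}',
  '{}',
  '10000'  -- 10 s, above the 5000 ms cap the dashboard UI allows
);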
This bug has been open for a year, and it's really frustrating that issues like this are not properly addressed by the Supabase team. They are adding other advanced features like branching, but the core remains unstable, and I don't see a significant effort to fix that. I migrated a relatively small project from Firestore/Firebase to Supabase because I believed in their marketing, and now I really regret this decision. Currently, I'm paying more for significantly worse quality:
- Webhooks are unstable. In my previous architecture, I heavily relied on Firestore triggers to perform some operations in the background. Now they can randomly fail without any option to retry, which is a HUGE problem when something needs to happen in the background of our app.
- Realtime functionality is useless; it only works fine when you have an active tab in the browser. Then you need to figure out the whole mechanism for refetching data and reconnecting after you come back from inactive. I don't even want to mention how painful this is on mobile (React Native). With Firebase, I didn't even have to think about it; it just worked out of the box.
- Supavisor / PgBouncer - random failures when connecting to the DB from cloud functions / AWS Lambda ("Connection terminated unexpectedly").
- Upgrading your DB? Okay, our app will be unavailable for minutes or hours, and then you need to figure out why something doesn't work.
- Storage - you can upload from RN, but first you need to convert the entire file into base64 and then use an ArrayBuffer.
But yes, Supabase is open source. I guess I'll tell my clients: our app doesn't work properly, but at least you can read the source code.
> @waylon999 You can set the timeout to greater than 5000ms. That's a limitation imposed by the UI. You can write the webhooks directly in pg_net or use the supabase_functions.http_request helper function.
Does this solve the problem though? It seems fragile regardless of the timeout (especially if it's unclear whether the timeout needs to be 5 seconds or 30).
Any new updates on this?
pg_net has been super inconsistent for me; it also tends to hang sporadically. I tried to build an event-driven app but decided to give up and rely on a cron job instead.
Is there any ETA on this? It's blocking me and I'll need to find a workaround. Thanks!