Currently does not scale well
Bug report
Once pg_net reaches a certain number of queued jobs, requests start to time out aggressively. This makes it unsuitable for any production workload that has to handle a bursty influx of data. As an example, we wrote a function to push jobs from Supabase to SQS and thought we could rely on pg_net. It works fine if you inject ~10 jobs at a time, but once you go to larger numbers, EVERYTHING times out. The problem is clearly not SQS itself: we moved to pgmq instead of pg_net, and the problems disappeared immediately.
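For context, the pgmq-based replacement looks roughly like the following; the queue name and payload are placeholders, not our production code:

-- Create the queue once
select pgmq.create('sqs_jobs');

-- Producer: enqueue a job (this replaced the per-row pg_net HTTP call)
select pgmq.send('sqs_jobs', '{"job": "sync-row", "row_id": 42}'::jsonb);

-- Consumer: read up to 10 messages with a 30-second visibility timeout
select * from pgmq.read('sqs_jobs', 30, 10);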
I like and support the idea of this project, but it feels flimsy as a core module of Supabase. I cannot recommend it to anyone at this time.
Just to show that I am not alone, lots of people are having similar experiences.
https://www.reddit.com/r/Supabase/comments/1hc4398/has_anyone_ever_needed_to_expand_pg_net_beyond/
Describe the bug
Jobs queued with pg_net cascade into timeouts as the backlog grows.
To Reproduce
Steps to reproduce the behavior, please provide code snippets or a repository:
- Set up a trigger that fires off a pg_net request (a minimal sketch is below).
- Fire the trigger for ~200 rows or more.
- Everything starts to time out.
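A minimal sketch of that setup; the table, function, and endpoint names are illustrative, not our production code:

-- Table whose inserts should trigger an HTTP call
create table if not exists events (
  id bigint generated always as identity primary key,
  payload jsonb not null
);

-- Trigger function: one pg_net request per inserted row
create or replace function notify_webhook()
returns trigger
language plpgsql
as $$
begin
  -- https://example.com/hook stands in for the real endpoint (e.g. the SQS proxy)
  perform net.http_post(
    url     := 'https://example.com/hook',
    body    := new.payload,
    headers := '{"Content-Type": "application/json"}'::jsonb
  );
  return new;
end;
$$;

create trigger events_webhook
after insert on events
for each row execute function notify_webhook();

-- Firing the trigger for a few hundred rows reproduces the backlog
insert into events (payload)
select jsonb_build_object('n', g)
from generate_series(1, 500) g;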
Expected behavior
A job backlog is okay. There should not be a timeout issue.
Screenshots
postgres=> select id, status_code, timed_out, error_msg from net._http_response limit 50;
id | status_code | timed_out | error_msg
--------+-------------+-----------+---------------------
1 | 200 | f |
186188 | 200 | f |
147 | | | Timeout was reached
148 | | | Timeout was reached
149 | | | Timeout was reached
150 | | | Timeout was reached
151 | | | Timeout was reached
152 | | | Timeout was reached
153 | | | Timeout was reached
154 | | | Timeout was reached
155 | | | Timeout was reached
156 | | | Timeout was reached
157 | | | Timeout was reached
158 | | | Timeout was reached
159 | | | Timeout was reached
160 | | | Timeout was reached
161 | | | Timeout was reached
162 | | | Timeout was reached
163 | | | Timeout was reached
164 | | | Timeout was reached
165 | | | Timeout was reached
166 | | | Timeout was reached
167 | | | Timeout was reached
168 | | | Timeout was reached
169 | | | Timeout was reached
170 | | | Timeout was reached
Make sure you're at least on v0.11.0; it already fixed sporadic timeouts (see https://github.com/supabase/pg_net/issues/86#issuecomment-2471290734).
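For reference, you can check the installed version and move to a newer one with the standard extension commands (assuming the newer version is available on your instance):

-- Check which pg_net version is currently installed
select extversion from pg_extension where extname = 'pg_net';

-- Update to the default version shipped with the instance
alter extension pg_net update;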
147 | | | Timeout was reached
It would be more helpful to know the cause of the timeouts. v0.14.0 reports a more detailed error message. Timeouts can also be caused by legitimate network issues.
Just to show that I am not alone, lots of people are having similar experiences: https://www.reddit.com/r/Supabase/comments/1hc4398/has_anyone_ever_needed_to_expand_pg_net_beyond/ ("Has anyone ever needed to expand pg_net beyond 200 requests per second?")
Higher throughput is being tracked in https://github.com/supabase/pg_net/issues/160
A job backlog is okay. There should not be a timeout issue.
Retries are tracked in https://github.com/supabase/pg_net/issues/110, but make sure you're on the latest pg_net version.
I like and support the idea of this project, but it feels flimsy as a core module of Supabase. I cannot recommend it to anyone at this time.
Note that the project hasn't reached a v1.0 release yet. We're definitely working on improvements.
We upgraded everything recently. Let me confirm that the issue is still occurring and see if I can present an easily reproducible test case.
I think unexpected timeouts have been solved for a while.
@tmountain Have you run into this again?
@diraneyya Also since you're using pg_net extensively, perhaps you can confirm if you've run into this?
I am in the final stages of starting my scraping project using pg_fetch_cycle (built on top of pg_net), so within 1-2 weeks I will be able to confirm whether this still occurs when issuing requests at scale.
Just FYI, our CI now load-tests varying numbers of requests and batch sizes (only GETs for now) and measures CPU/MEM usage. We do this on every PR to ensure we don't introduce regressions: https://github.com/supabase/pg_net/actions/runs/17220414666?pr=231
So far there are no unexpected timeouts.