pg_net

Currently Does not Scale Well

Open tmountain opened this issue 9 months ago • 2 comments

Bug report

Once pg_net reaches a certain number of jobs, jobs start to time out aggressively. This makes it unsuitable for any production workload that has to absorb bursts of data. As an example, we wrote a function to inject jobs from Supabase into SQS and thought we could rely on pg_net. It works fine if you inject ~10 jobs at a time, but once you go to larger numbers, EVERYTHING times out. The problem is clearly not SQS: we moved to pgmq instead of pg_net, and the problems disappeared immediately.

I like and support the idea of this project, but it feels flimsy as a core module of Supabase. I cannot recommend it to anyone at this time.

Just to show that I am not alone, lots of people are having similar experiences.

https://www.reddit.com/r/Supabase/comments/1hc4398/has_anyone_ever_needed_to_expand_pg_net_beyond/

Describe the bug

Jobs queued with pg_net cascade into timeouts as the backlog grows.

To Reproduce

Steps to reproduce the behavior, please provide code snippets or a repository:

  1. Set up a trigger that fires off a pg_net request.
  2. Fire the trigger for ~200 rows or more.
  3. Everything starts to time out.
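
For reference, steps 1 and 2 could be sketched roughly as follows. This is not the original reproduction: the `public.jobs` table, the function and trigger names, the `https://example.com/jobs` endpoint, and the 5000 ms timeout are all placeholders chosen for illustration.

```sql
-- Hypothetical schema and endpoint, for illustration only.
create extension if not exists pg_net;

create table if not exists public.jobs (
  id      bigint generated always as identity primary key,
  payload jsonb not null
);

create or replace function public.enqueue_job_request()
returns trigger
language plpgsql
as $$
begin
  -- net.http_post only queues the request and returns its id;
  -- the pg_net background worker sends it asynchronously.
  perform net.http_post(
    url                  := 'https://example.com/jobs',  -- placeholder endpoint
    body                 := jsonb_build_object('id', new.id, 'payload', new.payload),
    timeout_milliseconds := 5000                         -- assumed value
  );
  return new;
end;
$$;

create trigger jobs_enqueue_request
after insert on public.jobs
for each row execute function public.enqueue_job_request();

-- Step 2: fire the trigger for 200 rows in a single statement.
insert into public.jobs (payload)
select jsonb_build_object('n', n)
from generate_series(1, 200) as n;
```

Raising the upper bound of `generate_series` is a simple way to probe at what backlog size the timeouts begin.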

Expected behavior

A job backlog is okay. There should not be a timeout issue.

Screenshots

postgres=> select id, status_code, timed_out, error_msg from net._http_response limit 50;
   id   | status_code | timed_out |      error_msg
--------+-------------+-----------+---------------------
      1 |         200 | f         |
 186188 |         200 | f         |
    147 |             |           | Timeout was reached
    148 |             |           | Timeout was reached
    149 |             |           | Timeout was reached
    150 |             |           | Timeout was reached
    151 |             |           | Timeout was reached
    152 |             |           | Timeout was reached
    153 |             |           | Timeout was reached
    154 |             |           | Timeout was reached
    155 |             |           | Timeout was reached
    156 |             |           | Timeout was reached
    157 |             |           | Timeout was reached
    158 |             |           | Timeout was reached
    159 |             |           | Timeout was reached
    160 |             |           | Timeout was reached
    161 |             |           | Timeout was reached
    162 |             |           | Timeout was reached
    163 |             |           | Timeout was reached
    164 |             |           | Timeout was reached
    165 |             |           | Timeout was reached
    166 |             |           | Timeout was reached
    167 |             |           | Timeout was reached
    168 |             |           | Timeout was reached
    169 |             |           | Timeout was reached
    170 |             |           | Timeout was reached
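
Since the query above has no `order by`, `limit 50` returns an arbitrary sample of rows. A grouped count over the same columns may give a clearer picture of how many responses succeeded versus timed out; this is only a sketch of one way to summarize the table.

```sql
-- Count responses per outcome instead of sampling 50 arbitrary rows.
select status_code,
       timed_out,
       error_msg,
       count(*) as responses
from net._http_response
group by status_code, timed_out, error_msg
order by responses desc;
```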

tmountain • Mar 18 '25 15:03

Make sure you're on at least v0.11.0, which already fixed sporadic timeouts (see https://github.com/supabase/pg_net/issues/86#issuecomment-2471290734).

> 147 |             |           | Timeout was reached

It would be more helpful to know the cause of the timeout. Version v0.14.0 has a more detailed error. Timeouts can be legitimate network issues.

> Just to show that I am not alone, lots of people are having similar experiences. https://www.reddit.com/r/Supabase/comments/1hc4398/has_anyone_ever_needed_to_expand_pg_net_beyond/ ("Has anyone ever needed to expand pg_net beyond 200 requests per second?")

More throughput is being tracked on https://github.com/supabase/pg_net/issues/160

> A job backlog is okay. There should not be a timeout issue.

Retries are tracked on https://github.com/supabase/pg_net/issues/110. But make sure you're on the latest pg_net version.

> I like and support the idea of this project, but it feels flimsy as a core module of Supabase. I cannot recommend it to anyone at this time.

Note that the project hasn't reached a v1.0 release yet. We're definitely working on improvements.

steve-chavez • Mar 18 '25 17:03

We upgraded everything recently. Let me confirm that the issue is still occurring and see if I can present an easily reproducible test case.

tmountain • Mar 24 '25 11:03

I think unexpected timeouts have been solved for a while.

@tmountain Have you run into this again?

@diraneyya Also since you're using pg_net extensively, perhaps you can confirm if you've run into this?

steve-chavez • Aug 22 '25 20:08

I am in the final stages of starting my scraping project using pg_fetch_cycle (built on top of pg_net), so in 1-2 weeks I will be able to confirm whether this occurs when issuing requests at scale.

diraneyya • Aug 22 '25 21:08

Just FYI, our CI now load tests varying number_of_requests+batch_sizes (only GETs for now) and measures CPU/MEM usage. We do this on every PR to ensure we don't introduce regressions: https://github.com/supabase/pg_net/actions/runs/17220414666?pr=231

So far there are no unexpected timeouts.

steve-chavez • Aug 25 '25 19:08