nodejs-pubsub

Failed to "ack" for 3000 message(s). Reason: 3 INVALID_ARGUMENT: Request payload size exceeds the limit: 524288 bytes

Open ablbol opened this issue 1 year ago • 5 comments

NOTE: I have already created a Stack Overflow question for this issue and was asked by a Google Cloud employee (Kamal Aboul-Hosn) to raise the issue here.

Environment details

  • OS:
  • Node.js version: v18.16.1
  • npm version: 9.5.1
  • @google-cloud/pubsub version: ^3.7.1

Steps to reproduce

I am using the Node.js client for Google Pub/Sub (the RPC StreamingPullRequest API) with flow control. My subscription options look like this:

    {
      streamingOptions: {
        // decrease this to reduce re-deliveries
        maxStreams: 2, // default: 5
      },
      flowControl: {
        allowExcessMessages: false,
        maxMessages: 4000, // no issues at 2000
      },
    }

I reduced maxStreams from the default of 5 to 2 to reduce re-deliveries, as mentioned in the docs. When I set maxMessages to 4000, I get the following debug warning, which causes re-deliveries of messages:

Pubsub subscription received debug message: Failed to "ack" for 3000 message(s). Reason: 3 INVALID_ARGUMENT: Request payload size exceeds the limit: 524288 bytes.

Why am I getting the warning, and how can I increase the payload size?
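
For context, here is roughly how these options are wired up; a minimal sketch in which the subscription name is a placeholder and the message handler is elided:

    // Minimal sketch of the subscriber setup (placeholder subscription name).
    const {PubSub} = require('@google-cloud/pubsub');

    const pubsub = new PubSub();
    const subscription = pubsub.subscription('my-subscription', {
      streamingOptions: {
        maxStreams: 2, // default: 5; lowered to reduce re-deliveries
      },
      flowControl: {
        allowExcessMessages: false,
        maxMessages: 4000, // warning appears at 4000 but not at 2000
      },
    });

    subscription.on('message', message => {
      // ... process the message ...
      message.ack();
    });

    // The "Failed to ack" warning is surfaced through the 'debug' event.
    subscription.on('debug', msg => {
      console.warn('Pubsub subscription received debug message:', msg.message);
    });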

ablbol avatar Dec 20 '23 16:12 ablbol

@ablbol Thanks for the issue. I could've sworn we already had an issue about this, but either way:

The problem is that, as the message suggests, there is a maximum payload size for ack requests to the server. For a very high level of throughput or a very large batch size, it can overflow that size while trying to send back all of the ack IDs (especially with exactly once delivery enabled). Our intent here is basically to queue up multiple ack requests to the server if there are too many for a single request, but that hasn't been implemented yet.

feywind avatar Jan 31 '24 19:01 feywind
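
For illustration, a minimal sketch of the batching approach described above: splitting the queued ack IDs into multiple requests so that no single request exceeds the payload limit. The 512 KiB budget comes from the error message; the helper name and the byte accounting (which ignores per-request protobuf overhead) are illustrative assumptions, not the library's actual implementation.

    // Split ack IDs into chunks whose combined byte size stays under the limit.
    const MAX_REQUEST_BYTES = 512 * 1024; // 524288, per the error message

    function chunkAckIds(ackIds, maxBytes = MAX_REQUEST_BYTES) {
      const chunks = [];
      let current = [];
      let currentBytes = 0;
      for (const id of ackIds) {
        const idBytes = Buffer.byteLength(id, 'utf8');
        // Start a new chunk once adding this ID would exceed the budget.
        if (current.length > 0 && currentBytes + idBytes > maxBytes) {
          chunks.push(current);
          current = [];
          currentBytes = 0;
        }
        current.push(id);
        currentBytes += idBytes;
      }
      if (current.length > 0) {
        chunks.push(current);
      }
      return chunks;
    }

    // Each chunk would then be sent as its own acknowledge request.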

We ran into a similar issue when our downstream system applied backpressure. The ack requests that failed due to request size were leaking memory, eventually causing our workers to be OOM-killed and restarted.

We were able to work around this by setting the BatchOptions.maxMessages parameter to less than the default of 3000. We used 1000; the requests then succeeded and we no longer leaked memory.

brianfranko avatar Feb 28 '24 21:02 brianfranko
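
For anyone else hitting this, the workaround above looks roughly like the following; a sketch assuming the subscriber's batching option corresponds to BatchOptions, with a placeholder subscription name:

    // Workaround sketch: cap the ack/modAck batch size below the default of 3000
    // so each acknowledge request stays under the payload limit.
    const {PubSub} = require('@google-cloud/pubsub');

    const pubsub = new PubSub();
    const subscription = pubsub.subscription('my-subscription', {
      batching: {
        maxMessages: 1000, // default: 3000
      },
    });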

I'm testing a fix that adds a maxBytes param, to make sure it doesn't pass the threshold.

dermasmid avatar Mar 10 '24 12:03 dermasmid

@dermasmid Thanks for the PR. I think the max is always the same, and I'm not sure there's a case where you'd want to make it less than the server max? Let's run this by @kamalaboulhosn; we might be able to simplify further.

feywind avatar Mar 11 '24 19:03 feywind

Quick update here: what we want to do is adapt the PR so that it doesn't take a max from the user but just uses the server's max. I need to verify the exact number before doing that.

feywind avatar Jun 13 '24 17:06 feywind

The 4.7.2 release, published a moment ago, should fix this.

feywind avatar Sep 13 '24 18:09 feywind