
pubsub/awssnssqs: aws sqs expose receiver max batch

Open pxp928 opened this issue 1 year ago • 6 comments

This PR exposes the receiver max batch size as a URL parameter for AWS SQS via receivermaxbatch.

For example: awssqs://sqs.us-east-2.amazonaws.com/99999/my-queue?receivermaxbatch=5

Based on the recvBatcherOpts (https://github.com/google/go-cloud/blob/be1b4aee38955e1b8cd1c46f8f47fb6f9d820a9b/pubsub/awssnssqs/awssnssqs.go#L118-L123) and SQS's limit of 10 messages per receive call, any value above 10 falls back to 10.
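
For illustration, a minimal consumer passing the proposed parameter might look like the sketch below (receivermaxbatch is the name this PR introduces; the endpoint, account, and queue are the placeholders from the example above):

```go
package main

import (
	"context"
	"log"

	"gocloud.dev/pubsub"
	_ "gocloud.dev/pubsub/awssnssqs" // registers the awssqs:// URL scheme
)

func main() {
	ctx := context.Background()
	// Cap the receive batch at 5 messages per SQS call; per the PR,
	// values above SQS's hard limit of 10 fall back to 10.
	sub, err := pubsub.OpenSubscription(ctx,
		"awssqs://sqs.us-east-2.amazonaws.com/99999/my-queue?receivermaxbatch=5")
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Shutdown(ctx)
}
```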

pxp928 avatar Mar 31 '24 14:03 pxp928

Out of curiosity, why do you need to set this?

vangent avatar Apr 01 '24 17:04 vangent

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.14%. Comparing base (be1b4ae) to head (af19c89).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3412      +/-   ##
==========================================
+ Coverage   73.12%   73.14%   +0.01%     
==========================================
  Files         113      113              
  Lines       14864    14870       +6     
==========================================
+ Hits        10870    10876       +6     
  Misses       3219     3219              
  Partials      775      775              


codecov[bot] avatar Apr 01 '24 17:04 codecov[bot]

Hey @vangent, the use case is flexibility to change the max batch size when needed. For example, a smaller batch size lets us scale out our processing service so that each instance holds fewer messages "in-flight" (within the SQS visibility timeout). The service can then be scaled in or out based on the batch size each instance can handle.

pxp928 avatar Apr 01 '24 19:04 pxp928

Have you tried doing that without manually tuning it? The pubsub package will not always use the maximum batch size; it tunes the size to try to keep throughput balanced.

So, for example, if you only have 2 worker goroutines processing messages, it is unlikely to fetch 10 messages at a time and let them sit there for a long time (unless the processing time is very fast, in which case you do want more messages queued so that the workers aren't idle).

Basically, the package does a lot to try to make it so that you don't have to tune this manually; that's one of the benefits. If you are manually tuning it because what I've described isn't working the way you want for some reason, that's one thing, but I don't want you to add complexity manually tuning something that shouldn't need it.
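
For context, a typical consumer looks like the sketch below (a minimal example against the gocloud.dev/pubsub API; process is a placeholder for application logic). The package observes how quickly these workers drain messages and sizes its receive batches accordingly:

```go
package main

import (
	"context"
	"log"

	"gocloud.dev/pubsub"
	_ "gocloud.dev/pubsub/awssnssqs"
)

func main() {
	ctx := context.Background()
	sub, err := pubsub.OpenSubscription(ctx,
		"awssqs://sqs.us-east-2.amazonaws.com/99999/my-queue")
	if err != nil {
		log.Fatal(err)
	}
	defer sub.Shutdown(ctx)

	// The pubsub package sizes its internal receive batches based on
	// how quickly this loop drains messages, so there is normally no
	// need to tune the batch size by hand.
	for {
		msg, err := sub.Receive(ctx)
		if err != nil {
			break // Shutdown or a permanent error ends the loop.
		}
		go func(m *pubsub.Message) {
			defer m.Ack()
			process(m.Body)
		}(msg)
	}
}

// process stands in for the application's message handler.
func process(body []byte) {}
```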

vangent avatar Apr 01 '24 20:04 vangent

Oh, interesting, I did not see that in https://gocloud.dev/howto/pubsub/subscribe/. Is there any documentation around this behavior? Is there a way to know what the current batch size is set to and what it changes to, e.g. via logs? Thank you for pointing this out.

pxp928 avatar Apr 01 '24 21:04 pxp928

I don't think it's well-documented, as it is not really part of the public interface; it's an internal implementation detail.

You can see constants for the algorithm here: https://github.com/google/go-cloud/blob/master/pubsub/pubsub.go#L397

and the main code is here: https://github.com/google/go-cloud/blob/master/pubsub/pubsub.go#L462

No, the batch size isn't currently logged, but you can patch a local copy and add some logging.
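
For intuition only, here is a hypothetical, heavily simplified version of that kind of adaptive feedback loop; it is not go-cloud's actual algorithm, whose real constants and logic live at the links above:

```go
package main

import "fmt"

// adjustBatchSize is a hypothetical sketch, NOT go-cloud's code: grow
// the next receive batch while workers are starving, shrink it when
// messages pile up, and clamp to the driver's limit (10 for SQS).
func adjustBatchSize(cur, idleWorkers, queued, max int) int {
	switch {
	case idleWorkers > 0 && cur < max:
		cur *= 2 // workers are idle; fetch more per call
	case queued > cur:
		cur /= 2 // messages are waiting; fetch fewer
	}
	if cur < 1 {
		cur = 1
	}
	if cur > max {
		cur = max
	}
	return cur
}

func main() {
	fmt.Println(adjustBatchSize(1, 3, 0, 10))  // idle workers -> grows to 2
	fmt.Println(adjustBatchSize(8, 0, 20, 10)) // backlog -> shrinks to 4
}
```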

vangent avatar Apr 01 '24 22:04 vangent

Thanks @vangent for the information!

pxp928 avatar Jun 02 '24 20:06 pxp928