botocore
Urlencoding in SQS SendMessage is extremely expensive in CPU
Describe the bug
Using botocore to send messages to SQS can be very CPU-expensive because of urlencoding the message body.
A. Urlencode is called twice: as stated in multiple botocore issues, if I understand correctly, once for the signature and once in the preparation of the request body.
B. The urlencode implementation is extremely slow, so programs working with high-throughput data can spend much of their time just urlencoding, which can be even more time-consuming than the business logic itself.
We are using botocore in asyncio via aiobotocore, and urlencode blocks the event loop, making it practically unusable at high throughput.
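As a rough, stdlib-only illustration of the cost described above (not from the original report; the payload and parameter names are hypothetical), timing `urlencode` on a near-maximum SQS message body:

```python
import time
from urllib.parse import urlencode

# Hypothetical payload close to the SQS 256 KB message-size limit.
payload = "a" * (1024 * 240)
params = {"Action": "SendMessage", "MessageBody": payload}

start = time.perf_counter()
for _ in range(100):
    body = urlencode(params)  # pure-Python per-character quoting
elapsed = time.perf_counter() - start
print(f"100 urlencode calls: {elapsed:.3f}s")
```

Calling this twice per request, as described above, doubles whatever this loop measures.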
Steps to reproduce
Send large messages to an SQS queue and profile the CPU time spent in urlencode.
Expected behavior
Encoding shouldn't be the most expensive step of sending messages.
Profiling
Hi @yogevyuval, thanks for providing your feedback. We don’t support aiobotocore but I follow the reasoning behind your request.
You mentioned other botocore issues had brought up urlencode. Can you tell us which issues you were looking at?
As can be seen in https://github.com/boto/botocore/pull/1566, urlencode is now called twice instead of many times. But if it could be called only once, that would save half of the CPU time spent there.
Thanks @yogevyuval I think that is a reasonable feature request and we can keep this issue open to track it.
@tim-finnigan An update:
We patched AWSRequestPreparer._prepare_body with a faster Rust-based implementation of URL quoting (https://pypi.org/project/urlquote/) and saw roughly a 3x performance boost.
```python
from urllib.parse import urlencode

from botocore.awsrequest import AWSRequestPreparer
# urlquote (https://pypi.org/project/urlquote/) provides the Rust-based quote;
# these import names follow that package's published API.
from urlquote import quote as fast_quote
from urlquote.quoting import PYTHON_3_7_QUOTING


def patch_aws_request_urllib_parse():
    def _fast_quote(value, *args, **kwargs) -> str:
        return fast_quote(value, quoting=PYTHON_3_7_QUOTING).decode("utf-8")

    def _fast_prepare_body(self, original):
        """Prepares the given HTTP body data."""
        body = original.data
        if body == b"":
            body = None
        if isinstance(body, dict):
            params = [self._to_utf8(item) for item in body.items()]
            body = urlencode(params, doseq=True, quote_via=_fast_quote)
        return body

    AWSRequestPreparer._prepare_body = _fast_prepare_body
```
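For context on why this monkeypatch works at all: the stdlib `urlencode` accepts a `quote_via` hook, so any quoting implementation can be substituted. A stdlib-only sketch of the hook (the counting wrapper is illustrative, not part of the patch):

```python
from urllib.parse import urlencode, quote

calls = []

def counting_quote(value, *args, **kwargs):
    # Wraps stdlib quote just to show the hook being invoked;
    # the patch above substitutes a Rust implementation here instead.
    calls.append(value)
    return quote(value, *args, **kwargs)

body = urlencode(
    [("Action", "SendMessage"), ("MessageBody", "hello world")],
    doseq=True,
    quote_via=counting_quote,
)
print(body)        # Action=SendMessage&MessageBody=hello%20world
print(len(calls))  # one call per key and per value
```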
Just chiming in to say that I've somewhat verified this with the following:
```python
import io
import cProfile

from botocore.session import Session
from botocore.awsrequest import AWSResponse


class MockResponse(io.BytesIO):
    def stream(self, *args, **kwargs):
        yield self.read()


def stub(**kwargs):
    raw_body = MockResponse(
        b'<?xml version="1.0"?>'
        b'<SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2012-11-05/">'
        b'<SendMessageResult><MessageId>eb8f0682-118a-4e63-b0b7-68337d38d962</MessageId>'
        b'<MD5OfMessageBody>food18db4cc2f85cedef654fccc4a4d8</MD5OfMessageBody>'
        b'</SendMessageResult>'
        b'</SendMessageResponse>'
    )
    return AWSResponse('https://example.com', 200, {}, raw_body)


ses = Session()
client = ses.create_client('sqs')
client.meta.events.register('before-send', stub)

payload = 'a' * (1024 * 240)

with cProfile.Profile() as pr:
    for _ in range(100):
        r = client.send_message(
            QueueUrl='...',
            MessageBody=payload,
        )

pr.dump_stats('t.prof')
```
This script removes networking as a factor. For large SQS message payloads (close to the 256 KB maximum), urlencode takes about ~80% of the runtime (~4% with networking, for me).
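A dump like `t.prof` can be inspected with the stdlib `pstats` module. A self-contained version that profiles urlencode directly (no botocore needed) and ranks the hot functions, which should show the quoting helpers near the top:

```python
import cProfile
import pstats
from urllib.parse import urlencode

payload = "a" * (1024 * 240)

with cProfile.Profile() as pr:
    for _ in range(10):
        urlencode({"MessageBody": payload})

# Rank by cumulative time and print the top entries.
stats = pstats.Stats(pr)
stats.sort_stats("cumulative").print_stats(5)
```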
The previous PR is still largely correct: to remove the duplicate preparation calls we'd need to do some refactoring around request "preparation" (a legacy concept from when we were built on requests). It's worth noting this only applies to query services, so one possible solution is to remove the notion of a dict body entirely and instead have the serializer handle the conversion directly and produce a bytes body. That would reduce the cost of calling prepare and remove the duplicate urlencode call. It would require that no post-serialization logic relies on the body being a dictionary for mutability purposes (I doubt this is the case, though).
Another possible solution is to get a little creative with caching request preparation, but that might be tricky and would require some care to ensure we're not leaking anything (memory, or state between instances of prepared requests).
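A minimal sketch of that bytes-body direction (the helper names here are hypothetical, not botocore's actual API): the serializer encodes exactly once, and preparation becomes a passthrough for bytes:

```python
from urllib.parse import urlencode

def serialize_query_body(params: dict) -> bytes:
    """Hypothetical serializer step: encode query params to bytes exactly once."""
    return urlencode(params, doseq=True).encode("utf-8")

def prepare_body(body):
    """Sketch of a passthrough preparation: bytes bodies are never re-encoded."""
    if isinstance(body, bytes):
        return body
    if isinstance(body, dict):
        # Legacy dict path: this is where the duplicate urlencode happens today.
        return urlencode(body, doseq=True)
    return body

encoded = serialize_query_body({"Action": "SendMessage", "MessageBody": "hi there"})
assert prepare_body(encoded) is encoded  # no second encoding pass
```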
@jonemo @nateprewitt @tim-finnigan It seems that the latest announcement regarding JSON protocol support will fix this issue, which is great. Any news on getting that into botocore and boto3?
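For a sense of why a JSON protocol helps here: `json.dumps` (C-accelerated in CPython) serializes the payload without per-character URL quoting. A rough stdlib comparison (payload size is illustrative):

```python
import json
import time
from urllib.parse import urlencode

payload = "a" * (1024 * 240)  # near the SQS 256 KB limit

start = time.perf_counter()
query_body = urlencode({"Action": "SendMessage", "MessageBody": payload})
query_time = time.perf_counter() - start

start = time.perf_counter()
json_body = json.dumps({"MessageBody": payload})
json_time = time.perf_counter() - start

print(f"query serialization: {query_time:.4f}s, json: {json_time:.4f}s")
```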
Patched boto3 to use https://github.com/blue-yonder/urlquote, which reduces CPU significantly.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.