ably-python

Documentation request - safe async usage

Open hsghori opened this issue 2 years ago • 10 comments

I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported.

For example, since I can't use bulk publish to send multiple messages in a single payload if those messages are from different channels, I've been looking into using python coroutines to reduce the amount of time I need to wait on the ably servers.

But it's not clear to me if this usage is actually safe from a state / resources perspective.

import asyncio
from ably import AblyRest

async def main():
    client = AblyRest(get_api_key())
    responses = await asyncio.gather(*[
        client.channels.get(channel_name).publish(name, message)
        for channel_name, name, message in messages
    ])
    await client.close()

Under the hood, it looks like Channel.publish calls back into the shared ably.http client, which can set attributes like self.__host and self.__host_expires. That feels fairly unsafe, since different coroutines could be setting those attributes on the same client. But since Channel.publish is an explicitly async method, I'd expect it to be safe to parallelize.

So I'd love to understand to what extent async usage of this sdk is safe / how y'all would recommend we use the async capabilities of the sdk.


hsghori avatar Sep 18 '23 19:09 hsghori

➤ Automation for Jira commented:

The link to the corresponding Jira issue is https://ably.atlassian.net/browse/SDK-3856

sync-by-unito[bot] avatar Sep 18 '23 19:09 sync-by-unito[bot]

@hsghori I have notified the author of the library who wrote the asyncio part; we will get back to you once they're available 👍

sacOO7 avatar Sep 18 '23 19:09 sacOO7

Hey @hsghori, thanks for reaching out!

I understand your concern about asynchronous operations changing internal state. However, we have aimed to design the library so that it 'just works' without you needing to worry too much about concurrency, so I don't have any specific advice on how to use the async methods.

In this particular case, when the internal HTTP client encounters an error which necessitates the use of a fallback host, it iterates through the fallback hosts and, if it finds a working one, temporarily saves that host as a preference. I suppose there are cases where several requests fail at once and the fallback mechanism runs for each failure, with each overwriting the host preference when it finishes, but it's quite rare for this to happen and there are only five fallback hosts to try, so while it's not the perfect behaviour, I don't consider it to be unsafe or otherwise unacceptable.
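
To illustrate, here's a rough conceptual sketch of that fallback behaviour. This is not the SDK's actual code; the host list, class name, and attribute names are all illustrative:

import time

# Conceptual sketch only, NOT the SDK's actual implementation; the host
# names and attribute names here are illustrative.
FALLBACK_HOSTS = [f"{c}.ably-realtime.com" for c in "abcde"]

class FallbackHttp:
    def __init__(self, primary_host, fallback_retry_timeout=600):
        self.primary_host = primary_host
        self.fallback_retry_timeout = fallback_retry_timeout
        self.preferred_host = None
        self.preferred_host_expires = 0

    async def make_request(self, send):
        # Prefer a previously working fallback host until it expires
        if self.preferred_host and time.time() < self.preferred_host_expires:
            hosts = [self.preferred_host, self.primary_host] + FALLBACK_HOSTS
        else:
            hosts = [self.primary_host] + FALLBACK_HOSTS
        last_exc = None
        for host in hosts:
            try:
                response = await send(host)
                if host != self.primary_host:
                    # Several concurrent failures may each overwrite this
                    # preference; the last writer wins, which is benign
                    self.preferred_host = host
                    self.preferred_host_expires = (
                        time.time() + self.fallback_retry_timeout
                    )
                return response
            except ConnectionError as exc:
                last_exc = exc
        raise last_exc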

If this behaviour is unacceptable for you, an alternative and more efficient way to publish multiple messages to multiple channels is to use our batch API. We don't have first-class support for it in ably-python just yet; however, you can still make use of the SDK's auth and retry mechanisms by querying the API with the generic AblyRest.request method. If you do go down this route I'm happy to answer any questions or share an example if needed.

owenpearson avatar Sep 19 '23 15:09 owenpearson

@owenpearson thanks for the response! I would love an example if you have time. Also is there a limit to the total number of messages that can be sent via the bulk API at once? And is the 10s timeout sufficient for using the bulk API?

hsghori avatar Sep 19 '23 15:09 hsghori

Hey @hsghori, here's an example of using the batch API. Note that while this example publishes a single 'batch spec', you can also publish an array of batch specs in order to send different messages to different collections of channels. The result for each batch spec contains a successCount, a failureCount, and a results array. The recommended way to use these is to check whether failureCount is nonzero and, if so, loop through the results array to find which channels have an error field.

There is no limit to the number of messages; however, there is a limit of 100 channels and a 2MiB request body per request. And yes, the 10s timeout is fine for this endpoint.

import asyncio
from ably import AblyRest

async def main():
    client = AblyRest(key="your_api_key")

    batch_spec = {
        "channels": ["channel1", "channel2", "channel3"],
        "messages": [{"data": "message1"}, {"data": "message2"}],
    }
    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

    # an array of BatchResults, one for each BatchSpec, each containing a 
    # successCount and a failureCount indicating the number of channels to 
    # which the batch of messages was published successfully
    print(res.items)

asyncio.run(main())
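
Building on that, the failure-checking pattern described above might look like this. It's a sketch only; the field names (successCount, failureCount, results, channel, error) are taken from this thread rather than verified against the SDK:

# Sketch: inspect each BatchResult as described above; the exact shape
# of the items in res.items is assumed from the field names mentioned
# earlier in this thread.
for batch_result in res.items:
    if batch_result["failureCount"] > 0:
        for channel_result in batch_result["results"]:
            if "error" in channel_result:
                print("publish failed for", channel_result.get("channel"),
                      "->", channel_result["error"])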

owenpearson avatar Sep 19 '23 16:09 owenpearson

~~To clarify, is the channel limit per BatchSpec or per request in total? In the API docs I see that we can have a request like:~~

[
  {
    channels: ['channel1', 'channel2'],
    messages: {data: 'My message'}
  },
  {
    channels: 'channel3',
    messages: [
      {data: 'My message'},
      {name: 'An event', data: 'My event message contents'},
    ]
  }
]

~~Is that entire array limited to 100 channels, or is each BatchSpec in the array limited to 100 channels? If it's the latter, is there a limit on the number of BatchSpecs that can be sent at once?~~

Edit: I see that's explained in the docs. It's 100 channels per BatchSpec and the entire request body must be <2MB.
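
For anyone hitting those limits, a minimal sketch of splitting a long channel list into BatchSpecs of at most 100 channels each (the 2MB body limit isn't handled here; the helper name is hypothetical):

# Sketch: chunk channels into BatchSpecs of <=100 channels each,
# per the documented limit quoted above.
def make_batch_specs(channels, messages, max_channels=100):
    return [
        {"channels": channels[i:i + max_channels], "messages": messages}
        for i in range(0, len(channels), max_channels)
    ]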

hsghori avatar Sep 19 '23 17:09 hsghori

In the example you provided though:

    res = await client.request(
        "POST", "/messages", "3", None, batch_spec
    )

It looks like you're passing the batch_spec into the headers arg instead of the body.

hsghori avatar Sep 19 '23 17:09 hsghori

Also, do y'all have a P50 for the performance of the bulk endpoint (especially relative to input size)? One of my goals here is to ensure that customers aren't waiting too long on message publishing. My application supports a lot of bulk operations which send one message per item being operated on, so I want to make sure that a bulk op that takes ~1-5s doesn't end up getting significantly slower on average.

hsghori avatar Sep 19 '23 18:09 hsghori

It looks like you're passing the batch_spec into the headers arg instead of the body.

The body arg comes before headers, see: https://github.com/ably/ably-python/blob/9b7c9512ae058f54b50a3112b1fa4595c9e24667/ably/rest/rest.py#L122-L123
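
So the positional call above maps batch_spec to the body parameter. Spelled out with keyword arguments (the parameter names are taken from the linked signature), it would read:

# Equivalent call with keyword arguments; params precedes body, which
# precedes headers, per the linked rest.py signature
res = await client.request(
    "POST", "/messages", "3", params=None, body=batch_spec
)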

As for the batch API performance guarantees, I'll need to check with another team so will let you know when I have something to share. If possible, it might be worth testing some large queries yourself to see how it performs with large payloads?

owenpearson avatar Sep 20 '23 15:09 owenpearson

I'm trying to understand to what extent actual parallel usage of this client is safe / tested / supported. […]

@hsghori I'm now in a position to answer this question. ably-python doesn't use multithreading, so asyncio's default single-threaded event loop won't cause any synchronization issues 👍 You can check https://www.pythontutorial.net/python-concurrency/python-event-loop/
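
As a minimal illustration (unrelated to the SDK itself): coroutines on the default event loop only yield control at await points, so plain attribute updates never interleave mid-operation:

import asyncio

counter = 0

async def bump():
    global counter
    # No await between the read and the write, so no other coroutine
    # can run in between on the single-threaded event loop
    current = counter
    counter = current + 1

async def main():
    await asyncio.gather(*[bump() for _ in range(1000)])
    print(counter)  # always 1000, unlike with preemptive threads

asyncio.run(main())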

sacOO7 avatar Sep 28 '23 13:09 sacOO7