opentelemetry-python
Support async/await syntax
Python has supported async/await syntax since version 3.5 (PEP 492). We want to explore how to support this in OpenTelemetry Python. Things to be explored are:
- Do we provide this in the API package or API+SDK?
- How do we test? If test cases are written using the async/await syntax, they won't work in Python <= v3.6, so we might need to collect test cases based on the runtime version.
Do we provide this in the API package or API+SDK?
If we can solve the <3.6 problem I think it'd be great to include both sync and async versions of some methods.
What do we lose by using `asyncio.coroutine` instead of `async`/`await`?
I haven't had time to explore this, so I don't know the answer at the moment. Just listing these questions so I won't forget about them :)
From my memory, `__aenter__` and `__aexit__` have been supported since Python 3.5, and it wouldn't do any harm (unless we want to enforce 100% code coverage) to have them for Python 3.4.
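For illustration, a minimal async context manager built on `__aenter__`/`__aexit__` might look like the sketch below. The `AsyncSpan` class is entirely hypothetical, not part of the OpenTelemetry API; it only demonstrates the `async with` protocol available since Python 3.5.

```python
import asyncio


class AsyncSpan:
    """Hypothetical span usable with `async with` (requires Python 3.5+)."""

    def __init__(self, name):
        self.name = name
        self.ended = False

    async def __aenter__(self):
        # A real implementation could await async work here,
        # e.g. an async span processor's on_start hook.
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.ended = True
        return False  # don't suppress exceptions


async def main():
    async with AsyncSpan("request") as span:
        pass
    return span


span = asyncio.run(main())
```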
In OpenCensus we do have a minor issue when linting examples: since the lint engine runs under Python 3.6, we get a SyntaxError if we use async/await syntax. Here in OpenTelemetry I don't think we'll have that problem.
Closing this for now, re-open if it becomes an issue in the future.
The .NET SDK provides an async API. I think we should provide one too. Preferably with AnyIO to allow any event loop implementation to be used with this library.
Please consider re-opening since lacking an async API may impact the adoption of OpenTelemetry in the next Celery major versions.
@codeboten Ping ☝️
Currently we only support Python 3.6+, so the concerns around breaking tests should be resolved. @thedrow re-opening; would love some help implementing this if you have some bandwidth.
I'll have the bandwidth for it once I start using this library.
No news on this topic?
Unfortunately not yet. I might have more time for it in the future.
Dumb question: what does it mean to support async/await, exactly? Are we talking about instrumentations working with async/await functions (e.g., tornado request handlers), context propagation working with async/await (if it doesn't already), or do we expect async/await versions of some API methods, and why?
I have the exact same question. I find it a bit weird that we would add anything to our implementation without it being in the spec first. Is it there already?
I can understand that the idea is for an implementation to do some preliminary research on a certain feature, but if that is the case it would be very helpful to also have a preliminary specification (it can be drafted in this issue, I don't mean it has to be in the spec repo) so that we have an agreement and understanding of what is being attempted here.
We want the IO operations required to report traces, metrics, and logs to the server to be asynchronous: if they aren't and we're running inside an event loop, they will block it. Since blocking the event loop can hurt performance, that would lead to users of this library being less observable.
OK. That clarifies it a bit more. So this does not apply to the API specifically. What we want are async versions of `BatchSpanProcessor` and probably the core exporters. Is that right?
Yes, but since we need to await every call when any call in the call stack is asynchronous, we need an async variant of every operation that uses those.
It would be great if someone interested in this could take it up and produce a design doc or proposal with technical details describing all possible directions we could take, with pros and cons weighed, so we can move forward on this. Ideally we could support both sync and async use cases with the same methods, but I'm not sure how easily that can be done, or whether it's even possible.
Another aspect of this: each asyncio task needs its own root context span. Currently, when you create a span, it assumes that if there's a current span active, that span should be the parent. In asyncio, this is nearly always wrong.
I started the effort in #2412.
> OK. That clarifies it a bit more. So this does not apply to the API specifically. What we want are async versions of `BatchSpanProcessor` and probably the core exporters. Is that right?
I've been thinking about this a bit for metrics as well. I think we would want to be able to create Observable instruments with async callbacks too, which requires some API change. This could probably be done by updating only the Meter API's type annotations to accept async or regular forms, and then updating the SDK to handle them.
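One hedged sketch of how an SDK could accept either form of callback: dispatch on `inspect.iscoroutinefunction`. The helper name `run_callback` is invented for illustration and is not how the OpenTelemetry SDK actually works; a real SDK would more likely submit coroutines to a long-lived background loop rather than call `asyncio.run` each time.

```python
import asyncio
import inspect


def run_callback(callback):
    """Invoke a metrics callback that may be either a plain function or a
    coroutine function, returning its result either way (illustrative only)."""
    if inspect.iscoroutinefunction(callback):
        # Run the coroutine to completion on a fresh event loop.
        return asyncio.run(callback())
    return callback()


def sync_cb():
    return [1]


async def async_cb():
    await asyncio.sleep(0)
    return [2]
```

Both `run_callback(sync_cb)` and `run_callback(async_cb)` then return plain values, so the rest of the collection pipeline can stay synchronous.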
> Currently when you create a span, it assumes that if there's a current span active, that this should be the span to have as the parent span. In asyncio, this is nearly always wrong.
@Alan-R can you explain this a bit more? I thought our context implementation, which uses `contextvars`, should work correctly for asyncio in most cases.
`contextvars` should work correctly regardless of the event loop used. It'll work with trio as well.
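The `contextvars` behavior being relied on here can be demonstrated with a small sketch: each asyncio task runs in a copy of the creating context, so a "current span" variable set inside one task never leaks into a sibling task or the parent. The `current_span` variable below is a stand-in for illustration, not the real slot in `opentelemetry.context`.

```python
import asyncio
import contextvars

# Stand-in for the SDK's "current span" slot, for demonstration only.
current_span = contextvars.ContextVar("current_span", default="no-span")


async def child(results, name):
    # Each task created by asyncio runs in a copy of the creating context,
    # so this write is invisible to sibling tasks and to the parent.
    current_span.set(name)
    await asyncio.sleep(0)
    results[name] = current_span.get()


async def main():
    results = {}
    await asyncio.gather(child(results, "task-a"), child(results, "task-b"))
    # The parent's value is untouched by the children.
    results["parent"] = current_span.get()
    return results


results = asyncio.run(main())
```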
I was mistaken. That's how it was for OpenTracing. I mistakenly thought it was the same for OpenTelemetry.
My apologies.
Any news about this? It would also be nice to have an async tracer as well, so we can do something like this:

```python
async with tracer.start_span_async("span"):
    pass
```
@gen-xu What do you have in mind for `start_span_async` to actually do? I don't think `start_span` does anything that would need an async version (although I guess a `SpanProcessor` could change that, but an `AsyncSpanProcessor` would be needed to address it, with much wider repercussions).
@TBBle yes, if we have `AsyncSpanProcessor` then we might need `start_span_async`, but there are some other places where I see a need for `start_span_async`:

- `opentelemetry.sdk.trace.Span` has a `_lock` attribute, while in the world of asyncio you don't really need a `threading.Lock`. Also, compared with `threading.Lock`, `asyncio.Lock` is much more lightweight and faster.
- The current implementation of `tracer.start_as_current_span` is a fairly expensive blocking call, and frequently starting spans introduces overhead that hurts the performance of the asyncio event loop. The `__enter__` and `__exit__` of the `Span` on their own already take ~250us on my machine.
> The `__enter__` and `__exit__` of the `Span` on their own already take ~250us on my machine.
Thanks for this number. Can you provide a bit more info about the machine and setup? The benchmark tests suggest it should be at most 50us (20k iterations/sec). Or maybe I misunderstand how those benchmarks are supposed to be interpreted.
The things protected by `Span._lock` (AFAICT) would not make sense to protect with an `asyncio.Lock`, because there's nothing there that would `await`, so no contention is possible. So what you're actually looking for is a "SingleThreadedSpan" (or a `single_threaded` parameter that could just make `self._lock` a no-op context manager), which would not need `async with`.
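A `single_threaded` flag like that could be sketched with `contextlib.nullcontext`. The class and parameter below are hypothetical, invented for illustration; this is not the SDK's `Span`.

```python
import contextlib
import threading


class SketchSpan:
    """Illustrative only: a span whose lock can be disabled for
    single-threaded (e.g. pure-asyncio) use."""

    def __init__(self, name, single_threaded=False):
        self.name = name
        self.events = []
        # nullcontext is a no-op context manager, so `with self._lock:`
        # costs almost nothing when single_threaded is True.
        self._lock = (
            contextlib.nullcontext() if single_threaded else threading.Lock()
        )

    def add_event(self, event):
        with self._lock:
            self.events.append(event)


span = SketchSpan("demo", single_threaded=True)
span.add_event("hello")
```

The nice property of this shape is that the hot path keeps the exact same `with self._lock:` code for both modes.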
> The current implementation of `tracer.start_as_current_span` is a fairly expensive blocking call, and frequently starting spans introduces overhead that hurts the performance of the asyncio event loop. The `__enter__` and `__exit__` of the `Span` on their own already take ~250us on my machine.
If you're using `BatchSpanProcessor` and one of the built-in samplers, this should be CPU-bound work plus enqueuing the message in a `queue.Queue`. Unless you have a lot of threads concurrently creating spans, I don't think a non-blocking await would help here, as it's CPU bound.
On my machine:

```console
$ python3 -m timeit -s "from threading import Lock; l = Lock()" "with l: pass"
2000000 loops, best of 5: 163 nsec per loop
```
Acquiring the lock shouldn't be too expensive if there isn't lock contention. Since you mention asyncio, I'm assuming you don't have many threads.
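The batch-processor shape described above (CPU-bound work plus an enqueue, drained by a background thread) can be sketched roughly as follows. This is a deliberately simplified stand-in, not the real `BatchSpanProcessor`:

```python
import queue
import threading

# Simplified sketch: ending a span only enqueues it; a background
# thread drains the queue and "exports" the spans.
spans = queue.Queue()
exported = []
done = threading.Event()


def on_end(span):
    # CPU-bound work plus a lock-protected enqueue; nothing here blocks
    # on IO, which is why an awaitable version wouldn't buy much.
    spans.put(span)


def worker():
    while True:
        span = spans.get()
        if span is None:  # sentinel: shut down
            break
        exported.append(span)
    done.set()


t = threading.Thread(target=worker, daemon=True)
t.start()
for i in range(3):
    on_end(f"span-{i}")
spans.put(None)
done.wait(timeout=5)
```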
I'm trying to think of what async work might be done if we introduced a `start_span_async()`. One use case I can think of is remote sampling.
@gshpychka Thanks for the link; I didn't know there was a benchmark page available. I think the numbers I gave were not accurate; they were based on the viztracer library I was using, so it was probably just viztracer's overhead. I re-measured with time() and got numbers like 50-60us. Sorry for the false alarm.
I'm not sure if it should be a separate ticket, but working through how to deliver observable instruments using an async coroutine might be interesting.
I haven't really done this thought experiment myself, so I don't know if it's even feasible. I only raise it because a colleague was looking into adding an observable gauge to our asyncio-based code and had to use `asyncio.run_coroutine_threadsafe` to block the exporter thread so it could call back into our event loop. So perhaps just supporting that as a third callback type (where we already have functions and generators) is the best we can do?
Looking at the anyio equivalent, it's possible that there isn't a good solution, as the callback needs too much event-loop info or pre-work, so doing what my colleague has done in user code may be our best option.
```python
def message_count_callback():
    # `message_source` and `loop` come from the surrounding application code.
    coro = message_source.consumer_count()
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    message_count = future.result()
    yield Observation(message_count)
```
> and had to use `asyncio.run_coroutine_threadsafe` to block the exporter thread so it could call back into our event loop, so perhaps just supporting that as a third callback type (where we already have functions and generators) is the best we can do?
This is basically what I had in mind. We could accept an optional event loop in the MeterProvider (and, if one isn't provided, create a background thread with a loop running for all of OpenTelemetry to use) and run any coroutine callbacks in that loop. The benefit of doing it in the SDK rather than user code is that we can run the async callbacks for all instruments in parallel.
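A rough sketch of that background-loop idea, with all names invented for illustration (this is not the SDK): start one event loop on a daemon thread, then have the collecting/exporter thread submit all coroutine callbacks to it at once and block until they all finish.

```python
import asyncio
import threading


def start_background_loop():
    """Start an event loop on a daemon thread and return it. A real SDK
    could own one of these per MeterProvider (hypothetical design)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop


def collect(loop, callbacks):
    """Run all coroutine callbacks concurrently on the background loop,
    blocking the calling (exporter) thread until they complete."""

    async def gather_all():
        return await asyncio.gather(*(cb() for cb in callbacks))

    future = asyncio.run_coroutine_threadsafe(gather_all(), loop)
    return future.result(timeout=5)


async def cb_a():
    await asyncio.sleep(0.01)
    return 1


async def cb_b():
    await asyncio.sleep(0.01)
    return 2


loop = start_background_loop()
results = collect(loop, [cb_a, cb_b])
loop.call_soon_threadsafe(loop.stop)
```

Because both callbacks sleep concurrently on the same loop, total collection time is roughly one sleep rather than the sum of all of them.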
> Looking at the anyio equivalent, it's possible that there isn't a good solution, as the callback needs too much event-loop info or pre-work, so doing what my colleague has done in user code may be our best option.
I'm not super familiar with anyio. How important is it to everyone that we use anyio instead of just asyncio here? We are usually pretty conservative about taking on new dependencies.
At least for anyio, I thought it was already being used in the SDK for something, but I just looked and it's not; I must have been thinking of a different project, sorry. I see we're directly using asyncio in the tests and docs, but the SDK code itself has no `async` at all.
I will note that it's been a pretty common sight in other libraries I've used that have to deal with async but aren't asyncio-specific, but I haven't really looked into it or had to deal with it directly myself. I have no idea how widespread non-asyncio event loops (e.g., trio) are in practice, so I can't really comment on the cost/value tradeoff of using it.