
Support async/await syntax

reyang opened this issue on Jul 25, 2019 · 30 comments

Python started to support async/await syntax in version 3.7. We want to explore how to support this in OpenTelemetry Python. Things to be explored are:

  1. Do we provide this in the API package or API+SDK?
  2. How do we test? If test cases are written using the async/await syntax, they won't work in Python <= v3.6, so we might need to select test cases based on the runtime version.

reyang avatar Jul 25 '19 15:07 reyang

Do we provide this in the API package or API+SDK?

If we can solve the <3.6 problem I think it'd be great to include both sync and async versions of some methods.

What do we lose by using asyncio.coroutine instead of async/await?
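For illustration, the two spellings of the same coroutine side by side (a minimal sketch, nothing OpenTelemetry-specific; note that asyncio.coroutine was later deprecated in 3.8 and removed in 3.10):

    import asyncio

    # Generator-based coroutine: works without the async/await keywords,
    # at the cost of the decorator and `yield from`.
    @asyncio.coroutine
    def do_export_old():
        yield from asyncio.sleep(0.1)
        return "exported"

    # Native coroutine: same behaviour, but requires async/await syntax support.
    async def do_export_new():
        await asyncio.sleep(0.1)
        return "exported"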

c24t avatar Jul 25 '19 21:07 c24t

I haven't had time to explore this yet, so I don't know the answer at the moment. I'm just listing these questions so I won't forget about them :)

From memory, __aenter__ and __aexit__ have been supported since Python 3.5, and it wouldn't do any harm (unless we want to enforce 100% code coverage) to have them even for Python 3.4. In OpenCensus we do have a minor issue when linting examples: since the lint engine runs under Python 3.6, we get a SyntaxError if we use async/await syntax. Here in OpenTelemetry I don't think we'll have that problem.
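For illustration, a minimal sketch of what those dunders look like (HypotheticalAsyncSpan is a made-up name, not the actual API); defining the two methods is all that `async with` needs:

    class HypotheticalAsyncSpan:
        async def __aenter__(self):
            return self

        async def __aexit__(self, exc_type, exc_value, traceback):
            # End the span here; returning False lets exceptions propagate.
            return False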

reyang avatar Jul 26 '19 00:07 reyang

Closing this for now, re-open if it becomes an issue in the future.

codeboten avatar Jul 22 '20 20:07 codeboten

The .NET SDK provides an async API. I think we should provide one too. Preferably with AnyIO to allow any event loop implementation to be used with this library.
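A small sketch of what AnyIO buys (assumes the anyio package is installed, and trio for the second call): the same coroutine runs unchanged on either backend:

    import anyio

    async def flush_delay():
        # anyio primitives work on whichever event loop the application chose.
        await anyio.sleep(0.1)

    anyio.run(flush_delay)                  # asyncio backend (the default)
    anyio.run(flush_delay, backend="trio")  # trio backend, same code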

Please consider re-opening since lacking an async API may impact the adoption of OpenTelemetry in the next Celery major versions.

thedrow avatar Oct 05 '20 12:10 thedrow

@codeboten Ping ☝️

thedrow avatar May 09 '21 06:05 thedrow

Currently we're only supporting Python 3.6+, so the concerns around breaking tests should be resolved. @thedrow re-opening; I'd love some help implementing this if you have some bandwidth.

codeboten avatar May 10 '21 18:05 codeboten

Currently we're only supporting Python 3.6+, so the concerns around breaking tests should be resolved. @thedrow re-opening; I'd love some help implementing this if you have some bandwidth.

I'll have the bandwidth for it once I start using this library.

thedrow avatar May 10 '21 18:05 thedrow

No news on this topic?

trollkarlen avatar Oct 14 '21 08:10 trollkarlen

Unfortunately not yet. I might have more time for it in the future.

thedrow avatar Oct 14 '21 12:10 thedrow

Dumb question: What does it mean to support async/await exactly? Are we talking about instrumentations working with async/await functions (e.g., Tornado request handlers), context propagation working with async/await (if it doesn't already), or do we expect async/await versions of some API methods, and why?

owais avatar Oct 14 '21 12:10 owais

Dumb question: What does it mean to support async/await exactly? Are we talking about instrumentations working with async/await functions (e.g., Tornado request handlers), context propagation working with async/await (if it doesn't already), or do we expect async/await versions of some API methods, and why?

I have the exact same question. I find it a bit odd that we would add anything to our implementation without it being in the spec first. Is it there already?

I can understand that the idea is for an implementation to do some preliminary research on a certain feature, but if that is the case it would be very helpful to also have a preliminary specification (it can be drafted in this issue, I don't mean it has to be in the spec repo) so that we have an agreement and a shared understanding of what is being attempted here.

ocelotl avatar Oct 14 '21 12:10 ocelotl

We want the I/O operations required to report traces, metrics, and logs to the server to be asynchronous; if they aren't and we use an event loop, they will block it. Blocking the event loop can hurt performance, and that will lead to users of this library being less observable.

thedrow avatar Oct 17 '21 14:10 thedrow

OK. That clarifies it a bit more. So this does not apply to the API specifically. What we want are async versions of BatchSpanProcessor and probably core exporters. Is that right?

owais avatar Oct 17 '21 17:10 owais

Yes, but since every call has to be awaited once any call in the call stack is asynchronous, we need an async variant of every operation that uses those.
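To make the shape concrete, a rough sketch of what an awaitable export path could look like (hypothetical names, not existing SDK classes):

    from typing import Any, Sequence

    class AsyncSpanExporter:
        async def export(self, spans: Sequence[Any]) -> None:
            # Would await a non-blocking network call (e.g. aiohttp or
            # grpc.aio) instead of blocking the event loop.
            ...

    class AsyncBatchSpanProcessor:
        def __init__(self, exporter: AsyncSpanExporter) -> None:
            self._exporter = exporter
            self._buffer: list = []

        async def on_end(self, span: Any) -> None:
            self._buffer.append(span)

        async def force_flush(self) -> None:
            # Every caller up the stack now has to await as well.
            await self._exporter.export(self._buffer)
            self._buffer.clear()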

thedrow avatar Oct 18 '21 08:10 thedrow

It would be great if someone interested in this could take it up and produce a design doc or proposal with technical details, describing the possible directions we could take with pros and cons weighed, so we can move forward. Ideally we could support both sync and async use cases with the same methods, but I'm not sure how easily that can be done, or whether it's possible at all.

owais avatar Oct 18 '21 08:10 owais

Another aspect of this: each asyncio task needs to have its own root context span. Currently, when you create a span, it assumes that if there is a current span active, that span should be the parent. In asyncio, this is nearly always wrong.

Alan-R avatar Oct 25 '21 13:10 Alan-R

I started the effort in #2412.

thedrow avatar Jan 26 '22 14:01 thedrow

OK. That clarifies it a bit more. So this does not apply to the API specifically. What we want are async versions of BatchSpanProcessor and probably core exporters. Is that right?

I've been thinking about this a bit for metrics as well. I think we would want to be able to create Observable instruments with async callbacks too, which requires some API change. This could probably be done by only updating the Meter API's type annotations to accept async or regular forms, and then updating the SDK to handle these.
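Roughly what I have in mind (a sketch; the async callback form is hypothetical, and the Observation import assumes a current opentelemetry-api release):

    import asyncio
    from opentelemetry.metrics import Observation

    # Existing form: a plain callable (or generator) yielding Observations.
    def cpu_callback(options):
        yield Observation(0.42, {"core": "0"})

    async def fetch_queue_depth():
        # Stand-in for a real async data source, e.g. a broker client call.
        await asyncio.sleep(0)
        return 7

    # Hypothetical form: the Meter API annotations could also accept a
    # coroutine function, and the SDK would run it on an event loop.
    async def queue_depth_callback(options):
        return [Observation(await fetch_queue_depth())]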

Currently, when you create a span, it assumes that if there is a current span active, that span should be the parent. In asyncio, this is nearly always wrong.

@Alan-R can you explain this a bit more? I thought our context implementation, which uses contextvars, should work correctly for asyncio in most cases.

aabmass avatar Jan 26 '22 22:01 aabmass

contextvars should work correctly regardless of the event loop used. It'll work with trio as well.
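A quick way to convince yourself with plain asyncio (the same idea applies under trio): each task runs in a copy of the creating context, so context set inside one task doesn't leak into siblings or the caller:

    import asyncio
    import contextvars

    current = contextvars.ContextVar("current", default="root")

    async def child(name):
        current.set(name)            # visible only inside this task's context
        await asyncio.sleep(0)
        assert current.get() == name

    async def main():
        await asyncio.gather(child("task-a"), child("task-b"))
        assert current.get() == "root"   # the caller's context is untouched

    asyncio.run(main())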

thedrow avatar Apr 10 '22 12:04 thedrow

@Alan-R can you explain this a bit more? I thought our context implementation, which uses contextvars, should work correctly for asyncio in most cases.

I was mistaken. That's how it was for OpenTracing. I mistakenly thought it was the same for OpenTelemetry.

My apologies.


Alan-R avatar Apr 10 '22 14:04 Alan-R

Any news about this? It would also be nice to have an async tracer as well, so we could do something like this:

    async with tracer.start_span_async("span"):
        pass

gen-xu avatar May 22 '22 10:05 gen-xu

@gen-xu What do you have in mind for start_span_async to actually do? I don't think start_span does anything that would need an async version (although I guess a SpanProcessor could change that; an AsyncSpanProcessor would be needed to address it, with much wider repercussions).

TBBle avatar Jun 08 '22 19:06 TBBle

@TBBle Yes, if we have an AsyncSpanProcessor then we might need start_span_async, but there are other places where I see a need for it:

  1. opentelemetry.sdk.trace.Span has a _lock attribute, while in the world of asyncio you don't really need a threading.Lock. Compared with threading.Lock, asyncio.Lock is also more lightweight and faster.
  2. The current implementation of tracer.start_as_current_span is a somewhat expensive blocking call, and starting spans frequently introduces overhead that hurts the performance of the asyncio event loop. The __enter__ and __exit__ of the Span on their own already take ~250us on my machine.

gen-xu avatar Jun 09 '22 09:06 gen-xu

The __enter__ and __exit__ of the Span on their own already take ~250us on my machine.

Thanks for this number. Can you provide a bit more info about the machine and setup? The benchmark tests suggest it should be 50us at most (20k iterations/sec). Or maybe I misunderstood how those benchmarks are supposed to be interpreted.

gshpychka avatar Jun 10 '22 13:06 gshpychka

The things protected by Span._lock (AFAICT) would not make sense to protect with an asyncio.Lock, because nothing in those critical sections awaits, so there is no contention possible among coroutines. What you're actually looking for there is a "SingleThreadedSpan" (or a single_threaded parameter that just makes self._lock a no-op context manager), which would not need async with.
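A minimal sketch of the no-op lock idea (SpanSketch and single_threaded are hypothetical, not the SDK's Span):

    import contextlib
    import threading

    class SpanSketch:
        def __init__(self, single_threaded=False):
            # contextlib.nullcontext() is a no-op `with` target, so the hot
            # path skips lock acquisition entirely when it isn't needed.
            self._lock = (
                contextlib.nullcontext() if single_threaded else threading.Lock()
            )

        def add_event(self, name):
            with self._lock:
                ...  # mutate span state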

TBBle avatar Jun 10 '22 17:06 TBBle

The current implementation of tracer.start_as_current_span is a somewhat expensive blocking call, and starting spans frequently introduces overhead that hurts the performance of the asyncio event loop. The __enter__ and __exit__ of the Span on their own already take ~250us on my machine.

If you're using BatchSpanProcessor and one of the built-in samplers, this should be CPU-bound work plus enqueueing the message in a queue.Queue. Unless you have a lot of threads concurrently creating spans, I don't think a non-blocking await would help here, as it's CPU bound.

On my machine:

$ python3 -m timeit -s "from threading import Lock; l = Lock()" "with l: pass"
2000000 loops, best of 5: 163 nsec per loop

Acquiring the locks shouldn't be too expensive if there isn't lock contention. Since you mention asyncio, I'm assuming you don't have many threads.

I'm trying to think of what async work might be done if we introduced a start_span_async(). One use case I can think of is remote sampling.
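For illustration only (today's Sampler.should_sample is synchronous; the class and method here are hypothetical):

    class AsyncRemoteSampler:
        """Sketch: defer the sampling decision to a remote service without
        blocking the event loop."""

        def __init__(self, client):
            self._client = client  # assumed async HTTP/gRPC client

        async def should_sample_async(self, trace_id, name):
            # A sync sampler would have to block here; an async start_span
            # could await the remote decision instead.
            return await self._client.get_sampling_decision(trace_id, name)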

aabmass avatar Jun 10 '22 18:06 aabmass

@gshpychka Thanks for the link, I didn't know there was a benchmark page available. I think the numbers I gave were not accurate; they were based on the viztracer library I used. I think it was just viztracer's overhead. I re-measured with a simple time()-based loop and got numbers like 50~60 us. Sorry for the false positive.
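For reference, a rough way to reproduce that kind of measurement (a sketch, not my exact script; numbers will vary with machine, sampler, and configured span processors):

    import time
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider

    trace.set_tracer_provider(TracerProvider())
    tracer = trace.get_tracer(__name__)

    N = 10_000
    start = time.perf_counter()
    for _ in range(N):
        with tracer.start_as_current_span("bench"):
            pass
    elapsed = time.perf_counter() - start
    print(f"{elapsed / N * 1e6:.1f} us per span")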

gen-xu avatar Jun 12 '22 03:06 gen-xu

I'm not sure if it should be a separate ticket, but working through how to deliver observable instruments using an async coroutine might be interesting.

I haven't really done this thought experiment myself, so I don't know if it's even feasible. I only raise it because a colleague was looking into adding an observable gauge to our asyncio-based code and had to use asyncio.run_coroutine_threadsafe to block the exporter thread in order to call back into our event loop. Perhaps just supporting that as a third callback type (where we already have functions and generators) is the best we can do?

Looking at the anyio equivalent, it's possible that there isn't a good solution, as the callback needs too much event-loop info or pre-work, so doing what my colleague has currently done in user code may be our best option.

    import asyncio  # Observation comes from the OpenTelemetry metrics API

    # `loop` is the application's running event loop and `message_source` is
    # the async client being observed; both come from the surrounding user code.
    def message_count_callback():
        coro = message_source.consumer_count()
        future = asyncio.run_coroutine_threadsafe(coro, loop)
        message_count = future.result()  # blocks the exporter thread until done
        yield Observation(message_count)

TBBle avatar Jun 18 '22 08:06 TBBle

had to use asyncio.run_coroutine_threadsafe to block the exporter thread in order to call back into our event loop. Perhaps just supporting that as a third callback type (where we already have functions and generators) is the best we can do?

This is basically what I had in mind. We can accept an optional event loop in the MeterProvider (if one isn't provided, create a background thread with a loop running for all of OpenTelemetry to use) and run any coroutine callbacks in that loop. The benefit of doing it in the SDK vs. user code is that we can run the async callbacks for all instruments in parallel.
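A rough sketch of that SDK-side idea (hypothetical helper, not existing SDK code): one background loop owned by OpenTelemetry, with all coroutine callbacks gathered concurrently at collection time:

    import asyncio
    import threading

    _loop = asyncio.new_event_loop()
    threading.Thread(target=_loop.run_forever, daemon=True).start()

    def collect_async_callbacks(callbacks, timeout=10.0):
        # Run every coroutine callback concurrently on the shared loop and
        # block the (synchronous) collection thread until they all finish.
        async def _gather():
            return await asyncio.gather(*(cb() for cb in callbacks))

        future = asyncio.run_coroutine_threadsafe(_gather(), _loop)
        return future.result(timeout=timeout)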

Looking at the anyio equivalent, it's possible that there isn't a good solution, as the callback needs too much event loop info or pre-work, so doing what my colleage has currently done in user-code may be our best option.

I'm not super familiar with anyio. How important is it to everyone that we use anyio instead of just asyncio here? We are usually pretty conservative about taking on new dependencies.

aabmass avatar Jun 21 '22 22:06 aabmass

At least for anyio, I thought it was already being used in the SDK for something, but I just looked and it's not; I must have been thinking of a different project, sorry. I see we're directly using asyncio in the tests and docs, but the SDK code itself has no async at all.

I will note that it's been a pretty common sight in other libraries I've used that have to deal with async but aren't asyncio-specific, but I haven't really looked into it or had to deal with it directly myself. I have no idea how widespread non-asyncio event loops (e.g. trio) are in practice, so I can't really comment on the cost/value tradeoff of using it.

TBBle avatar Jun 22 '22 06:06 TBBle