feat(instrumentation): Add FirstDraw spans to activity instrumentation

Open Doohl opened this issue 2 months ago • 6 comments

fixes #1143

What is this change?

This PR adds instrumentation to track the initial draw time of Android activities. A new FirstDraw span is created as a child of the activity creation span (e.g., AppStart or Created), measuring the time from activity creation until the first frame is rendered on screen.

The implementation:

  • Hooks into Android's ViewTreeObserver.OnDrawListener to detect when the first draw occurs
  • Works around a bug on Android API < 26 where draw listeners weren't properly merged into the view tree observer
  • Captures screen complexity metrics (view node count and depth) when the draw completes
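
As a rough illustration of the first bullet, here is a minimal one-shot listener sketch. It is not the PR's code: `FakeDrawObserver` and `FirstDrawOnce` are hypothetical names, and `FakeDrawObserver` is a plain-Java stand-in for `ViewTreeObserver` so the example runs off-device. The API < 26 workaround is not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for android.view.ViewTreeObserver so the sketch runs off-device.
final class FakeDrawObserver {
    private final List<Runnable> listeners = new ArrayList<>();

    void addOnDrawListener(Runnable listener) { listeners.add(listener); }

    void removeOnDrawListener(Runnable listener) { listeners.remove(listener); }

    int listenerCount() { return listeners.size(); }

    // Simulates one frame: notifies a snapshot of the listeners,
    // so a listener may unregister itself during dispatch.
    void dispatchOnDraw() {
        for (Runnable listener : new ArrayList<>(listeners)) {
            listener.run();
        }
    }
}

// Hypothetical one-shot listener: runs the callback on the first draw only.
final class FirstDrawOnce implements Runnable {
    private final FakeDrawObserver observer;
    private final Runnable onFirstDraw;
    private boolean fired;

    private FirstDrawOnce(FakeDrawObserver observer, Runnable onFirstDraw) {
        this.observer = observer;
        this.onFirstDraw = onFirstDraw;
    }

    static FirstDrawOnce register(FakeDrawObserver observer, Runnable onFirstDraw) {
        FirstDrawOnce listener = new FirstDrawOnce(observer, onFirstDraw);
        observer.addOnDrawListener(listener);
        return listener;
    }

    @Override
    public void run() {
        if (fired) {
            return; // ignore every draw after the first
        }
        fired = true;
        onFirstDraw.run(); // e.g. end the FirstDraw span here
        observer.removeOnDrawListener(this);
    }
}
```

On a real device the callback would typically end the FirstDraw span. Note that the real `ViewTreeObserver` does not allow removing a draw listener from inside `onDraw`, so an actual implementation would likely need to defer the removal (for example by posting it to the main thread), which this stand-in does not have to do.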

What are the points of contention?

1. Screen view complexity attributes

Two new attributes are added to the FirstDraw span to capture screen complexity:

  • screen.view.nodes (long): Total count of all View nodes in the view hierarchy
  • screen.view.depth (long): Maximum depth of the view hierarchy
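
To make the two attributes concrete, here is a small sketch of how a node count and a maximum depth could be derived from a view hierarchy. `ViewNode` and `ViewTreeStats` are hypothetical names standing in for Android's `View`/`ViewGroup`; the convention that a lone root has depth 1 is also an assumption, since the description above does not say where the count starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for android.view.View / ViewGroup.
final class ViewNode {
    final List<ViewNode> children = new ArrayList<>();

    // Adds a child and returns this node, so trees can be built fluently.
    ViewNode add(ViewNode child) {
        children.add(child);
        return this;
    }
}

final class ViewTreeStats {
    // Counts every node in the hierarchy, including the root.
    static long countNodes(ViewNode root) {
        if (root == null) {
            return 0;
        }
        long total = 1;
        for (ViewNode child : root.children) {
            total += countNodes(child);
        }
        return total;
    }

    // Maximum depth in levels; assumes a lone root counts as depth 1.
    static long maxDepth(ViewNode root) {
        if (root == null) {
            return 0;
        }
        long deepest = 0;
        for (ViewNode child : root.children) {
            deepest = Math.max(deepest, maxDepth(child));
        }
        return 1 + deepest;
    }
}
```

For a root with two children, one of which has a child of its own, this gives 4 nodes and a depth of 3. Both walks are recursive for brevity; a very deep hierarchy might call for an iterative traversal instead.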

We believe these metrics will add some value if users try to debug long FirstDraw spans. Feedback here would be appreciated, though!

2. FirstDraw value

FirstDraw spans aren't a perfect signal for when Activities become 'interactable'. Rather, they can help users identify issues with the UI pipeline. To measure interactivity, users would have to manually instrument spans or events that fire when the app or activity enters a state where the end user can actually start interacting with it.

3. FirstDraw span lifecycle

FirstDraw spans are children of the Created spans. However, a FirstDraw span can end long after its parent span has ended. This can create a confusing tracing experience, but we probably don't want to change the definition of the existing Created span.

Does it make sense to keep FirstDraw as a child of Created?
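
The lifecycle concern can be sketched with a toy model. `SpanRecord` is a hypothetical class, not the OpenTelemetry API, and any timestamps passed to it are made up for illustration only.

```java
// Hypothetical minimal span record; not the OpenTelemetry API.
final class SpanRecord {
    final String name;
    final SpanRecord parent; // null for a root span
    final long startMillis;
    final long endMillis;

    SpanRecord(String name, SpanRecord parent, long startMillis, long endMillis) {
        this.name = name;
        this.parent = parent;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // True when this span ends after its parent has already ended.
    boolean outlivesParent() {
        return parent != null && endMillis > parent.endMillis;
    }

    // How long this span keeps running after its parent ended (0 if it does not).
    long overhangMillis() {
        return outlivesParent() ? endMillis - parent.endMillis : 0;
    }
}
```

With a Created span covering 0 to 40 ms and a FirstDraw child covering 5 to 180 ms (illustrative numbers), `outlivesParent()` is true and the overhang is 140 ms. A trace viewer would draw the child bar extending well past its parent, which is the confusing picture described above.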

How was this tested?

Tested some scenarios with the demo app (API versions 28 and 30) to validate the correctness of the instrumentation. If you have any ideas how this could be better tested, feel free to opine!

Raw trace: https://www.codebin.cc/code/cmg9yjmxb0001jz037mtuj2v0:HXhjtG6XV3Vhfh8Ewy6r1S6KfapJtmgM1S1beucxBmpG

Doohl avatar Oct 02 '25 21:10 Doohl

Codecov Report

Patch coverage is 63.15789% with 28 lines in your changes missing coverage. Please review.
Project coverage is 64.26%. Comparing base (f466e65) to head (5360ece).
Report is 2 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...instrumentation/activity/draw/FirstDrawListener.kt    40.90%    13 Missing
...droid/instrumentation/activity/ActivityTracer.java    38.46%    8 Missing
...droid/instrumentation/activity/draw/WindowUtils.kt    80.95%    3 Missing and 1 partial
...strumentation/activity/Pre29ActivityCallbacks.java    33.33%    2 Missing
...roid/instrumentation/activity/ActivityCallbacks.kt    50.00%    1 Missing
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1281      +/-   ##
==========================================
- Coverage   64.30%   64.26%   -0.04%     
==========================================
  Files         142      145       +3     
  Lines        3012     3087      +75     
  Branches      296      307      +11     
==========================================
+ Hits         1937     1984      +47     
- Misses        998     1025      +27     
- Partials       77       78       +1     

codecov[bot] avatar Oct 02 '25 22:10 codecov[bot]

The problem with detecting the first frame to be drawn is that, as you mentioned, it doesn't mean the Activity is ready to be consumed; it also doesn't necessarily mean the view tree is complete. If we are dealing with a Compose-based Activity, a minimal view tree will first be drawn and the rest will be filled in after the first frame is delivered.

This means that the page complexity attribute being logged may be inaccurate - or at least not represent what you think it does. (I think there are further issues with using that metric in the first place, but one thing at a time 😅 )

I think tracking this event offers good utility, but we should probably properly contextualize it, and perhaps not add in extra data that may be noisy or misleading.

bidetofevil avatar Oct 21 '25 14:10 bidetofevil

I think the user has to tell us when the activity is ready to be used (vs. fully drawn, etc.). Maybe we could take inspiration from, or use, https://developer.android.com/reference/kotlin/androidx/activity/FullyDrawnReporter#reportFullyDrawn()

marandaneto avatar Oct 21 '25 15:10 marandaneto

https://github.com/square/papa/tree/main also has a few things related to what we do here, but it's more about the app's launch and not per activity, I think (worth checking).

marandaneto avatar Oct 21 '25 15:10 marandaneto

> The problem with detecting the first frame to be drawn is that, as you mentioned, it doesn't mean the Activity is ready to be consumed; it also doesn't necessarily mean the view tree is complete.

Agreed, but that isn't really the purpose of tracking this event. As it stands, the best way to track when an Activity is ready to interact with / consume (i.e. a "time to interactive") would be to track the Activity's reportFullyDrawn call. That is a separate thing altogether.

This PR is concerned with the TimeToInitialDisplay vital.

> If we are dealing with a Compose-based Activity, a minimal view tree will first be drawn and the rest will be filled in after the first frame is delivered.

Yeah InitialDraw doesn't work too well in the context of Compose Activities. Open to ideas.

> This means that the page complexity attribute being logged may be inaccurate - or at least not represent what you think it does. (I think there are further issues with using that metric in the first place, but one thing at a time 😅 )

The complexity / depth attributes are removed! I think they aren't really needed for this new type of telemetry tbh

> I think tracking this event offers good utility, but we should probably properly contextualize it, and perhaps not add in extra data that may be noisy or misleading.

Agreed. I've removed it.

> I think the user has to tell us when the activity is ready to be used (vs. fully drawn, etc.). Maybe we could take inspiration from, or use, [...]

Yeah, the dev has to instrument something that can broadcast when an app / activity is ready to interact with. But we can automatically instrument, at least, when the Activity is ready to draw or has begun drawing.

Doohl avatar Oct 21 '25 18:10 Doohl

Hi @Doohl

Again, thank you for taking the time to make this contribution and for your patience. I'm following up here based on this comment from the related semconv PR. I've covered a lot of details there that seem relevant to this work, and, in a nutshell, I'd like to try and see if this PR can help make progress in the semconv one. Please take a look at that comment for more details.

Before diving deep into the implementation details that you propose here, I'd like to make sure we're all on the same page in terms of what the expected outcome is that you'd like to achieve. I know that the idea behind the semconv PR is to define a platform-agnostic span, which I think would be great, but for practical purposes, I'd like to get a better understanding of what it means specifically for Android. So, I'd like to use a visualization that relies on Android-specific terms to see if that helps to get a better understanding overall:

[Image: Screenshot 2025-10-22 at 14 33 29]

The image shows 3 possible scenarios (it used to be 4, but then I realized that 3 is enough) that can be covered with a span. Which scenario do you think best covers the outcome that you'd expect from this implementation?

LikeTheSalad avatar Oct 22 '25 12:10 LikeTheSalad