nowinandroid icon indicating copy to clipboard operation
nowinandroid copied to clipboard

[Bug]: navigationBar_multipleBackStackInterests might be flaky

Open JoseAlcerreca opened this issue 2 years ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Is there a StackOverflow question about this issue?

  • [X] I have searched StackOverflow

What happened?

navigationBar_multipleBackStackInterests might be flaky. It tends to fail in API 30.

Example failure: https://github.com/android/nowinandroid/actions/runs/6774060182/job/18410826425

Related: #1011

Relevant logcat output

> Task :app:connectedDemoDebugAndroidTest
Starting 26 tests on test(AVD) - 11

> Task :core:database:connectedDemoDebugAndroidTest

> Task :app:connectedDemoDebugAndroidTest

com.google.samples.apps.nowinandroid.ui.NavigationTest > navigationBar_multipleBackStackInterests[test(AVD) - 11] FAILED 
	kotlinx.coroutines.test.UncompletedCoroutinesError: After waiting for 10s, the test coroutine is not completing
	at kotlinx.coroutines.test.TestBuildersKt__TestBuildersKt$runTest$2$1$2$1.invoke(TestBuilders.kt:349)

[DDMLIB]: An unexpected packet was received before the handshake.

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

JoseAlcerreca avatar Nov 07 '23 08:11 JoseAlcerreca

I'm not able to reproduce this flakiness on an emulator with the same API level. Do you have access to the test reports (to pinpoint the actual error)? It seems like uploaded artifacts are removed when a workflow is re-run.

SimonMarquis avatar Nov 07 '23 19:11 SimonMarquis

I tried a quick fix here because it was making working on that PR very difficult.

JoseAlcerreca avatar Nov 08 '23 14:11 JoseAlcerreca

Happened again here: https://github.com/android/nowinandroid/actions/runs/6814093754/job/18531717197#step:7:1092

dturner avatar Nov 09 '23 17:11 dturner

@JoseAlcerreca Can you/I extract your fix in a dedicated PR and try to merge it and see if it fixes the issue?

SimonMarquis avatar Nov 09 '23 17:11 SimonMarquis

Merged in #1019

JoseAlcerreca avatar Nov 10 '23 14:11 JoseAlcerreca

I did some investigation into this today. The flaky test navigates to 4 screens and performs a scroll. It takes ~10s to run on CI. runTest's default timeout is 10s so in cases where it took >10s it would fail.

The reason why it doesn't fail locally is because your local machine is likely much more powerful than the CI machine so the emulator responds faster. On my M1 this test took about 7 seconds.

The reason for the 10s timeout is given by Jake Wharton here: https://github.com/Kotlin/kotlinx.coroutines/issues/3270 with a big follow up discussion about the issues it causes here: https://github.com/Kotlin/kotlinx.coroutines/issues/3800.

dturner avatar Nov 11 '23 00:11 dturner

@JoseAlcerreca's fix works because it moves the test execution out of the runTest block but I'd like us to consider all the possible options.

10 seconds is a really long time for a test so this should be treated as an anomaly. Possible solutions:

  1. Avoid using runTest in any instrumented tests (not keen on this as it increases the overall test timeout to 60 seconds which is far too long)
  2. Override the default timeout value for runTest on a per-test basis, maybe with a constant used to easily identify these long running tests. If left up to the test author we should provide some guidance on what's a reasonable timeout e.g. for each new screen you're navigating to, allow 3 seconds
  3. Something clever to do with annotations, like @LargeTest

Any other ideas or thoughts?

dturner avatar Nov 11 '23 00:11 dturner

Good catch!

  1. Avoid using runTest in any instrumented tests (not keen on this as it increases the overall test timeout to 60 seconds which is far too long)

This will indeed restore the default behavior of whatever the code does, which for instrumented tests is generally driven by espresso "idling" mechanism and uses a 60s timeout. For Compose API, there should probably be the same, but I don't find any reference. But there are no "overall" 60s timeout as you say baked into the Android test framework. If the test takes 10min (idle or not), the test will continue until finished.

  1. Override the default timeout value for runTest on a per-test basis, maybe with a constant used to easily identify these long running tests. If left up to the test author we should provide some guidance on what's a reasonable timeout e.g. for each new screen you're navigating to, allow 3 seconds

This might not really improve the situation, if it is left to choose by the dev, because most of the devs will have a more performant machine than what GitHub Actions offers for free. And everything would look fine on their end, they probably won't add a custom timeout, but then it will continue to fail sometimes on CI.

  1. Something clever to do with annotations, like @LargeTest

This would be very neat, but I think these are only meant for test selection/groupping with the JUnit framework. Maybe a custom test runner could help with detecting that, but this would be way more involving.


My personal opinion on this would be to:

  • either provide a kind of runNavigationTest(timeout: Duration) (similar to your 2.) with a mandatory/explicit timeout parameter that would ensure the test does not reach a maximum value. The only caveats to this is, how to enforce it? (Lint?)
  • or not use runTest() at all. And instead use runBlocking() calls where we need to invoke suspending functions. In these scenarios, we generally don't care about the delay-skipping feature anyway (reading a local repository).

SimonMarquis avatar Nov 11 '23 08:11 SimonMarquis

Good points. runBlocking seems like the best solution here - easy to understand and no magic delay skipping under the hood.

dturner avatar Nov 13 '23 13:11 dturner