Add `et test --output=...`
Today, et test just runs the underlying test executable, and reports a binary pass (exit code 0) or fail (otherwise).
A failed test also has a link to a log file on disk in a /tmp directory.
This is a good start, but it's not yet at the level where we can encourage broad use (or use it on CI).
Let's add 2 modes initially:
- `errors` (default): write failure logs to stderr, and full logs to a file in `FLUTTER_TEST_DIR` (or `/tmp` locally)
- `streamed`: write all logs to stderr only
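To make the two modes concrete, here is a minimal sketch (hypothetical Python, not the actual Dart implementation in `engine_tool`; the function and parameter names are invented for illustration) of how a test's log could be routed:

```python
import sys
import tempfile
from pathlib import Path
from typing import Optional

def report_test_output(mode: str, label: str, log: str, failed: bool) -> Optional[Path]:
    """Route a test's combined log according to the output mode.

    mode is 'errors' (default) or 'streamed', matching the proposal above.
    Illustrative only; not the real et implementation.
    """
    if mode == "streamed":
        # Stream everything to stderr; no log file is kept.
        sys.stderr.write(log)
        return None
    # 'errors': always keep a full log file, but only surface failures.
    log_file = Path(tempfile.mkdtemp(prefix="et")) / f"{label}.log"
    log_file.write_text(log)
    if failed:
        sys.stderr.write(log)
    return log_file
```

The key design point is that `errors` still writes the full log somewhere durable, so a failing CI run can always be debugged after the fact.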
As per discussions, we'll bypass doing any sort of machine parsing of test output at this time - this will let us more quickly ramp up moving test invocations from run_tests.py (https://github.com/flutter/flutter/issues/156243) to et test, and add additional test hosts (iOS, Android, etc.).
See also: https://bazel.build/reference/command-line-reference#flag--test_output for future suggestions.
It is possible we'd want to generalize this to et test, et format and et lint.
@johnmccutchan and @zanderso Can I get your thoughts on this one?
One blocker right now to (gradually) moving functionality from run_tests.py to et test (such as test discovery or configuration, see https://github.com/flutter/flutter/issues/156243) is that run_tests.py provides the full output of a run, while et test has a different strategy.
Observe (@ HEAD at time of writing):
./testing/run_tests.py --variant host_debug_unopt_arm64 --type dart
Running command "/Users/matanl/Developer/engine/src/out/host_debug_unopt_arm64/flutter_tester --force-multithreading --disable-observatory --no-enable-impeller --use-test-fonts --icu-data-file-path=/Users/matanl/Developer/engine/src/out/host_debug_unopt_arm64/icudtl.dat --flutter-assets-dir=/Users/matanl/Developer/engine/src/out/host_debug_unopt_arm64/gen/flutter/lib/ui/assets --disable-asset-fonts /Users/matanl/Developer/engine/src/out/host_debug_unopt_arm64/gen/codec_test.dart.dill" in "/Users/matanl/Developer/engine/src"
00:00 +9: toImageSync - succeeds
00:00 +34: Paint, when copied, does not mutate the original instance
00:00 +35: Paint, when copied, the original changing does not mutate the copy
00:00 +10: toImageSync - toByteData
00:00 +36: DrawAtlas correctly copies color values into display list format
00:00 +11: toImage and toImageSync have identical contents
00:00 +37: DrawAtlas with no colors does not crash
00:00 +1: Simple .toImage
00:00 +38: Rendering ops with ImageFilter blur with default tile mode
Compared to (as an example) et test:
et test //flutter/tools/engine_tool/...
[2024-11-12T12:19:03.136][macos/host_debug: GN]: OK
[2024-11-12T12:19:04.224][macos/host_debug: RBE startup]: Proxy started successfully.
[2024-11-12T12:19:04.660][macos/host_debug: ninja]: 100.0% (2/2) STAMP obj/flutter/tools/engine_tool/cleanup_command_test.stamp
[2024-11-12T12:19:04.661][macos/host_debug: ninja]: OK
[2024-11-12T12:19:05.183][macos/host_debug: RBE shutdown]: Actions completed: 0
FAIL: 1s.205ms //flutter/tools/engine_tool:build_plan_test [details in /var/folders/qw/qw_3qd1x4kz5w975jhdq4k58007b7h/T/et52682IHNfZ8/process_artifacts.json]
OKAY: 1s.627ms //flutter/tools/engine_tool:cleanup_command_test
OKAY: 1s.727ms //flutter/tools/engine_tool:build_command_test
OKAY: 2s.312ms //flutter/tools/engine_tool:entry_point_test
OKAY: 1s.459ms //flutter/tools/engine_tool:fetch_command_test
OKAY: 1s.442ms //flutter/tools/engine_tool:flutter_tools_test
OKAY: 1s.429ms //flutter/tools/engine_tool:format_command_test
OKAY: 1s.431ms //flutter/tools/engine_tool:gn_test
OKAY: 1s.371ms //flutter/tools/engine_tool:label_test
OKAY: 1s.392ms //flutter/tools/engine_tool:lint_command_test
OKAY: 1s.392ms //flutter/tools/engine_tool:logger_test
OKAY: 1s.347ms //flutter/tools/engine_tool:phone_home_test
OKAY: 1s.449ms //flutter/tools/engine_tool:proc_utils_test
OKAY: 1s.482ms //flutter/tools/engine_tool:query_command_test
OKAY: 1s.545ms //flutter/tools/engine_tool:run_command_test
OKAY: 1s.474ms //flutter/tools/engine_tool:run_target_test
OKAY: 1s.504ms //flutter/tools/engine_tool:test_command_test
OKAY: 1s.431ms //flutter/tools/engine_tool:typed_json_test
OKAY: 1s.394ms //flutter/tools/engine_tool:utils_test
OKAY: 1s.307ms //flutter/tools/engine_tool:worker_pool_test
Note there are no details on what tests were run or their status. The details JSON looks like this:
(This fails intentionally as an example)
{
"pid": 52682,
"exitCode": 1,
"stdout": "\r00:00 \u001b[32m+0\u001b[0m: \u001b[1m\u001b[90mloading test/commands/build_plan_test.dart\u001b[0m\u001b[0m \r00:00 \u001b[32m+0\u001b[0m\u001b[31m -1\u001b[0m: \u001b[1m\u001b[90mloading test/commands/build_plan_test.dart\u001b[0m \u001b[1m\u001b[31m[E]\u001b[0m\u001b[0m \n \u001b[31mFailed to load \"test/commands/build_plan_test.dart\":\u001b[0m Does not exist.\n\n\u001b[1m\u001b[36mTo run this test again:\u001b[0m /Users/matanl/Developer/engine/src/flutter/prebuilts/macos-arm64/dart-sdk/bin/dart test test/commands/build_plan_test.dart -p vm --plain-name 'loading test/commands/build_plan_test.dart'\n\r00:00 \u001b[32m+0\u001b[0m\u001b[31m -1\u001b[0m: \u001b[31mSome tests failed.\u001b[0m \n",
"stderr": "",
"cwd": "/Users/matanl/Developer/engine/src",
"commandLine": [
"out/host_debug/gen/flutter/tools/engine_tool/build_plan_test"
]
}
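For what it's worth, the details JSON above is simple enough that a failure summary could be produced without parsing any test framework output. A hypothetical Python sketch (only the field names from the example above are assumed; nothing about the real et code is):

```python
import json

def summarize_artifact(raw: str) -> str:
    """Build a one-line summary from a process_artifacts.json blob."""
    artifact = json.loads(raw)
    status = "OKAY" if artifact["exitCode"] == 0 else "FAIL"
    cmd = " ".join(artifact["commandLine"])
    return f"{status} ({artifact['exitCode']}): {cmd} [cwd={artifact['cwd']}]"

# A trimmed-down, made-up artifact with the same shape as the example above.
example = ('{"pid": 1, "exitCode": 1, "stdout": "", "stderr": "", '
           '"cwd": "/src", "commandLine": ["a_test"]}')
print(summarize_artifact(example))  # FAIL (1): a_test [cwd=/src]
```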
I suspect we'll need to align on: what should test runs do when tests pass, what should they do when they fail, and what logs are written or stored locally and on CI? My thoughts from looking into this are at the top of this issue (https://github.com/flutter/flutter/issues/157182#issue-2598256414), but I could use a second opinion or an ACK.
We've experimented with a --quiet flag to run_tests.py that makes it emit output only when a test fails. @jason-simmons reverted one instance of that experiment, but I'm forgetting why. I'm interested in knowing what the use case was for full logs so that @matanlurey can make sure it's captured in the options that et will have.
I don't personally have a strong use case for full (non-failure) logs, other than a bit of paranoia that tests could possibly not be running (either due to being skipped, or misconfiguration) :-/. My dream scenario would be outputting something like the JUnit XML format so that we could see visually which tests ran/failed, but that's probably a ways out.
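As a sketch of that dream scenario, a minimal JUnit-style report could be rendered from a name-to-failure mapping (hypothetical Python; real JUnit schemas carry more attributes like timing and class names):

```python
import xml.etree.ElementTree as ET

def junit_report(suite, results):
    """Render results as minimal JUnit-style XML.

    results maps test name -> failure message (None means the test passed).
    Illustrative only; not a complete JUnit schema.
    """
    failures = sum(1 for msg in results.values() if msg is not None)
    root = ET.Element("testsuite", name=suite,
                      tests=str(len(results)), failures=str(failures))
    for name, msg in results.items():
        case = ET.SubElement(root, "testcase", name=name)
        if msg is not None:
            ET.SubElement(case, "failure", message=msg)
    return ET.tostring(root, encoding="unicode")

print(junit_report("//flutter/tools/engine_tool:build_plan_test",
                   {"loading build_plan_test.dart": "Does not exist."}))
```

Even a report this small would answer the "did the tests actually run?" paranoia above, since skipped or missing tests would show up as a wrong `tests` count.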
I'd also be open to trying to re-land --quiet if that gets us closer to what et test would eventually do.
I removed the use of the --quiet flag after an incident where an Impeller golden test was failing because a new file had to be added to the list of golden image paths.
--quiet was filtering out all output from child processes launched by run_tests.py. The diff indicating the reason for the failure was in the output of a tool launched by the test script. But it was not visible in the CI logs due to the quiet flag.
Ack. Let me follow up with Aaron and see if we can make the Impeller golden failures easier to deal with first.
We believe that et is reasonably complete at this point and so I am closing this issue to shrink the backlog. If you disagree, feel free to reopen.