
Ability to run/rerun tests intelligently.

Open matanlurey opened this issue 7 years ago • 6 comments

This is a tracking/meta issue for further test+build integration.

Today, we can run all tests for a given package after a build completes:

$ pub run build_runner test

Using test's arguments, we can select/deselect various tests:

$ pub run build_runner test -- test/some_test.dart
$ pub run build_runner test -- -t integration-test
$ pub run build_runner test -- -x fails-on-travis

However, a common pattern is running tests, making fixes, and re-running tests.

Imagine the following:

$ pub run build_runner test

> 50 tests failed.

$ pub run build_runner test --rerun=failures

> 11 tests failed.

$ pub run build_runner test --rerun=failures

> 0 tests failed.

I could also see the default behavior being --rerun=changed, i.e., only run tests whose code or dependencies have changed. This would help on CI systems like Travis, for example. If we haven't touched the angular_ast package:

$ pub run build_runner test --rerun=changed

> No tests need a rerun as a result of source or dependency changes.

... so, in short:

--rerun=all # Today's behavior
--rerun=changed # Similar to how Bazel works already
--rerun=failures # Requires further testing integration/caching

matanlurey avatar Mar 22 '18 18:03 matanlurey

I'm super interested in this capability for reducing CI times on large code bases. Is there any timeline associated with this yet (more specific than 'some time in the future')? If not, is there any sort of recommended path for achieving similar results via Bazel somehow (our org hasn't toyed around with Bazel at all yet)?

trentgrover-wf avatar Jun 21 '18 19:06 trentgrover-wf

I'm super interested in this capability for reducing CI times on large code bases. Is there any timeline associated with this yet (more specific than 'some time in the future')?

No specific timeline; we still have a lot of higher-priority core work relating to full Dart 2 support. That will consume most of our time for Q3, in addition to continuing to drive down bugs and better supporting Flutter users.

If not, is there any sort of recommended path for achieving similar results via Bazel somehow (our org hasn't toyed around with Bazel at all yet)?

Bazel in general supports this out of the box, because tests are treated essentially just like normal builds. However, Dart support for Bazel is not actively maintained at this time.

jakemac53 avatar Jun 21 '18 19:06 jakemac53

any updates / plans here?

trentgrover-wf avatar Oct 16 '18 05:10 trentgrover-wf

Still nothing planned here

jakemac53 avatar Oct 16 '18 13:10 jakemac53

2 years later... any news?

listepo avatar Oct 25 '20 20:10 listepo

This issue remains open because it is a valid feature request, but the amount of work required to design and implement a working solution is very large, and there is no plan to invest in it at this time.

There are a number of challenges involved in this feature that make it very difficult to build a system robust enough that we would be willing to release and maintain it in our core platform.

The difficulty is particularly acute for non-Dart dependencies (generally this applies to integration tests). One option is to take something like the Ninja approach to dependencies and require users to manually specify deps via a metadata annotation of some sort. That is very error prone, though, and I don't think it's a solution any of us consider really viable.

There is also the problem of resources completely outside of the build system (anything could be invoked via Process.run, etc.). We likely just wouldn't have a solution for that at all, since we don't track those.

All of that being said, here is a strawman proposal for how at least a watch mode could be done in a package similar to build_test (or possibly we would accept a contribution in that package, but please ask us before putting in the work):

Add a new builder(s?) which runs tests and fails/succeeds accordingly

The general idea is that this is a builder which actually calls pub run test, but for one test at a time. It artificially fails the build if the test fails. (A rough sketch of such a builder follows the overview below.)

High level overview of the builder implementation

  • The primary inputs would be the _test.dart files that the user wrote.
  • Probably one builder per platform (web, vm, etc.).
    • The web version would require a dependency on build_web_compilers; the vm version might work with or without build_vm_compilers.
  • Using calls to canRead, the builder tracks all transitive files of the compiled test as deps (.js files for web, .dill or .dart files for vm).
    • Note that this should actually track the deps of the .<platform>_test.dart bootstrap files, not the original test file.
    • Needs to allow manual specification of additional test deps as well.
  • Then the builder runs the test in much the same way as pub run build_runner test does today.
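
As a concrete illustration of the overview above, here is a minimal sketch of what the per-platform runner builder could look like. This is not an existing build_test API; the class name TestRunnerBuilder, the .vm_test.result extension, and the way results are recorded are all assumptions made for the example.

import 'dart:convert';
import 'dart:io';

import 'package:build/build.dart';

/// Hypothetical builder that runs one vm test per primary input and records
/// the result as a build output.
class TestRunnerBuilder implements Builder {
  @override
  final buildExtensions = const {
    // Operates on the bootstrap files generated by build_test, not the
    // original _test.dart files.
    '.vm_test.dart': ['.vm_test.result'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    // canRead registers the input as a dependency; a real implementation
    // would also walk the transitive deps of the compiled test here.
    await buildStep.canRead(buildStep.inputId);

    // Run the single test through the normal test runner.
    var result =
        await Process.run('pub', ['run', 'test', buildStep.inputId.path]);

    // Record the outcome. Failing the build directly could instead be done by
    // throwing (or logging a severe error) when result.exitCode != 0.
    await buildStep.writeAsString(
      buildStep.inputId.changeExtension('.result'),
      jsonEncode({'exitCode': result.exitCode, 'output': '${result.stdout}'}),
    );
  }
}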

Add a @Data annotation that accepts a list of data dependency globs

Any external deps for the test would be listed here. These files must be included in the build to be tracked.

We would accept globs representing paths relative to the root of the current package (lib/*.txt, for example), and probably also globs for external packages, something like package:foo/*.txt and/or the equivalent asset:foo/lib/*.txt.
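
To make this concrete, here is a sketch of a possible shape for the annotation and how a test file might use it. Neither the Data class nor the glob syntax shown here exists today; both are assumptions for illustration, and in practice the class would live in the build_test-like package rather than in the test file.

@Data(['test/goldens/*.txt', 'package:some_pkg/data/*.json'])
import 'package:test/test.dart';

void main() {
  test('output matches golden', () {
    // The builder would treat test/goldens/*.txt as data deps of this test
    // and re-run it whenever any matched file changes.
  });
}

// Defined here only to keep the sketch self-contained; in practice it would
// be provided by the package defining the builder.
class Data {
  final List<String> globs;
  const Data(this.globs);
}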

The builder would use the findAssets API to find these files. It would then call canRead on each result of those globs in order to set up dependency tracking for the test.

Note that globbing other packages (through findAssets) is not supported today, so support would likely need to be added for that.
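
A sketch of how the builder could expand those globs into tracked dependencies, assuming the glob strings have already been parsed out of the annotation (the helper name is hypothetical, and the in-package-only limitation reflects today's findAssets behavior):

import 'package:build/build.dart';
import 'package:glob/glob.dart';

/// Expands @Data globs and registers every matched asset as a dependency of
/// the current build action.
Future<void> trackDataDeps(BuildStep buildStep, List<String> globs) async {
  for (var pattern in globs) {
    // findAssets only matches assets in the current package today; the
    // package:/asset: forms above would need new build system support.
    await for (var id in buildStep.findAssets(Glob(pattern))) {
      // canRead registers `id` as a dep, so the test action is invalidated
      // whenever the matched file changes.
      await buildStep.canRead(id);
    }
  }
}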

Run the test in a sandbox dir with only the tracked files present

We would use the scratch_space package to create a new temp directory for each test, and run pub run test --precompiled=<scratch_space_dir> to run the test runner against that directory, possibly also setting the current working directory to that directory.

This would help ensure that the @Data annotations properly track all the external things that we can track (this still doesn't cover external binaries and the like). Since only the listed files are present in the temp dir, any other file will not be visible to the test and attempts to read it will fail.
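
A minimal sketch of the sandboxing step, assuming the set of tracked assets (transitive deps plus @Data matches) has already been collected; the helper name and the exact test invocation are assumptions:

import 'dart:io';

import 'package:build/build.dart';
import 'package:scratch_space/scratch_space.dart';

/// Copies only the tracked assets into a temp directory and runs the test
/// runner there, so anything not listed simply does not exist for the test.
Future<ProcessResult> runTestInSandbox(
    BuildStep buildStep, Iterable<AssetId> trackedAssets) async {
  var scratchSpace = ScratchSpace();
  try {
    await scratchSpace.ensureAssets(trackedAssets, buildStep);
    return await Process.run(
      'pub',
      ['run', 'test', '--precompiled=${scratchSpace.tempDir.path}'],
      // Make relative file reads resolve inside the sandbox as well.
      workingDirectory: scratchSpace.tempDir.path,
    );
  } finally {
    await scratchSpace.delete();
  }
}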

Require the builder to be explicitly enabled?

So that tests don't always run as part of every build, the new builders would probably be disabled by default and then explicitly enabled as needed.

This would be surfaced to the user as something like --define=build_test:runner=true (and a feature like https://github.com/dart-lang/build/issues/1544 could make that nicer).

In order to not lose all knowledge of previously run test states, the build_test:runner builder would not be the thing that actually runs the test; it would only read the output of the builder that does.

The builder that does the actual test running would be listed as optional: true, so it doesn't run unless the wrapping builder runs (and requests its output file).

Actual reporting of the test status (success/failure/skipped) could be done by either builder; I could see arguments either way.
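
For illustration, here is a sketch of the wrapping, reporting builder. The names and extensions are assumptions carried over from the earlier sketch; the important part is that requesting the runner builder's output is what causes the optional runner to execute, or its cached result to be reused:

import 'dart:convert';

import 'package:build/build.dart';

/// Hypothetical reporting builder. Its primary input is the result file
/// produced by the optional test-running builder.
class TestReporterBuilder implements Builder {
  @override
  final buildExtensions = const {
    '.vm_test.result': ['.vm_test.report'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    // Reading the result asset requires the optional runner builder to have
    // produced it; if its inputs are unchanged, the cached result is used and
    // the test is not re-run.
    var result =
        jsonDecode(await buildStep.readAsString(buildStep.inputId)) as Map;

    if (result['exitCode'] != 0) {
      // Severe logs fail the overall build, surfacing the test failure.
      log.severe('Test failed: ${buildStep.inputId.path}\n${result['output']}');
    }

    await buildStep.writeAsString(
        buildStep.inputId.changeExtension('.report'), '${result['exitCode']}');
  }
}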

Enabling re-running of all tests or failed tests.

I am not sure how exactly we would support this at all via a Builder alone.

If we built something into the test command itself, we could have an extra input to all tests which is generated and always contains a random hash. The test command would then manually delete this file before a build if it was asked to re-run all tests, causing it to be regenerated with a new hash and invalidating all test actions, since they all depend on that file.

As for re-running only failed tests, you could create a second file which only the failed tests depend on, and invalidate it similarly.
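
A sketch of the invalidation trick for the re-run-all case described above. The builder name, the $lib$ synthetic input, and the marker path are all assumptions; the failed-tests variant would work the same way with a second marker that only failed tests depend on.

import 'package:build/build.dart';

/// Hypothetical builder that writes a marker file containing a fresh value.
/// Every test action calls canRead on lib/rerun_all.marker, so deleting the
/// generated marker before a build forces it to be regenerated with a new
/// value, invalidating every test action that depends on it.
class RerunMarkerBuilder implements Builder {
  @override
  final buildExtensions = const {
    r'$lib$': ['rerun_all.marker'],
  };

  @override
  Future<void> build(BuildStep buildStep) async {
    await buildStep.writeAsString(
        AssetId(buildStep.inputId.package, 'lib/rerun_all.marker'),
        '${DateTime.now().microsecondsSinceEpoch}');
  }
}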

jakemac53 avatar Oct 26 '20 15:10 jakemac53