pkg/dartdev/test/commands/create_integration_test failures
The tests

```
pkg/dartdev/test/commands/create_integration_test RuntimeError (expected Pass)
```

are failing on the configurations:

- unittest-asserts-release-linux-x64
- unittest-asserts-release-mac-x64
- unittest-asserts-release-linux-arm64
These failures were triggered by unrelated changes. The first two configurations started failing on "skip computing lints for macro generated augmentations"; the last configuration started failing on "[expect] introduce Expect.throwsWhen and Expect.throwsTypeErrorWhen...[test_runner] Support new features in Requirements.".
Seems possible that this test is not hermetic and something external (like publishing of a package) introduced the failures.
cc @kallentu @pq (authors of the other changes on the blame-list)
cc @bkonyi who may be familiar with the setup here.
> Seems possible that this test is not hermetic and something external (like publishing of a package) introduced the failures.
At first blush, this seems plausible. Notice the stack traces:
Errors:

```
Failed to build webdev:webdev:
../../pub_cache/hosted/pub.dev/dds-3.2.0/lib/src/devtools/handler.dart:138:28: Error: Required named parameter 'analytics' must be provided.
          return ServerApi.handle(
                           ^
../../pub_cache/hosted/pub.dev/devtools_shared-6.0.5/lib/src/server/server_api.dart:37:35: Context: Found this candidate, but the arguments don't match.
  static FutureOr<shelf.Response> handle(
                                  ^^^^^^
```
cc @elliette We might need an update to webdev here?
Ah yes, this is due to an update to devtools_shared. DWDS doesn't depend on devtools_shared, but DDS does. I think it should have been published as a new major version, because https://github.com/flutter/devtools/pull/7084 is a breaking change.
FYI @eliasyishak @kenzieschmoll @bkonyi
Would it be possible to pull the devtools_shared:6.0.5 package from pub? (I think there is a window of time during which owners can remove a version.) Then we can re-publish with `analytics` as an optional parameter, and later publish a major version. I don't think it makes sense for `analytics` to be required.
I just retracted devtools_shared 6.0.5 from pub, so we should be good now.
I can confirm that this test isn't hermetic by design. It's meant to ensure that our dart create templates create projects that have valid pub dependencies and that the instructions printed at project creation work correctly, so the test worked as intended in this case. It's just unfortunate it can start causing failures on changes that are completely unrelated after a new package is published, but I'm not sure how we can work around that while still having this test detect these types of failures.
> It's just unfortunate it can start causing failures on changes that are completely unrelated after a new package is published, but I'm not sure how we can work around that while still having this test detect these types of failures.
Yeah, it's tricky. Two ideas come to mind, but I wonder if they are worth investing in.
idea 1: simulate the outside world with a hermetic copy
Pub supports using an alternate host for serving packages. I can imagine using that to launch a separate pub server within the CI to control what `dart pub get` can see in our tests.

For example, imagine an alternate pub server that hosts the same content as pub.dev, but filters it using a manifest that lives in the SDK. The manifest would indicate which packages and which versions are included. We would use an autoroller bot to land changes in the SDK updating the manifest periodically. Each autoroller change adds SDK commits like "pub server manifest update: devtools_shared v6.0.5 added". Then, when some integration issue (like the one detected here) pops up, we would be able to properly blame the autoroller change that exposed it.
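The filtering step of idea 1 could look roughly like the sketch below (in Python, purely illustrative; `filter_listing`, the manifest shape, and the field names are all assumptions, loosely following the pub.dev `/api/packages/<name>` response shape):

```python
def filter_listing(listing: dict, manifest: dict) -> dict:
    """Filter a pub.dev-style version listing against an SDK-checked-in manifest.

    `manifest` is assumed to map package names to the set of allowed
    versions, e.g. {"devtools_shared": {"6.0.4"}}. Packages absent from
    the manifest are served unfiltered in this sketch (they could also
    be blocked entirely).
    """
    allowed = manifest.get(listing["name"])
    if allowed is None:
        return listing
    versions = [v for v in listing["versions"] if v["version"] in allowed]
    filtered = dict(listing, versions=versions)
    if versions:
        # Keep the advertised "latest" consistent with the filtered list.
        filtered["latest"] = versions[-1]
    return filtered
```

The proxy would fetch the real pub.dev listing, run it through something like this, and serve the result; the autoroller's only job would be to update the manifest file.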
idea 2: validate when non-hermetic assumptions change
Rather than controlling what `dart pub get` gets, we could validate that it gets what we expect. This could be done with a simple test that compares the `pubspec.lock` file against an expectation. We could even guard all integration tests behind it, so they only run when the package versions match. When a new package version is released, that test will fail. This would still trigger on unrelated changes, but we can easily surface a clear explanation from it so the failure gets properly addressed.
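The comparison at the heart of idea 2 could be as small as the sketch below (Python, illustrative only; `diff_lock` and the dict shape are assumptions, with the `pubspec.lock` YAML already parsed into a `{package: version}` map):

```python
def diff_lock(actual: dict, expected: dict) -> list:
    """Return human-readable mismatches between resolved and expected versions."""
    problems = []
    for pkg in sorted(expected):
        got = actual.get(pkg)
        if got is None:
            problems.append(f"{pkg}: expected {expected[pkg]}, not resolved at all")
        elif got != expected[pkg]:
            problems.append(f"{pkg}: expected {expected[pkg]}, resolved {got}")
    for pkg in sorted(set(actual) - set(expected)):
        problems.append(f"{pkg}: resolved {actual[pkg]} but not in expectation")
    return problems
```

A guard test would fail with these messages when a new package version is published, making the "the world changed, not your CL" situation explicit.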
Idea 2 is much simpler and something we could do short term; idea 1 is very involved. I like that idea 1 would provide proper blame and could potentially make our tests hermetic again. It would also mean relying on fewer external services in the CI. I know we try to avoid them in builds for reliability (e.g. our DEPS don't use external services directly, but instead use mirrors of repos hosted on Dart's git servers), but I don't know what our philosophy is around this for testing. Given how much effort idea 1 is, I find it hard to justify.
@athomas - FYI - not sure if this has been tackled by engprod in other areas, maybe there are other approaches we should consider here?
Generally, pub is quite poorly supported on the SDK CI infrastructure.
Idea 1: autorollers add a lot of load to the CQ (depending on how frequently they run), or delay us from finding this problem quickly. Perhaps this is worth it if this breaks a lot. But I'd avoid it if this doesn't happen often.
Idea 2: seems reasonable.
Idea 3 would be to instead strengthen the integration testing of packages used in `dart create`, so that publishing fails if it would break `dart create` in the SDK. @jonasfj, is there prior art for this?
Idea 4: deflake using the main SDK. Re-run the test using an unpatched version of the SDK (either by downloading a recent SDK from the same branch, or by doing a build without the patch applied). The snapshot caching we're working on for reclient might make the unpatched-build strategy more feasible than it was in the past. If the test also fails with the unpatched SDK, consider it "broken at HEAD" and allow the CQ to pass (on CI, that still turns the bot red). @whesse WDYT?
On idea 4, solving the problem of CQ breakage when the CI test has not yet been run in the new context and failed:

Rerunning the test on the unpatched SDK, and using that as a new baseline, duplicates work that the CI should be doing: as soon as the CI finishes running a new build and testing it, the failure will appear there and be in the baseline. So doing this work in a custom deflaking step on the CQ would only help if it finishes faster (because it would only run one test), or if the CI isn't running because no commits have landed recently.
Non-hermetic tests really don't fit our CI and CQ test checking. Perhaps all non-hermetic tests should run on a separate builder, and we have different result analysis on that builder? A test that starts failing due to a change in the world is in a sense flaky, so we don't really have a way to update existing CI results for a test on an existing tested commit from passing to failing, based on new runs.
Looking at the entire issue of needing to test pub as it interacts with the external world, and finding a good CI solution for that, seems like the right thing to do, rather than finding an improvement for a specific non-hermetic test.
Re idea 4
I don't think it just addresses non-hermetic tests. Any test that fails because CI is behind (and it often is, especially in MTV) will trigger a red build (pre-approval often handles that, but not always, because of incomplete coverage). The typical user reaction is to retry the build (which may still fail). These unconditional retries are much more expensive than a "cached build + run single test" deflaking run (I don't have data on how often that happens, though).
Agree on figuring out pub on CI. I do think the redness here is a valuable signal (dart create is broken for end users, that's something we should notice quickly).
> Idea 3 would be to instead strengthen the integration testing of packages used in `dart create`, so that publishing fails if it would break `dart create` in the SDK. @jonasfj, is there prior art for this?
That might be hard, because templates evolve.
> idea 1: simulate the outside world with a hermetic copy
In pub test cases we have a custom server with packages defined in the test cases.
But if you wanted something really dumb and simple, you could make a pub server that runs on localhost and proxies requests to pub.dev, then filter the JSON version-listing response to remove all versions newer than a given date. Of course, then we'd need an autoroller to continuously update the cutoff date :see_no_evil:
Because ultimately, we'll want to know if `dart create` is broken.
> ...then filter the JSON version-listing response to remove all versions newer than a given date...
Brilliant! That removes the need to do fine-grained tracking of which packages we care about.
> idea 1: autorollers add a lot of load to the CQ (depending on how frequently they run), or delay us from finding this problem quickly. Perhaps this is worth it if this breaks a lot. But I'd avoid it if this doesn't happen often.
I think a roll once a day would probably be good enough in this scenario. I believe webdev also has integration tests in its own repo, so it's possible this issue would have been caught there too.
Re: idea 3. Note that in this case the template was fine and running `dart create` completed successfully. What failed was an installation step from the template instructions: `dart pub global activate webdev` (failure log).