rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

RFC: Out-of-tree test suite

Open kirtchev-adacore opened this issue 7 months ago • 20 comments

This PR proposes the creation of an external stand-alone test suite and associated tools for verifying the Rust toolchain that also fits into existing Rust development workflows.

Rendered

kirtchev-adacore avatar Jan 10 '24 14:01 kirtchev-adacore

I think that having some form of standalone end-to-end testing is a great idea. Also nice for other implementations, e.g. gccrs seems to be interested: https://gcc-rust.zulipchat.com/#narrow/stream/266897-general/topic/RFC.3A.20Out-of-tree.20test.20suite.

A few questions/notes:

  1. What kind of tests should live in this suite? Codegen tests make sense but UI tests are trickier, since the stderr may vary by implementation. Maybe the runner could accept a --ui-output-dir that can override files of the same name.
  2. Assuming there will still be tests directory in rust-lang/rust that isn't this submodule, how should a user decide where to place tests?
  3. When would this test suite get run in CI?

A downside of tests in a submodule is that workflow for adding tests with a feature gets less ergonomic - two places to commit and potentially get out of sync. Perhaps the main test suite source could still live in rust-lang/rust, but certain directories get automatically synced to rust-lang/rust-test-suit that has the OOT runner and additional tests.

llvm-test-suite provides a good reference.

tgross35 avatar Jan 12 '24 00:01 tgross35

Because of the issues I mentioned re: "rustc is a cross compiler and host and target remain build parameters", I am slightly concerned that a test driver might have to be compiled, for each host, for each target, in a combinatorically explosive fashion, as is the case (as far as I understand it) with libstd's artifacts? I suppose if we could get it down to one per target, that might be nice, but then we can't use rustup to distribute this "run the test suite" artifact and actually support all the targets that can run tests, because some targets don't support host toolchains, thus rustup on them is... nonsensical, if it even works... but do support running a rather lot of the tests, which have to be compiled on some host and then run on the target.

I realize "run tests for a random target that can't host the toolchain but would support most of the test suite via the remote-test-client and remote-test-server" isn't exactly trivial, so improving the experience of doing that is certainly something can get behind. Unless this proposal isn't actually concerned about such targets except as a "future possibility", and is only truly worried about hosts? In which case, it should probably say that, because there are more bins than "we ship a host Rust toolchain" and "this target doesn't support std". People are happy to accept "this isn't implemented for this target" and are disappointed to see "this was supposed to work for every target! ...but actually doesn't".

workingjubilee avatar Jan 12 '24 08:01 workingjubilee

I think it would be more productive if you came to Zulip t-compiler with a precise problem description and we could then brainstorm a nice solution from scratch, considering all the potential downsides.

my personal recommendation would be to make this RFC a living document e.g. on hackmd and then discuss this on zulip to figure out the details and potential smaller uncontroversial steps that you can implement immediately.

oli-obk avatar Jan 12 '24 14:01 oli-obk

  1. What kind of tests should live in this suite? Codegen tests make sense but UI tests are trickier, since the stderr may vary by implementation. Maybe the runner could accept a --ui-output-dir that can override files of the same name.

Ideally, that would be all tests, including codegen, UI, cargo, clippy, etc.

For UI, I see several possibilities:

  • If the testable is rustc, then the driver uses the existing .stderr files.
  • If the testable is some other conforming tool, then the driver accepts a --ui-output-dir path with similar hierarchical structure as UI, with custom .stderr files that are tailored to the tool. When checking for the presence of an error, the driver could ignore everything after an ERROR annotation, and check for the presence of a corresponding .stderr with an error diagnostic at that location.
  1. Assuming there will still be tests directory in rust-lang/rust that isn't this submodule, how should a user decide where to place tests?

Given that the OOTTS is supposed to be a complete testing replacement, there will no longer be a tests directory in rust-lang/rust.

  1. When would this test suite get run in CI?

The OOTTS would run after the toolchain is built, packaged, and installed in a clean environment, like so:

  1. Apply PR (if there is one)
  2. ./x build ...
  3. ./x dist ...
  4. ./x install ... in a clean environment
  5. Run the OOTTS.

kirtchev-adacore avatar Jan 16 '24 12:01 kirtchev-adacore

I think such a big RFC is the wrong way to tackle this. As far as I understand, the problem being solved here is that you want to be able to test the distributed toolchains.

The problem that is being solved is to ensure that currently performed post-test processing do not alter the validity of a toolchain. This basically amounts to testing an off the shelf toolchain.

I think there are simpler ways to solve this problem, for example by just shipping the current testsuite as-is with some changes to tooling that would allow testing distributed toolchains.

If there are simpler solutions, then that would be great!

Instead of bikeshedding about this RFC, I think it would be more productive if you came to Zulip t-compiler with a precise problem description and we could then brainstorm a nice solution from scratch, considering all the potential downsides.

I will follow up on this.

kirtchev-adacore avatar Jan 16 '24 12:01 kirtchev-adacore

By your description of the problem, I am slightly worried that in general you have a different idea of how the build process goes down than what happens in actuality, both for toolchains and actual executions of rustc involving cargo.

My understanding, which is derived from poking around sources and looking at very verbose logs, and which could be very wrong, is that ultimately the testing portion of the infrastructure constructs a call to cargo, passing various details such as which rustc to use. The call itself could be nested in higher-level tools such as compiletest.

Is this close to reality?

Note that even if the "right" rustc is being called and so forth, post-test processing can still alter the state of a toolchain, thus invalidating the testing that was performed.

kirtchev-adacore avatar Jan 16 '24 12:01 kirtchev-adacore

Because of the issues I mentioned re: "rustc is a cross compiler and host and target remain build parameters", I am slightly concerned that a test driver might have to be compiled, for each host, for each target, in a combinatorically explosive fashion, as is the case (as far as I understand it) with libstd's artifacts?

Unless I am misunderstanding, I do not think that this will be the case. The test driver would be compiled for each host. Since a testable cross Rust toolchain comes with its own cargo, rustc, host libstd, and target libstd, the test driver's job is to execute a compile command which indicates the host and target. cargo or rustc will then link the correct libstd and produce a target executable.

I realize "run tests for a random target that can't host the toolchain but would support most of the test suite via the remote-test-client and remote-test-server" isn't exactly trivial, so improving the experience of doing that is certainly something can get behind. Unless this proposal isn't actually concerned about such targets except as a "future possibility", and is only truly worried about hosts? In which case, it should probably say that, because there are more bins than "we ship a host Rust toolchain" and "this target doesn't support std". People are happy to accept "this isn't implemented for this target" and are disappointed to see "this was supposed to work for every target! ...but actually doesn't".

The proposal does take into account native and cross compilation, though it does not mention how to perform execution on the target (possibly via remote-test-client/-server).

kirtchev-adacore avatar Jan 16 '24 13:01 kirtchev-adacore

Just to be sure: are you planning to do all the work that needs to be done for whatever solution we end up accepting?

oli-obk avatar Jan 17 '24 10:01 oli-obk

I think such a big RFC is the wrong way to tackle this. As far as I understand, the problem being solved here is that you want to be able to test the distributed toolchains.

I think there are simpler ways to solve this problem, for example by just shipping the current testsuite as-is with some changes to tooling that would allow testing distributed toolchains.

We have had a few post-PGO/BOLT miscompilations which tests don't catch since those optimizations only happen on the dist runners but not on the test ones. So it would also be useful for the rust repo itself we could run the testsuite against dist artifacts too.

the8472 avatar Jan 17 '24 20:01 the8472

This has been fixed, we now run tests on post PGO artifacts. The optimized artifacts are tested before they are shipped.

Noratrieb avatar Jan 17 '24 21:01 Noratrieb

Just to be sure: are you planning to do all the work that needs to be done for whatever solution we end up accepting?

@oli-obk No, AdaCore will not be doing all the work, as we were hoping that this effort would be community-led.

kirtchev-adacore avatar Jan 18 '24 08:01 kirtchev-adacore

This has been fixed, we now run tests on post PGO artifacts. The optimized artifacts are tested before they are shipped.

@Nilstrieb Just to make sure I understand, the sanity checks you proposed are now performed?

kirtchev-adacore avatar Jan 18 '24 08:01 kirtchev-adacore

No, AdaCore will not be doing all the work, as we were hoping that this effort would be community-led.

What does that mean?

To me, "community-led" means that someone from the community, like yourself or AdaCore, is doing the work.

m-ou-se avatar Jan 18 '24 09:01 m-ou-se

No, AdaCore will not be doing all the work, as we were hoping that this effort would be community-led.

While I do like the motivation, I don't think it is very high priority for volunteer contributors. To state this more controversially: Who other than AdaCore benefits from this work?

I would recommend trying to find an existing contributor that you can contract to do this work. I do fear otherwise this RFC may, even if accepted, become nothing more than that. We've had various RFCs and MCPs before that were well motivated, but ended up lacking a dedicated implementor to actually do the work.

oli-obk avatar Jan 18 '24 09:01 oli-obk

Who other than AdaCore benefits from this work?

@oli-obk I suspect that Ferrous Systems (hello @skade, @pietroalbini !) would benefit from this work, as Ferrous Systems has and will continue to certify the Ferrocene toolchain for ISO 26262, and I assume they would care about the "Testing the artifacts to be distributed" and "Certification" motivations.

I would recommend trying to find an existing contributor that you can contract to do this work. I do fear otherwise this RFC may, even if accepted, become nothing more than that. We've had various RFCs and MCPs before that were well motivated, but ended up lacking a dedicated implementor to actually do the work.

Thanks for the suggestion, I will pass this along.

kirtchev-adacore avatar Jan 18 '24 09:01 kirtchev-adacore

@Nilstrieb Out of curiosity, what kind of test would detect an improper packaging of a tool or a library?

kirtchev-adacore avatar Jan 24 '24 10:01 kirtchev-adacore

@m-ou-se @Nilstrieb @oli-obk @the8472 Assuming that AdaCore does all the implementation work on the RFC, what would it take to get the RFC accepted?

kirtchev-adacore avatar Jan 24 '24 10:01 kirtchev-adacore

@Nilstrieb Out of curiosity, what kind of test would detect an improper packaging of a tool or a library?

Compile hello world after extracting the tarball or something like that.

Noratrieb avatar Jan 24 '24 10:01 Noratrieb

@m-ou-se @Nilstrieb @oli-obk @the8472 Assuming that AdaCore does all the implementation work on the RFC, what would it take to get the RFC accepted?

We'd want the best solution to your problems, which from what I've gathered so far, is not this RFC. If you list all the specific problems you're having, we may be able to come up with something better for each.

Noratrieb avatar Jan 24 '24 10:01 Noratrieb

@Nilstrieb To understand the problem we are trying to solve with this RFC, we will need to switch our view point to the safety critical software paradigm. In this world, human lives depend on the correct operation of software. In a worst case scenario, a software error could lead to falling airplanes, exploding nuclear plants, etc.

To prevent this from happening, compilers and their libraries and runtimes are certified against a software safety standard, such as DO-178C. The certification aims to build high degree of confidence that the compiler will not generate wrong code under various conditions. To prove this, the compiler and the generated code are subjected to rigorous testing, where each test is linked back to requirements. Since nothing is without fault, known compiler issues are described and supplemented with detection, workaround, and mitigation strategies.

One way to achieve high degree of testing confidence is to test the delivered version of the compiler. This method ensures that the compiler being certified is the exact same compiler used to compile safety critical software. Our RFC is essentially trying to modify the existing testing process to achieve this methodology, while at the same time maintain developer friendliness.

kirtchev-adacore avatar Jan 26 '24 12:01 kirtchev-adacore

Given the complexity and limited value to the community at large, I am retracting this RFC. AdaCore plans to employ a sanity check test suite consisting of selected relocated cargo, clippy, core, and std tests, along with new tests, that exercise the toolchain after installation. Thank you for your time, consideration, and reviews!

kirtchev-adacore avatar Mar 19 '24 13:03 kirtchev-adacore