wasm-bindgen
Add Support for cargo nextest
This is still a work in progress: https://nexte.st/book/custom-test-harnesses.html
Progress
- [x] Added support for -h and --help w/tests
- [x] Added support for -V and --version w/tests
- [x] Added support for --list --format terse w/tests
- [x] Added support for --list --format terse --ignored w/tests
- [x] Added support for <test-name> --nocapture --exact w/tests
- [x] Reinstated support for --include-ignored w/tests
- [x] Reinstated support for --skip name w/tests
- [x] Reinstated support for --skip=name w/tests
- [x] Reinstated support for filter w/tests
- [x] Solved the temporary directory conflict caused by concurrent invocations
- [x] Updated --include-ignored to use clap
- [x] Updated --skip name to use clap
- [x] Added tests to execute tests on Node.js
- [x] Fixed support and added tests to execute tests on Deno
- [x] Added tests to execute tests on Chrome
- [x] Added tests to execute tests on Firefox
- [x] Added tests to execute tests on Safari
- [x] Added tests to execute tests on Edge
- [x] Added tests with cargo nextest invoking wasm-bindgen-test-runner
- [x] Used the feature! macro to simplify testing
- [x] Used the feature! test_mode parameter to execute the specification over all the test modes
Updated to use clap
- docopt was already used in wasm-bindgen, but it is unmaintained (https://github.com/docopt/docopt.rs) and recommends clap or structopt; structopt's docs in turn state: "As clap v3 is now out, and the structopt features are integrated into (almost as-is), structopt is now in maintenance mode"
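For illustration, the flags above map fairly directly onto clap's derive API. This is a minimal sketch of that mapping; the struct and field names are mine, not the actual wasm-bindgen-test-runner code:

```rust
use clap::Parser;

/// Illustrative only: not the actual wasm-bindgen-test-runner CLI.
#[derive(Parser)]
#[command(version)] // provides -V/--version; -h/--help is generated by default
struct Cli {
    /// Run ignored tests as well.
    #[arg(long)]
    include_ignored: bool,
    /// Skip tests matching the given name; clap accepts both
    /// `--skip name` and `--skip=name` out of the box.
    #[arg(long)]
    skip: Vec<String>,
    /// List tests instead of running them.
    #[arg(long)]
    list: bool,
    /// Listing format; nextest requires `terse`.
    #[arg(long)]
    format: Option<String>,
    /// Optional positional filter on test names.
    filter: Option<String>,
}

fn main() {
    let cli = Cli::parse();
    println!("list: {}, skip: {:?}", cli.list, cli.skip);
}
```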
Updated the macro wasm_bindgen_test
- To allow handling of test listing, as required by nextest
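For context, nextest drives custom harnesses through libtest-style flags: when the runner is invoked with --list --format terse, it is expected to print one `<test name>: test` line per test. The names below are purely illustrative:

```
some_test: test
another_ignored_test: test
```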
Testing
- I wasn't sure what the best place to put the tests in was, because of the custom test runner in the cli crate, so I placed them in the main tests folder.
- I used a variation of BDD that I have been using for many years now:
  - Although it seems a bit more verbose, it makes writing tests a lot simpler and faster, and anyone can understand them and add new ones.
  - When something breaks, it's very easy to understand why.
  - Although counter-intuitive, in practice I have found these tests a lot easier to update during refactors.
Overhead
- When used from cargo nextest the overhead is very real: each invocation has to load the runtime and the wasm into it, which can add up. But nextest runs tests in parallel, which makes up for it on tests that take a long time to execute.
Architecture
- I would prefer to organize the code into:
  - a folder with the CLI and environment stuff
  - a folder with the wasm handling
  - a folder with the runtimes
Apologies for the delay, I was on vacation, still catching up. Planning to take a look tomorrow!
That's okay! I still have one missing piece ahead to get it working with cargo nextest, so it's fine.
Well, it's working; actually, it's my second version that is working, the first one used locks.
But according to the library documentation it shouldn't work (at least cross-platform):
See the tests in lib.rs for cross-platform lock behavior that may be relied upon:

```rust
// Concurrent shared access is OK, but not shared and exclusive.
// Once all shared file locks are dropped, an exclusive lock may be created;
// No other access is possible once an exclusive lock is created.
// Once the exclusive lock is dropped, the second file is able to create a lock.
```

https://github.com/danburkert/fs2-rs/blob/master/src/lib.rs
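For reference, a minimal sketch of the shared/exclusive semantics those comments describe, using fs2's FileExt trait (the lock file path is illustrative):

```rust
use fs2::FileExt;
use std::fs::File;

fn main() -> std::io::Result<()> {
    // Illustrative path; in practice this would be the shared resource.
    let file = File::create("shared.lock")?;

    // Any number of shared locks may coexist...
    file.lock_shared()?;
    file.unlock()?;

    // ...but an exclusive lock only succeeds once all shared locks are
    // dropped, and blocks every other access until it is dropped itself.
    file.lock_exclusive()?;
    file.unlock()?;
    Ok(())
}
```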
Anyway, the overhead was as bad as I expected, as each test execution now carries the same overhead as a complete assembly. I have only tested on Firefox so far: about 12-13s each, thermal throttled.
But when tests take longer than the overhead, the extra cores start to pay off.
I'm having thermal throttling issues, but my speed boost might be around 3x when the tests are configured with very heavy parameters (it's a property-based testing variation).
I was waiting for your review, because as I do more tests, I end up finding more stuff to fix, making the review harder on you. Sorry.
I ended up fixing support for Deno; not sure why, but it was definitely broken. I added some tests and updated the GitHub Action, so now it's under control. You can cherry-pick that if you want.
@daxpedda I have been trying to create a macro to simplify the tests; this will be particularly useful for tests that should be run over the different supported runtimes.
I'm still working on the syntax, as the macro by itself allows different usage patterns.
But right now, this is the format that is already working:
```rust
feature! {
    given_there_is_an_assembly_with_one_failing_test();
    when_wasm_bindgen_test_runner_is_invoked_with_the_option("-V");

    "Outputs the version" {
        then_the_standard_output_should_have(
            &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
        );
    }

    "Returns success" {
        then_success_should_have_been_returned();
    }
}
```
It expands into two tests:

```rust
#[test]
fn outputs_the_version_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_with_one_failing_test(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_the_standard_output_should_have(
        &context,
        &format!("wasm-bindgen-test-runner {}", env!("CARGO_PKG_VERSION")),
    );
}

#[test]
fn returns_success_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_with_one_failing_test(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_success_should_have_been_returned(&context);
}
```
If the target platform is wasm, it uses `#[wasm_bindgen_test::wasm_bindgen_test]` instead.
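That is, on a wasm target the second test above would expand along these lines (a sketch based on the expansion shown):

```rust
#[wasm_bindgen_test::wasm_bindgen_test]
fn returns_success_feature() {
    let mut context = Context::new();
    given_there_is_an_assembly_with_one_failing_test(&mut context);
    when_wasm_bindgen_test_runner_is_invoked_with_the_option(&mut context, "-V");
    then_success_should_have_been_returned(&context);
}
```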
This allows for a more compact file, because there are sometimes 5-7 different outcomes for a single execution context.
The idea is, by default, to respect the single outcome, allowing easy troubleshooting of regressions, but it's possible to aggregate the executions on CI for faster execution times.
@daxpedda That's okay; the only issue is that the details aren't as fresh as they were.
I have already improved the tests a lot, and I hope I can improve them a bit more still; my objective is to make them clearer, more intuitive, and easier to change.
I haven't pushed those changes yet, because there are some things remaining in the macro.
I have no issues with the naming, I just use naming from BDD, but different perspectives always enrich the solution.
I'm going to finish those; once I push that, I'm going to tackle your requests one by one.
Just a quick update: I was able to move forward with the multi_target_test model; I'm still fleshing out details in the library as I convert the existing tests I created.
But it's pretty much working: it already runs on all the runtimes that we specify. The catch is that it's flushing out runtime-specific issues and inconsistencies, as expected.
One of the problems was that the browser was adding extra tabs to the output (I used cargo test with a #[test] as reference); it's fixed now.
The other issue I'm having is some instability in the Safari Driver, unfixed for now.
But aside from that, the model seems to work well for ensuring consistent behaviour across all runtimes; I should know more as I move forward.
I'll try to finish this part ASAP; I believe the only blockers will be the issues it uncovers.
> The other issue I'm having is some instability in the Safari Driver, unfixed for now.

FWIW, I've been having spurious failures using safaridriver lately as well; so far the only error I encountered is a port-binding issue. My guess here is that the new ARM macOS runners have some issue with binding random ports in quick succession, or an issue around that with the driver itself.
> > The other issue I'm having is some instability in the Safari Driver, unfixed for now.
>
> FWIW, I've been having spurious failures using safaridriver lately as well; so far the only error I encountered is a port-binding issue. My guess here is that the new ARM macOS runners have some issue with binding random ports in quick succession, or an issue around that with the driver itself.
I was able to track down the issue I was having: it was caused by multiple parallel Safari invocations. Once I added a lock to only allow one at a time, the problem was gone.
That lock doesn't limit test execution in practice, as Safari by design only allows one session at a time: https://developer.apple.com/documentation/webkit/about_webdriver_for_safari#2957226
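Roughly, the lock looks like this. It's a sketch only: the type name and lock path are illustrative, not the exact code.

```rust
use fs2::FileExt;
use std::fs::File;

/// Serializes Safari sessions across concurrent test-runner processes.
struct SafariSessionLock(File);

impl SafariSessionLock {
    fn acquire() -> std::io::Result<Self> {
        // Illustrative lock path shared by all test-runner processes.
        let file = File::create(std::env::temp_dir().join("wasm-bindgen-safari.lock"))?;
        // Blocks until no other process holds the lock, mirroring
        // Safari's one-session-at-a-time constraint.
        file.lock_exclusive()?;
        Ok(Self(file))
    }
}

impl Drop for SafariSessionLock {
    fn drop(&mut self) {
        // Releasing the lock lets the next queued invocation start its session.
        let _ = self.0.unlock();
    }
}
```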
The issue seems to be triggered by each safaridriver launching its own Safari --automation instance; for some reason, when that happens, even with workarounds, those instances have to be manually terminated for execution to continue.
This solution seems to be cleaner, but the Safari --automation process remains in memory, as it did before.
I removed most experiments but left one: I moved the BackgroundChild inside the Client. I did it because I was having some issues with the drop order, and left it in because the code was cleaner; not sure if those issues remain.
I'm going to add some more Safari tests to stress it more; then I'll open a parallel PR like I did with Deno.
> > The other issue I'm having is some instability in the Safari Driver, unfixed for now.
>
> FWIW, I've been having spurious failures using safaridriver lately as well; so far the only error I encountered is a port-binding issue. My guess here is that the new ARM macOS runners have some issue with binding random ports in quick succession, or an issue around that with the driver itself.

> I'm going to add some more Safari tests to stress it more; then I'll open a parallel PR like I did with Deno.
I still managed to trigger some more instability, which led me to beef up the lock; it seems pretty robust now.
But sometimes Safari --automation still refuses to allow the creation of a new session. In those situations, I updated the runner to kill it and ask safaridriver for a new session, forcing it to start again; that seems to fix the remaining issues.
I tested terminating the Safari --automation process on every wasm-bindgen-test-runner execution, but that imposes a severe delay on overall test execution.
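So the resulting strategy is to kill the stuck Safari --automation only when a session is refused, instead of on every run. In sketch form, with hypothetical helper names standing in for the real WebDriver plumbing:

```rust
struct Session;

#[derive(Debug)]
struct Error {
    session_refused: bool,
}

// Hypothetical helpers; the real code talks to safaridriver over WebDriver.
fn new_session() -> Result<Session, Error> {
    Ok(Session)
}
fn kill_safari_automation() -> Result<(), Error> {
    Ok(())
}

/// Ask safaridriver for a session; if Safari --automation refuses,
/// terminate it and retry once, forcing a fresh instance.
fn connect_with_retry() -> Result<Session, Error> {
    match new_session() {
        Ok(session) => Ok(session),
        Err(err) if err.session_refused => {
            kill_safari_automation()?;
            new_session()
        }
        Err(err) => Err(err),
    }
}

fn main() {
    let _session = connect_with_retry().expect("no Safari session");
}
```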
Right now, as I convert more tests to the multi_target_test mode, I'm dealing with some inconsistencies of output between the targets, but the Safari instability at least seems a lot better.
@spigaz I believe you are spending quite a significant amount of time on testing facilities that don't have to be this perfect compared to all the other untested features wasm-bindgen contains (which is unfortunate, of course). While this is really appreciated, I would prefer to split efforts into multiple PRs instead of trying to do everything at once.
@daxpedda That wasn't my objective, but nextest required changes in many different places, including in the argument handling.
As a rule I try to add tests before introducing changes, to make sure I don't make things worse.
Because of that and the nextest tests, I ended up with a lot of tests, and when I generalised them I found even more issues, which I tried to fix.
Right now I'm just trying to iron out some issues with Safari to avoid breaking the CI; it seems okay already, perhaps some starvation in the Safari instance access.
But my point is to proceed like we did with the Deno PR, using the test base as a reference.
The testing you are trying to implement here is greatly appreciated, but it is of way higher quality than this repository is used to. In wasm-bindgen there are a lot of features that aren't tested for every browser and every target. E.g. trying to fix running Safari tests or fixing Deno is out of scope for this PR.
What I'm trying to say is that bringing the PR to a mergeable state could be way simpler if the tests are in a comparable scope to the rest of the repository. Extending and improving tests can be done in a separate PR.
Ofc I will leave the decision on how to proceed here to you; I'm certainly not against getting this well tested in the same PR!
@daxpedda I understand your reasoning, and to be honest, I'm trying to narrow things down to make it easier for you to review.
The problem is that cargo nextest stresses wasm-bindgen-test a lot (in my repo, 771 invocations), which means all the instability issues pop up.
Anyway, for now I haven't been able to trigger issues with any of the supported runtimes...
I was finally able to remove the hacks I had added to get cargo nextest working with the shared directory, by using a ResourceCoordinator.
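The gist of the idea, as a sketch: only the ResourceCoordinator name comes from my branch; the method name and the PID-based scheme here are illustrative.

```rust
use std::fs;
use std::path::PathBuf;

/// Hands out per-invocation directories so concurrent
/// wasm-bindgen-test-runner processes stop colliding in one shared
/// temporary directory.
struct ResourceCoordinator {
    root: PathBuf,
}

impl ResourceCoordinator {
    fn new(root: PathBuf) -> Self {
        Self { root }
    }

    /// Returns a directory unique to this process, so parallel nextest
    /// invocations (one process per test) never share working state.
    fn acquire_dir(&self, name: &str) -> std::io::Result<PathBuf> {
        let dir = self.root.join(format!("{name}-{}", std::process::id()));
        fs::create_dir_all(&dir)?;
        Ok(dir)
    }
}

fn main() -> std::io::Result<()> {
    let coordinator = ResourceCoordinator::new(std::env::temp_dir());
    let dir = coordinator.acquire_dir("wbg-test")?;
    println!("working in {}", dir.display());
    Ok(())
}
```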
I just have some minor things left; then I'm going to let you take the lead on this, and I can create as many PRs as necessary.