Specification
Specification copied to clipboard
Test2 subtests not supported?
https://github.com/Test-More/test-more/issues/942
Pretty sure in our discussions we decided the Test2 format would be supported formally by TAP14. I have not verified the claims in this ticket yet.
problems with Test2 format:
- it appends character to
reason
so to make it stream-parsable spec should prohibit tailing '{' in reason / directive description- for example:
- which one is subtest?
ok 1
orok 3
?
- which one is subtest?
- for example:
ok 1 - {
ok 2
1..1
ok 3
- where should diagnostic message come?
- after result (ie as first lines after subtest start? same indent or subtest indent?)
- or after closing
}
?
- subtest end (unlike TAP14
# Subtest:
declaration) is invalid line accorting TAP14
From parsing point of few (for automated processing of test results), Test2 subtests removes one advantage over junit - stream parsing. When you removes that, junit becomes better choice.
I agree with @happy-barney here. The way subtests were originally designed (er, hacked into the mess that was the old test harness) is that test output would be line-based and the final summary "ok/not ok" of the subtest would simply be after the test:
ok 1 - Starting engines
ok 1 - engaging hyperdrive
ok 2 - setting course
not ok 3 - Exceeding lightspeed
not ok 2 - Sci-fi trope
The semantics are clear and we don't have starting and ending braces forming "blocks" to parse. The subtest summary output was always after the subtest ran. A parser can immediately take action on a not ok
line (such as halting the test suite, if needed). Further, because the TAP parser was supposed to simply ignore any lines that were not valid TAP, missing the leading or trailing curly brace would not be an issue, but now I'm less sure of that.
Is there a benefit for this { subtest }
syntax that I'm missing?
The idea behind the braced/block format is to indicate a "buffered" subtest. Buffered subtests were added so that subtests could run concurrently without confusing a parser. For example 2 subtests running concurrently without buffering can result in this:
ok 1 - blah
ok 1 - subtest a assertion a
ok 1 - subtest b assertion a
ok 2 - subtest b assertion b
ok 2 - subtest a assertion b
ok 2 - subtest b
ok 3 - subtest a
There are now several tools on cpan that use Test2::AsyncSubtest, tools like Test::Class::Moose, Test2::Tools::Subtes, Test2::Workflow, etc. They rely on the buffered subtests because they allow subtets level concurrency, which is not really possible using the old style subtests that stream subtest events immediately.
For these tools only 1 subtest can print at a given time, and it is printed all at once with the braces to contain it.
@exodist buffered subtest is an implementation ... does it matter when you read test output?
@happy-barney I am not sure I understand the question. When you read it does not matter, but as my example shows the order of the output DOES matter. If you have 2 concurrent subtests and they both output simultaneously then there is no way for a harness to deduce which subtest each indented line belongs to.
In any case, I feel like the ship has sailed here. This type of buffered subtest has been around for years, and is used many places, changing the format now would break several cpan modules, but it would also break several companies that I know are using it.
I mean if you want I am fine adding documentation to some modules that states things like "There is an argument against this a type of subtest and its output, here is the ticket ...". But I am not going to break a ton of people by removing it. I also find it much cleaner and easier to parse in Test2::Harness.
I only have a little bit of energy to engage in this. If nobody likes this and wants to keep it out of the TAP standard then fine, but I will keep the implementation and at most warn in docs that anyone who requires pure TAP can use something else. (Note Test::More does not use this, only using Test2::AsyncSubtest and Test2::Subtest does, so modules currently in perl CORE are not effected)
@exodist I have nothing against with buffering, that is good
I just don't see any reason to say "this is buffered subtest output" (from parsing TAP later point of view). buffered subtest can still output
# Subtest: subtest message
ok 1
ok 2
1..2
ok 1 subtest message
Ah, brief history:
- TAP was designed to be both human and machine readable, I feel that as a result it did both poorly
- Old school perl programmers get used to it, but teaching people to use/run/write tests has always resulted in confusion around subtests because it feels backwords
- Adding a comment to note when a subtest starts feels like a hack
- When I took ownership of Test::More 10ish years ago I had a lot of people asking me to "fix" subtests and what they look like
- One such time was a request to "outdent the subtest comment" so that the comment lined up with the subtest events, which spawned a several year development process to write Test2 to avoid the problem that changing something as simple as a comment indentation would not break all of cpan in the future (Everyone did string comparison of TAP output to verify their custom test tools)
So, when I started writing Test2 with the primary goal of "Fix the pain points" subtest format was one of the issues high on the list of things that cause people pain, pain that leads to complaints coming to me.
At this point TAP is just the default/fallback used by Test2, which supports formatter plugins. If you use Test2 with Test2::Harness then TAP may never even enter into the pipeline, Test2::Harness uses a machine readable format (JSONL) to send data around internally, and between the test and the harness, then a separate human-friendly output format to the terminal.
But to summarize the rant: I got too many complaints about the old subtest format, and when training people in testing the subtest format seemed difficult for newbies.
IIRC (will try to dig up a link a bit later), the decision was to keep indented/commented style subtests, since there was already support in multiple implementations, and leave buffered subtests for later (the initial draft that I wrote included buffered subtests, but they raised some questions).
I think it'd be good to bring them back in a TAP 15. They're nice and imo much easier to read, since you get the bottom line up front. Node-tap has supported them for years. We just need to come to alignment on these questions (non exhaustive list):
- Where do diagnostics for the summary test point go?
- What happens when the trailing
}
is missing? (Or perhaps, is a test point ending in{
that isn't a leading point for a buffered subtest just an error?)
I believe that we said "3 major implementations supporting" is the line for quorum on additions, so another blocker would be someone else implementing it (and perhaps, Test2 and node-tap meeting on any potential discrepancies they may have, but I think we're pretty close, since I modeled node-tap's support after Test2.)
additional question: does anyone need to distinguish whether subtest was buffered or not ?
additional question: does anyone need to distinguish whether subtest was buffered or not ?
I haven't found this particularly valuable in node-tap, tbh. I like the buffered Test2 style better when reading the tap itself, but I changed node-tap to always emit comment-prefixed "standard" subtests (even when they are actually async and buffered), and it's not really any more work. You still have to buffer up the data and then emit it all at once with a wrapper, it doesn't make much difference if the wrapper puts the summary at the top or the bottom. I haven't encountered any cases where it makes a difference for the harness to know whether subtests were run in sequence or in parallel.
it makes difference when you are using non-console tools to process TAP Test2 can generate two plan rows in row which will make tools not supporting not-published-yet (and not-specified in output) TAP version to fail
Test2 can generate two plan rows in row
What does this mean? Can you give an example?
# Subtest: foo
ok 1 - foo {
ok 1
1..1
}
1..1
@happy-barney Right, but this issue is specifically about the braced "buffered" subtest style vs the indented/commented subtest style in TAP 14. How is that any different from what's in the spec?
# Subtest: foo
ok 1
1..1
ok 1 - foo
1..1
your example is ok, that braced subtest is not ... my original complain was that Test2 suite uses this braced, which is not supported by 3rd party TAP parsers. If they want braces, they should add them for example as
# Subtest: foo {
ok 1
1..1
ok 1 - foo }
1..1
I am not changing the buffered subtest brace style in Test2. I chose them for human readability/aesthetics. If people do not like the brace style for whatever reason, they can choose not to use them. I am not going to break what is now years of downstream things that depend on them being this format. The most I might do is give people an option (opt-in) that allows them to render buffered subtests without the braces using the old style, comment-indented-result.
At this point Test2 uses TAP as a default only for historical reasons. If someone uses yath, the default output/intermediary is a jsonl stream. TAP is an incredibly lossy format, it is intended to be good for both humans and machines, and in my option ends up terrible for both. Having written a handful of TAP parsers now I have experience to back up that it is terrible for machines, and lossy. The human side is more subjective, so meh, not interested in arguing that point.
JUNIT is also pretty awful, but is a much more widely supported format.
Things are moving away from TAP, and I support this because TAP is awful. Test2's buffered subtests have been around for the better part of a decade, and at this point TAP can either support it or not. Neither choice will make the world end or effect anyone significantly. Test2 is not going to make a radical change to support a third party standards body where 1 or 2 people dislike how it looks.
Also, node-tap is "third party" and supports the braced Test2 style. It's not that hard. I don't think test2 has to change, and making that style officially part of a TAP spec only needs a formal description and a third popular implementation.
In the braced variant that Test2 generates, where do SKIP/TODO directives end up?
After the opening brace:
not ok foo { # TODO feature xyz not yet merged
ok x
not ok y
ok z
}
or inside:
not ok foo # TODO feature xyz not yet merged {
ok x
not ok y
ok z
}
ok 1 - foo { # TODO This is todo
ok 1
1..1
}
1..1