Specification Standardize the notation of subtests in TAP

This issue was originally reported in https://github.com/TestAnything/testanything.github.io/issues/20, and the conversation has been moved here since it is related to the TAP specification.

It concerns subtests in TAP and a formal standard to be followed by TAP consumers and producers. In the latest TAP specification (TAP 13) there is no formal definition of subtests. There have been proposals and draft documents on this, and some tools like Test::More and tap4j support subtests with limitations.

Please feel free to chime in and suggest enhancements for this standard. A message is being written to the mailing list pointing to this issue too. The mailing list has several users that participated in the IETF draft and have lots of experience with TAP and its implementations.

/cc @Leont @xmojmr

Jan 06 '15 02:01 kinow

One thing that did not seem to be mentioned on the TAP mailing list was the potential for a subtest to look like this:

ok 1 - something
ok 1 -     something
ok 2 - something

or

ok 1 - something
ok 1 -- something
ok 2 - something

I believe most libraries would support a failure just fine for both of those. It's simpler than requiring an update to the libraries, and it forces stricter reasoning about tests, which is a bonus.

Jan 06 '15 03:01 jonathanKingston

We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's a lot of stuff outputting it already it may be a passed station.

Jan 06 '15 13:01 Leont

I believe most libraries would support a failure just fine for both of those. It's simpler than requiring an update to the libraries, and it forces stricter reasoning about tests, which is a bonus.

Both parsers I've worked on would consider those outputs an error. Subtests really should be something legacy parsers consider comments, which fortunately still gives a lot of space.

Jan 06 '15 13:01 Leont

@jonathanKingston I'm happy with the old proposal for subtests

TAP 13
1..1
ok 1 - Subtest 1
    ok 1 - subtest 2a
    ok 2 - subtest 2b

I'd be fine with other formats, but since there are tools in Perl, Java, Node, which output something similar to that, I'd be more keen to simply formalize it in a TAP 14 specification.

@Leont

We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's a lot of stuff outputting it already it may be a passed station.

+1 I believe we have the formal TAP 13 specification, and a meta TAP 13.5, or TAP 13 + proposals/custom code. Considering the number of users in different programming languages outputting subtests, I think a proposal similar to what most tools produce (and that got consensus here and in the mailing list) would be a good approach.

As a side note, there is an open issue in tap4j where the TAP producer emits the subtests first. The current state machine in tap4j is not able to consume a subtest that appears before its parent test line. Quoting the example in that issue below:

    ok 1 - subtest 1a
    ok 2 - subtest 1b
    1..2
ok 1 - Subtest 1
    ok 1 - subtest 2a
    ok 2 - subtest 2b
    1..2
ok 2 - Subtest 2
1..2
ok

A perl test that will produce this output is as follows:

use Test::More;

subtest 'Subtest 1' => sub {
    ok(1,'subtest 1a');
    ok(1,'subtest 1b');
};
subtest 'Subtest 2' => sub {
    ok(1,'subtest 2a');
    ok(1,'subtest 2b');
};
done_testing();

It'd be nice to have the specification in cases like this to guide the developer :-)

Jan 06 '15 15:01 kinow

We should not be afraid to break old implementations and if anything that should be the correct behaviour. Old implementations should not ever ignore sub tests. This is a testing setup, ignoring tests would be a mistake. Due to the non standard nature of all the current use of subtests it is possible subtests could fail and the parent succeed - this should be caught or be a parse error; either for me is fine. This brings me to another ticket I need to make about how TAP software should respond if it can't be parsed.

I'm a little against looking to satisfy older implementations which we might end up calling 13.5 erroneously. This is a similar situation to what Markdown has been through and where possible I would prefer to go for breaking behaviour and errors will be spotted sooner rather than skipped over.

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I would therefore aim for:

False negative changes only
Try not to risk being interpreted by any parser as YAML
Discover the most TAP consumers which can cope with the two points above

I'm actually a fan of the tap4j format above, as it would aid streaming output of the tests before they have completed running.

The other issue with the old proposal was that there is no acknowledgement of sub test count which I think is an issue.

Do sub tests count as the main count?
Sub tests counts help consumers fail earlier from a streaming output of tests

One last point, the number of main consumers and producers is still actually small, patching all of them to support a newer version shouldn't be a large piece of work.

Jan 06 '15 20:01 jonathanKingston

We should not be afraid to break old implementations and if anything that should be the correct behaviour. Old implementations should not ever ignore sub tests. This is a testing setup, ignoring tests would be a mistake.

The way it was designed means that even if the subtests are ignored the correct conclusion can still be made. It leads to data loss, not to incorrect operation.

Due to the non standard nature of all the current use of subtests it is possible subtests could fail and the parent succeed - this should be caught or be a parse error; either for me is fine.

In the parser I wrote a discrepancy generates a non-fatal parse error.

I'm a little against looking to satisfy older implementations which we might end up calling 13.5 erroneously. This is a similar situation to what Markdown has been through and where possible I would prefer to go for breaking behaviour and errors will be spotted sooner rather than skipped over.

Do we really have that problem?

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I don't follow you. They don't have extra whitespace at the start of the line, so they won't be ignored even on legacy parsers.

Jan 07 '15 02:01 Leont

The point I am making is we are expecting the producer to be error free by always returning a not ok parent test. I would prefer if both producer and consumer were more fragile rather than the 'Just cope' attitude the philosophy suggests.

This was my theory around the formats above which should be treated as safe in most parsers where extra white space would be ignored around the comment.

I don't follow you. They don't have extra whitespace at the start of the line, so they won't be ignored even on legacy parsers.

That is my point: why would you want them ignored? I want a testing framework to do one of two things when it doesn't know what to do:

Fatal error
Add a parse error as if it were a failed test it encountered

Older parsers may or may not be able to cope with invalid subtests. The following will not fail with some of the parsers I have seen:

ok 1 - test
  not ok 1 - test
ok 2 - test

However both these would:

ok 1 - test
not ok 1 -    test
ok 2 - test

and

ok 1 - test
not ok 1 -- test
ok 2 - test

The way it was designed means that even if the subtests are ignored the correct conclusion can still be made. It leads to data loss, not to incorrect operation.

This is unlikely to be true for all producers, but I take your point that the parent test should fail; however, I would prefer that the correct behaviour for parsers is to fail.

In the parser I wrote a discrepancy generates a non-fatal parse error.

This is unlikely to be true for all consumers, in fact the TAP philosophy (which likely needs addressing) suggests any such issue should be skipped.

The only way I would support this is with the subtests coming before the parent, like @kinow mentioned. Even then the format isn't as easy to read.

Jan 07 '15 20:01 jonathanKingston

Older parsers may or may not be able to cope with invalid subtests. The following will not fail with some of the parsers I have seen:

Even as OKs they would fail because the numbering is incorrect.
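The numbering point can be made concrete with a minimal Python sketch. It models a hypothetical legacy consumer (`legacy_verdict` is an illustrative name, not any real parser mentioned in this thread) under two assumptions: indented lines are skipped as unknown, and numbered top-level tests must count up from 1.

```python
import re

# Top-level test line: "ok" or "not ok", optionally followed by a number.
TEST_LINE = re.compile(r"^(not ok|ok)\b(?:\s+(\d+))?")


def legacy_verdict(lines):
    """Return (passed, errors) the way an assumed legacy consumer might.

    Assumptions (not taken from any real parser): indented lines are
    skipped, and numbered tests must be sequential starting at 1.
    """
    failures = 0
    errors = []
    expected = 1
    for line in lines:
        if line[:1].isspace():
            continue  # indented line: treated as unknown and ignored
        match = TEST_LINE.match(line)
        if match is None:
            continue  # plans, comments and anything else are ignored here
        status, number = match.groups()
        if status == "not ok":
            failures += 1
        if number is not None and int(number) != expected:
            errors.append(f"expected test {expected}, got {number}")
        expected += 1
    return failures == 0 and not errors, errors
```

Under these assumptions the indented `not ok` from the first example passes silently, while both unindented variants are rejected, and the all-`ok` unindented variants are rejected as well because the repeated `1` breaks the sequence.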

This is unlikely to be true for all producers, but I take your point that the parent test should fail; however, I would prefer that the correct behaviour for parsers is to fail.

Agreed. Parsing only the parent test should clearly be considered legacy behavior, but legacy has its uses.

This is unlikely to be true for all consumers, in fact the TAP philosophy (which likely needs addressing) suggests any such issue should be skipped.

My take on the philosophy would be: "report errors, but when possible make them non-fatal", as that way the user can be presented with as much information as possible.

Jan 09 '15 02:01 Leont

Very true however subtests could be part of the same numbering potentially.

Is there other useful meta data that subtests have?

Yeah I will raise an issue also with that as I certainly don't think it is clear enough.

Jan 11 '15 17:01 jonathanKingston

@ovid I saw some interesting comments on the mailing list of yours. (http://www.ietf.org/mail-archive/web/tap/current/msg00530.html) Would you be able to mention your concerns here?

I believe a way of demarcating subtests is needed and it should satisfy these conditions:

[ ] Subtests are treated by older parsers as if they were parent tests, or fail in most cases
[ ] YAML doesn't become a nightmare to parse
[ ] Continue to be as human readable as possible
[ ] Allow for test numbering
- [ ] Allow for sub test counts 1..20
- [ ] Allow for sub test numbering 1.1.1 and 1

Jan 14 '15 02:01 jonathanKingston

Allow for sub test numbering 1.1.1 and 1

I'm not sure what you mean with that exactly.

Jan 15 '15 18:01 Leont

ok 1 - something
ok 1.1 -- something
ok 1.1.1 --- something
ok 2 - something

and

ok 1 - something
ok 1 -- something
ok 1 --- something
ok 2 - something

Obviously numbering itself should be optional also.

Jan 18 '15 19:01 jonathanKingston

@jonathanKingston: The approach has been suggested before and always shot down. The problem with this approach is that it is not backwards-compatible. Maintaining this backwards compatibility is a key design goal of TAP.

Jan 18 '15 19:01 Ovid

@Ovid it depends how you define backwards compatibility, therein lies my issues with the TAP philosophy. Where the new format includes further information about the tests, TAP consumers should of course always succeed.

However, in this case we are transferring further tests, so we should be doing our best to make all previous consumers assume the tests are top level tests.

Subtests should in older consumers either:

Cause a parse error (Getting the consumer to upgrade their consumer)
Add in top level tests (Allowing test failures to be treated as failures if they are)

It would not be acceptable for the following to be treated as an overall pass in older consumers:

ok - test
    not ok - test
ok - test

What I think this means is that the choice between any subtest format is:

Complexity in future parsers surrounding YAML blocks and lack of ability of extra meta data
Risk of parse errors in older formats

Ultimately if backwards compatibility were a priority then the TAP version would not have been put in place in the first instance. In my opinion truly backwards compatible structures would not put in version strings.

The problem around backwards compatibility here restricts subtest numbering completely (however, looking at the specification, numbering is very loosely defined itself; it doesn't even state that test numbers are incremental), and by adding in numbering when we have the tests, any numbering policy would be problematic also.

I think the benefit here is that we can define subtests without the risk of breaking older consumers - false negatives are better than false positives when it comes to tests themselves.

Jan 18 '15 21:01 jonathanKingston

Cause a parse error (Getting the consumer to upgrade their consumer)

This is not acceptable at all. Especially not in situations where the producer and the consumer are distributed separately. If the consequence of upgrading your producer is breakage, the most sensible decision is to not upgrade that producer. This sort of thing means people will refrain from upgrading to TAP 14.

Add in top level tests (Allowing test failures to be treated as failures if they are)

Both of your suggestions will cause a test harness to fail despite all the tests succeeding. Again, if upgrading a producer causes the toolchain to fail when the tests pass, the sensible thing is to not upgrade the producer.

It would not be acceptable for the following to be treated as an overall pass in older consumers:

That is already the situation today. And it's not a major issue because it only happens on a faulty producer. In a green-fields implementation this would make sense, but given the reality we can't enforce this for legacy consumers.

Jan 20 '15 01:01 Leont

I'm sorry but I really am not seeing the adverse reaction to a testing framework that requires you to upgrade your consumer when you decide to want subtests.

All ideas have their pitfalls so far. It's a testing framework, not a GUI tool or phone system; we should not be trying to remedy old frameworks. Testing frameworks need to report errors, not ignore them; accuracy should be the key.

It is not really greenfield at all as some have gone with the suggested wiki post / mailing list. Some have gone with the tap4j backwards format. Some have invented YAML subdocuments and etc. The best we can do is cause errors and explain why by incrementing the version number.

The assertions made are that the TAP output is read by one consumer for the one producer and that won't always be the case.

I'm not precious about any of the structures I have suggested; however, I am precious about allowing the safest failures for older frameworks. You mention that test harnesses will fail, and yes, ideally we should look for a structure that has the fewest issues in the most popular consumers. However, as I keep saying, we should not be worried about causing spurious failures; silently missed failures would be far worse.

Ideally I would prefer an implementation that breaks all old implementations to force the consumer upgrade if subtests are used.

If you were upgrading a framework of any kind in a corporate setting, the best approach is to assume that everything interacting with the framework will break. If we make TAP 14 compelling enough then slowly people will upgrade. Upgrades in all software need to be compelling enough, as there will likely be issues.

What I suggest is we keep looking; I really don't think it is OK that the best of the worst options, indentation, is taken up. It causes YAML parsing issues for simplistic parsers.

I'm back to: let's throw as many examples at consumers as we can until we find the format for which the most consumers either:

can cope with the suggested format
blow up and cause upgrades

Can we at least keep looking/trying for a solution here? I think it is pretty obvious why no solution was ever chosen. Settling for the first suggestion we know of seems a little lacking.

Jan 20 '15 02:01 jonathanKingston

@jonathanKingston: I know this must be frustrating for you and I'm sorry about that. We're very, very cautious about this because TAP is the default test protocol used in the core Perl language. Perl is distributed by default with just about every major operating system except Windows. This means that in just about every city on the planet with modern computers, Perl is there and thus TAP is there.

There is no way I can estimate the entire scope of breakage (many of those default installations will never be touched), but there have been serious issues in the past with accidental toolchain breakage causing an uproar. And then there's the ripple effect of this change spreading out to other language test harnesses which produce and consume TAP and there would be yet another uproar (and I'm pretty sure it would kill TAP or cause a fork).

I see no compelling reason for this change, but the downside is that it would be a disaster and devs in countries we've never thought of would be staring at their monitors, trying to figure out what happened. The first rule of the toolchain is that you do not break the toolchain (we Perl toolchain devs have been bitten enough times that we've learned our lessons).

As for how to handle YAML diagnostic information, I thought it was to be indented four spaces from the current line, but there's a potential problem there (the following is completely made up and just used as an example):

ok 1 - some test
ok 2 - some test
    1..3
    ok 1 - some subtest
    not ok 2 - some subtest
        ---
        have: 7
        want: 8
        line: 7
        file: t/foo.t
        extra:
               - |
                   This is a YAML here document and maybe it's
                   not ok to embed this in test diagnostics
    ok 3 - bummer
not ok 3 - subtest summary line
ok 4
1..4

In the above example, I could easily see a poorly-written parser choking on the YAML heredoc. That will need to be specified carefully and I don't see how the suggested changes which break the parser would mitigate the above case.
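One way a line-oriented consumer could step over such a block is sketched below in Python. This is only an illustration under stated assumptions: a block opens on a line containing only `---`, and closes either on `...` at the same indentation or on the first line indented less than the opening marker (the made-up example above has no closing `...`). The name `strip_yaml_blocks` is hypothetical.

```python
def strip_yaml_blocks(lines):
    """Return the lines that are not part of an indented YAMLish block.

    Assumed rules (not from any specification): a block starts at a line
    that is only "---" and ends at "..." with the same indentation, or at
    the first non-blank line indented less than the "---" marker.
    """
    kept = []
    block_indent = None  # indentation of the open "---", or None
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        if block_indent is not None:
            if line.strip() == "":
                continue  # blank lines inside a block stay in the block
            if indent >= block_indent:
                if indent == block_indent and line.strip() == "...":
                    block_indent = None  # explicit end marker
                continue  # still inside the block: skip the line
            block_indent = None  # dedent closes the block; handle line below
        if line.strip() == "---":
            block_indent = indent
            continue
        kept.append(line)
    return kept
```

With this approach the heredoc line beginning `not ok to embed` is never inspected as a test line, because the whole block is dropped before any `ok`/`not ok` matching takes place.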

To help us move forward on this, please show us concrete examples of:

what you want
why you want it
why the current parser(s) don't handle that case
how a proposed solution would do that.

Without seeing specific examples, we'll get this wrong.

As an aside, is anyone reading this who was also at the Oslo QA hackathon in 2008? We hashed out a lot of this there, but more importantly, there was discussion of creating "spec tests" in YAML or XML format that parsers written in any language should be able to parse and validate. That would allow TAP consumer authors to have a standard against which they could test. Does anyone recall what happened with that? I wasn't working on the spec tests at that event.

Jan 20 '15 08:01 Ovid

@Ovid

As for how to handle YAML diagnostic information, I thought it was to be indented four spaces from the current line

I also recall reading it somewhere; probably tap4j expects YAMLish in an indented block of text, starting with --- and ending with ...

Jan 20 '15 11:01 kinow

Since the specification has been frozen for a while, a TAP 14 which breaks backward compatibility could be dangerous and reduce the number of adopters, IMO.

Though I agree with @jonathanKingston that we need to move further and not be afraid of changes, at this time maybe it would be more sensible to release a specification that could be easily adopted by implementations, and would let us measure its level of adoption by implementations and gather new contributors and users.

@jonathanKingston on the old TAP 13 website, I think there was a Proposals section. Maybe we could create a similar page on the website with sections for TAP 14, TAP 15 and Backlog (or others) and write what we would be aiming to release with each new specification. Kind of like a roadmap. What do you think?

Jan 20 '15 11:01 kinow

@kinow :+1: a public product backlog spreadsheet prioritized for the TAP 14 draft sprint looks like a useful deliverable

This need was already expressed using different words in Leont's comment, my comment, Ovid's comment

Where it should be hosted (GitHub Wiki, Google Docs, GitHub Issues, Trello, ...) and what the review and voting processes would look like is unknown to me. I don't have corresponding positive open source experience, and The IETF Process: An Informal Guide is not a very digestible guideline

@jonathanKingston ?

Jan 20 '15 12:01 xmojmr

Thanks @xmojmr! Just created issue #10 for it, so that we can move this discussion there.

Jan 20 '15 12:01 kinow

@Ovid to be clear it is not me who actually wants this feature, however it would be a useful separation of the TAP output to see categorisation of the issues.

What I am trying to prevent is causing more issues for the future of TAP by adding in the simplest option we can pick from which is whitespace indenting. Yes YAMLish formats have the indicators but I would prefer parsers to be able to ignore all indented text unless it spots either bail out or not ok.

The example code you gave explains my distaste in standardising the old proposal of indenting the subtests, I want to keep parsing rules as simple as possible.

I would be interested in seeing your idea around "spec tests" even if it is from memory and not exactly how it was suggested. As I said I'm not really suggesting anything besides keeping the TAP format simple whilst allowing flexibility.

@jonathanKingston: I know this must be frustrating for you and I'm sorry about that. We're very, very cautious about this because TAP is the default test protocol used in the core Perl language. Perl is distributed by default with just about every major operating system except Windows. This means that in just about every city on the planet with modern computers, Perl is there and thus TAP is there.

I really don't buy this statement sorry, software can be updated carefully. Updating specs is possible even in the most widely used languages. I get that breaking changes will happen but that is the risk of updating the producer. This is the same reason why old browsers are prevalent and PHP 4 in enterprise setups.

A fork has ultimately already happened with TAP-Y etc.; without progressing the format to include the latest user requirements, adoption will slowly die out. If TAP is so ingrained into the Perl toolchain that it can't be extended, then ultimately it shouldn't have been opened up outside of the libraries that consumed it. If the original tools that created TAP want to remain using the older version, that is completely fine also. I think the only way to progress is, as @xmojmr suggested, creating guidelines for extending TAP and a framework where people can vote on these issues.

@kinow I agree TAP 14 should just be a clean up of the spec where possible; breaking backwards compatibility should be 15 (unless we opt for point increases, which isn't a bad idea either).

@xmojmr yep moved all talks of a roadmap to #10 thanks for all the previous comments on the previous thread also :+1:

Jan 20 '15 22:01 jonathanKingston

The example code you gave explains my distaste in standardising the old proposal of indenting the subtests, I want to keep parsing rules as simple as possible.

I think that is a very reasonable POV, I just don't think your suggested resolutions are workable. I'm sure there are syntaxes that would be ignored by a TAP12 consumer but are easier to parse for a TAP14 consumer than the current mechanism. This is a solvable problem. For example:

case A subtest {
    ok 1 - A
    ok 2 - B
}
ok 1 - A subtest

or

ok 1 - A subtest {
    ok 1 - A
    ok 2 - B
}

Jan 21 '15 00:01 Leont

@Leont again I am back to thinking it doesn't need to be ignored so long as there is a failure, but let's not get into that.

Both those examples then need the parser to be a node based parser rather than a simple one test per line that starts with /^(not|ok)/. If I am building a parser that complex then I can filter out the YAMLish start and end.

Prefixing with a character might be easier (despite not meeting my desire to have subtests parsed by older consumers):

ok 1 - parent
> ok 1 - child
> ok 2 - child
> > ok 1 - child

I'm not a massive fan of that either but it's certainly simpler than context based parsing.
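A minimal Python sketch of why the prefix form needs no context: each line carries its own depth, so a consumer can classify it without remembering anything about earlier lines. The function name `parse_prefixed` and the exact `"> "` prefix spelling (one `>` plus one space per level) are assumptions made for illustration.

```python
import re

# Zero or more "> " prefixes, then a normal test line.
PREFIXED = re.compile(r"^((?:> )*)(not ok|ok)\b(.*)$")


def parse_prefixed(line):
    """Parse one line on its own; no state is kept between lines."""
    match = PREFIXED.match(line)
    if match is None:
        return None  # not a test line in this assumed format
    prefix, status, rest = match.groups()
    return {
        "depth": len(prefix) // 2,  # each level is the two characters "> "
        "ok": status == "ok",
        "rest": rest.strip(),
    }
```

For instance, `parse_prefixed("> > ok 1 - child")` reports depth 2 for that line alone, whereas an indentation or brace format needs the preceding lines to decide where a subtest begins and ends.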

Jan 21 '15 01:01 jonathanKingston

I know I am late to the party, but @Leont just looped me in (kind of). Before I knew this discussion was going on, in fact it was early into my adoption of Test-Simple, I decided to build something like this in as an option into Test::Builder. As it stands the dev versions have the following as a subtest syntax you can turn on (default is the classic)

ok 1 - not a subtest
ok 2 - a passing subtest {
    ok 1 - inside a subtest
    ok 2 - inside a subtest
}
not ok 3 - a failing subtest {
    not ok 1 - failure inside
    # diag stuff
}

This doesn't break any of the harnesses/parsers I have used it on.

Please do not take this available syntax as me going cowboy or trying to preempt anyone here. I did this before I knew there was an ongoing process or other people to discuss it with. The main takeaway should be that the dev releases now make it possible to support a new subtest syntax if one is decided. Also it is now possible to put the subtest line before the results from inside the subtest if that is necessary.
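For comparison with the line-per-test approaches discussed earlier, here is a hedged Python sketch of how a consumer might read the brace syntax shown above. It is not the Test::Builder implementation; `parse_braced` is a hypothetical name, and the sketch assumes that a trailing `{` on a test line opens a subtest and that a line containing only `}` closes it.

```python
import re

# A test line, optionally ending in "{" to open a subtest block.
TEST_LINE = re.compile(
    r"^\s*(not ok|ok)\b(?:\s+(\d+))?(?:\s+-\s+(.*?))?(\s*\{)?\s*$"
)


def parse_braced(lines):
    """Build a tree of results from the assumed brace syntax."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for line in lines:
        if line.strip() == "}":
            if len(stack) > 1:
                stack.pop()  # close the innermost open subtest
            continue
        match = TEST_LINE.match(line)
        if match is None:
            continue  # diagnostics, plans and unknown lines are skipped
        status, _number, description, opens = match.groups()
        node = {"ok": status == "ok", "description": description, "children": []}
        stack[-1].append(node)
        if opens:
            # Caveat: a description that really ends in "{" would be
            # misread as opening a subtest in this sketch.
            stack.append(node["children"])
    return root
```

The stack is the extra state that a context-based format costs the consumer; in exchange, each parent line is seen before its children.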

Feb 03 '15 15:02 exodist

I'm still back to sub tests would be better not indented if possible and consumers would have simpler memory usage if a state wasn't required to know the nesting.

The YAML causes a simplistic consumer to need to check for the YAMLish lines rather than just ignore indented text.

The context based approaches are better avoided (crude but similar to):

ok 1 - thing (1) {
ok 1 - sub thing
}

@exodist thanks for the heads up, I would like to make sure the format is correct before it becomes close to being in TAP14. Like you mentioned in the other ticket it would be easy to miss the progress in this repo.

Feb 04 '15 01:02 jonathanKingston

I'm still back to sub tests would be better not indented if possible and consumers would have simpler memory usage if a state wasn't required to know the nesting.

I agree.

I've seen some sentiment about wanting backwards compatibility, but I really don't understand that requirement. Obviously you'd upgrade your consumer before your producers, so I don't think the design should be limited by that.

That being said, I think I like this approach:

ok 1 - thing
ok 1.1 - sub thing
ok 1.1.1 - sub sub thing
ok 1.2 - sub thing 2

If you don't want to number your tests, perhaps this would work:

ok - thing
ok . - sub thing
ok .. - sub sub thing
ok . - sub thing 2

The context based approaches are better avoided (crude but similar to):

Exactly. This can get nasty with several levels of nesting (assuming no indentation):

ok 1 - thing (1) {
ok 1 - sub thing (2) {
ok 1 - sub sub thing {
}
ok 2 - sub sub thing {
}
}
}

If we are going with something that requires context, I would much prefer indentation as it's much easier to read.
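The dotted forms above can each be read one line at a time. The Python sketch below is an assumption about how such labels might be interpreted (`parse_dotted` is a hypothetical name): a label like `1.1.1` gives a path whose length minus one is the depth, and a label made only of dots gives the depth directly.

```python
import re

# Optional label: either dotted integers (1.2.3) or a run of dots (..).
DOTTED = re.compile(
    r"^(not ok|ok)(?:\s+(\d+(?:\.\d+)*|\.+))?(?:\s+-\s+(.*))?$"
)


def parse_dotted(line):
    """Interpret a dotted test label as a nesting depth (assumed format)."""
    match = DOTTED.match(line)
    if match is None:
        return None
    status, label, description = match.groups()
    if label is None:
        depth, path = 0, None  # unnumbered top-level test
    elif label.startswith("."):
        depth, path = len(label), None  # one dot per nesting level
    else:
        path = tuple(int(part) for part in label.split("."))
        depth = len(path) - 1
    return {
        "ok": status == "ok",
        "depth": depth,
        "path": path,
        "description": description,
    }
```

Note that a consumer written for plain integer test numbers would not accept `1.1` as a number, so this sketch only applies to a consumer that already knows the new format.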

Feb 04 '15 04:02 beatgammit

@exodist has offered pretty much the first syntax I've seen which does not break backwards-compatibility, but it has some issues (which I'll cover in a moment).

Many suggested alternatives break backwards-compatibility because:

Test numbers must be integers
Or trying to shoe-horn subtests into the top-level (i.e., no indentation) which will break leading and trailing plans and also break the requirements that test numbers be sequential

In other words, millions of programs all over the planet which might parse TAP could fail, even though the tests pass and the software works. This is not a tiny project which a handful of fans have adopted. If there is a desire to break backwards-compatibility (and that would create a sh*tstorm), please:

Explain why the breakage is necessary
Explain why the new syntax is better (and "I think it looks nice" isn't enough)
Lay out the pros and cons of both approaches

Some of that has been attempted above, but the theoretical discussions I've seen are far from compelling. There's been some discussion about leading characters other than spaces and that also confuses me a bit: why is indentation OK so long as it's not spaces? (That's an honest question. I would love to understand the rationale and I probably misunderstood something here).

Regarding the syntax Chad has proposed:

ok 1 - not a subtest
ok 2 - a passing subtest {
    ok 1 - inside a subtest
    ok 2 - inside a subtest
}
not ok 3 - a failing subtest {
    not ok 1 - failure inside
    # diag stuff
}

I don't see any leading or trailing plans. How does a parser know that only two tests were supposed to be run in the first subtest and not three tests? The trailing curly brace doesn't say, but an asserted plan does say.

The other issue I have with this is more subtle.

When I added subtests, I put the subtest summary test after the subtest. Aesthetically I prefer Chad's version, but the summary test came after for one simple reason: you block on test output. I worked on one system which used a hideous XML format to encode test output. Your test run might take a long time, but even if the first test failed you wouldn't know because the parser had to wait until the entire document loaded (often the entire test run) to see the failure. This destroyed the rapid test/hack/test/hack/test/hack cycle. You ran a test and went to get coffee. TAP is deliberately line-oriented, not document-oriented. You can easily stream lines. Documents? Not so much.

If the summary line is presented first, the entire subtest has to run but its output blocked because you have to finish the subtest, determine if it passed, print the summary TAP line and then flush the output. If subtests contain subtests (and I'm currently working on a system for which deeply nested subtests are the rule rather than the exception), you can have huge chunks of TAP output blocked, waiting for all of the subtests to pass. This destroys the streaming nature of TAP.
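The streaming argument can be illustrated with a small Python generator. This is a sketch under assumptions, not any existing harness: `stream_results` is a hypothetical name, and subtests are assumed to be indented by four spaces per level with the summary line coming last.

```python
import re

# Leading spaces, then "ok" / "not ok", then the rest of the line.
TEST_LINE = re.compile(r"^( *)(not ok|ok)\b(.*)$")


def stream_results(lines, indent_width=4):
    """Yield (depth, passed, rest) for each test line as soon as it is read.

    `lines` may be any iterable, including a pipe being read lazily, so a
    nested failure is reported before the producer has finished.
    """
    for line in lines:
        match = TEST_LINE.match(line)
        if match is None:
            continue  # plans, diagnostics and unknown lines are skipped
        spaces, status, rest = match.groups()
        yield len(spaces) // indent_width, status == "ok", rest.strip()
```

Because nothing is buffered, a `not ok` inside a deeply nested subtest surfaces the moment its line arrives; with the summary line first, a producer would have to hold back those lines until the whole subtest had finished.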

Feb 04 '15 09:02 Ovid

@Ovid, first thing, the plan, I hand-wrote the example in my last post, and I forgot to include a plan, but the syntax as it is implemented does handle plans.

I also must state that I was not proposing that syntax, I was just mentioning we already have it due to my ignorance of this debate.

Moving the final result to the top had very little to do with aesthetics. The reason I put it at the top was actually to solve the jumbled output problem that occurs when tests are run in a multi-thread/multi-proc way. If you have 3 or 4 child processes producing results, and they use subtests, the output becomes less than useful. For a single-process/thread test it makes sense to use the old subtest syntax, but it simply does not work when you add concurrency to the mix.

I have been working with unit tests that support concurrency for 5+ years now, first with Fennec, and now with the baked in fork/thread support in Test-Simple. Subtests as they are currently are simply insufficient when concurrency is involved. Delaying the output of results within a subtest, and displaying them with the final result is the most sane way I found.

If the TAP is intended for a machine then the delay doesn't matter, the machine has a lot more patience than a human. If the TAP is intended for a human then it needs to be fast, but it also needs to make sense, and subtests where the final result comes last have proven to be very unfriendly to humans who have frequently expressed confusion.

Finally, the way Test-Simple implements it you can actually have BOTH, it is possible to tell it to enable both forms:

ok 1 - not a subtest
    ok 1 - inside subtest
    ok 2 - also inside
ok 2 - final subtest result {
    ok 1 - inside a subtest # again
    ok 2 - also inside # again
}

This lets you see results inside subtests as they happen, but once completed you can see it in full context. A harness could be trained to ignore indented results outside of brackets, thus letting us use this dual form to provide instant feedback for humans, but useful and contained data for both machines and humans. If we wanted to make it less ambiguous we could prefix the indented ok's that occur outside the blocks with something:

ok 1 - not a subtest
    s'ok 1 - inside subtest
    s'ok 2 - also inside
ok 2 - final subtest result {
    ok 1 - inside a subtest # again
    ok 2 - also inside # again
}

*Note: Once again I did not put the plan in there anywhere, that was simple laziness.

Feb 04 '15 16:02 exodist

I hesitate to chime in here, but... as @Leont said in one of the very first comments:

We really want to have this. I'm not fond of the current mechanism, but given it's out there and there's a lot of stuff outputting it already it may be a passed station.

Sure, the current subtest output is arguably not ideal, but it's there and shown to work. It includes a plan, doesn't conflict with the current TAP spec, and is wonderfully simple. That is: the subtest is just another TAP; strip the indent and feed it to a parser and it should work nicely.

ok 1 - hi!
    # Subtest: a subtest
    ok 1 - la al al
    ok 2 - fo fo fo
    1..2
ok 2 - a subtest
1..2

Is there some benefit that we would gain by rejecting this simple, logical -- and in use -- extension in favor of some other deeper change?
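The "strip the indent and feed it to a parser" idea can be sketched in a few lines of Python. This is a minimal illustration, not a complete TAP parser: `parse_tap` is a hypothetical name, a four-space indent per level is assumed, and only test lines and `1..N` plans are recognised.

```python
import re

TEST_LINE = re.compile(r"^(not ok|ok)\b(?:\s+(\d+))?(?:\s+-\s+(.*))?$")
PLAN_LINE = re.compile(r"^1\.\.(\d+)$")


def parse_tap(lines, indent="    "):
    """Parse TAP where an indented run before a test line is its subtest.

    The indented lines are collected, stripped of one indent level, and
    handed to this same function, so nesting needs no special cases.
    """
    doc = {"plan": None, "tests": []}
    pending = []  # indented lines waiting for their parent test line
    for line in lines:
        if line.startswith(indent):
            pending.append(line[len(indent):])
            continue
        plan = PLAN_LINE.match(line)
        if plan:
            doc["plan"] = int(plan.group(1))
            continue
        test = TEST_LINE.match(line)
        if test is None:
            continue  # comments such as "# Subtest: ..." are ignored
        status, _number, description = test.groups()
        child = parse_tap(pending, indent) if pending else None
        pending = []
        doc["tests"].append(
            {"ok": status == "ok", "description": description, "subtest": child}
        )
    return doc
```

Fed the example above, the outer document has a plan of 2, and the second test carries a nested document with its own plan of 2, which a consumer could then check against the number of subtest results.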

Feb 04 '15 16:02 rsrchboy