test Include whether a "test" is runnable in JSON output

When I run a test group that has a setup method, I get a fake test in the JSON output that looks just like any other test:

// Setup call
{
	"test": {
		"id": 4,
		"name": "group line 4 (setUpAll)",
		"suiteID": 0,
		"groupIDs": [ 2, 3 ],
		"metadata": { "skip": false, "skipReason": null },
		"line": 15,
		"column": 5,
		"url": "file:///Users/danny/Desktop/dart_sample/test/dc4163/a_test.dart"
	},
	"type": "testStart",
	"time": 572
}
// Real test
{
	"test": {
		"id": 5,
		"name": "group line 4 test line 5",
		"suiteID": 0,
		"groupIDs": [ 2, 3 ],
		"metadata": { "skip": false, "skipReason": null },
		"line": 23,
		"column": 5,
		"url": "file:///Users/danny/Desktop/dart_sample/test/dc4163/a_test.dart"
	},
	"type": "testStart",
	"time": 1615
}

In VS Code, we use this data to populate the test tree, and this means the setup/teardown methods appear. This is generally good, you can see the failure in a setup/teardown method:

Screenshot 2022-11-29 at 10 49 29

However, it also means the user can click to "run" these tests, which fails because there is no test called (setUpAll).

For now, I'm going to use the names of these tests to mark them as not-runnable (which hides the run icons from those tree nodes), but it would be better if pacakge:test included this info so it doesn't rely on string checks of test names.

(note: there is a slightly similar niggle with the "loading xxx_test.dart" fake tests... those we filter out from the tree entirely, but it's also a name check and it would be nice if that could also be done in a more structured way... there's already a metadata section above, maybe both of them could be additional flags in there?)

Nov 29 '22 10:11 DanTup

Looks like there's code here doing similar checks that would maybe benefit from this too:

https://github.com/dart-lang/test/blob/fd8e2b68ab1bc56548ab41269e1728096648590b/pkgs/test_core/lib/src/runner/reporter/github.dart#L120-L123

Nov 29 '22 10:11 DanTup

Should we have a kind enum to separate between all the types of tests? Or just a boolean for whether it is truly a "test" function?

Nov 29 '22 15:11 jakemac53

Generally I think having more information is better. In the example above, the "loading" and "setup" tests are treated differently by VS Code (we show the setup/teardown, but we don't show the loading one). Although, IIRC maybe the loading one already has a hidden flag or something.

It doesn't necessarily have to be an enum though, it could be a set of flags (for example whether an is something that can have visible results and should be shown, and whether it's directly runnable). That could convey a little more info because the names of the flags explain the reason for them, whereas an enum would require some logic in the client to know what each value really means.

Nov 29 '22 15:11 DanTup

I think it might be best for the client to make decisions about what to do with each though? setUpAll and tearDownAll are kind of weird - they can have expectations etc and are effectively "tests". Even if users don't think of them as such. Theoretically you could "run" them (if we supported it, which I doubt?). I think it probably makes sense for the consumer of the protocol to make decisions about how to handle them, as opposed to the protocol making the decisions?

Although a "runnable" boolean flag might make sense, as that is informative about what can be done with the test.

Nov 29 '22 15:11 jakemac53

I think it might be best for the client to make decisions about what to do with each though?

Yeah, I guess with the flags I meant that explain "logically" what the things are, rather than how the client should handle them. Having flags allows for overlaps without the enum exploding, and I think would be easier to extend (and simlify clients supporting multiple versions of pkg:test if the enums change over time). Things like isRunnable, isSetup, isTeardown, isLoading could be useful.

And if in future, setup/teardown became runnable individually, you could change the isRunnable flag and it would just work - whereas if it was an enum value, the IDE would need to track in which versions of package:test the enum value means one thing versus the other.

Nov 29 '22 15:11 DanTup

Having flags allows for overlaps without the enum exploding

It seems to me like the booleans would explode where the enum wouldn't? It is one field for the kind (setUpAll, teardDownAll, loading, test), as opposed to a boolean for each of those. Granted we could omit the ones that are false from the protocol, but an enum seems simpler to me.

Adding the kind enum + runnable boolean fields seems like a good middle ground, only 2 additional fields, one with a useful semantic reasoning for clients (isRunnable) and the other with a more descriptive meaning which the client may choose to do something with or not.

Note that I think it is only setUpAll and tearDownAll which we actually surface as separate tests, setUp and tearDown run per test and I think we roll them up into the normal test failure. We only treat the others specially because there is no single test to attach them to, they run once per suite.

Nov 29 '22 15:11 jakemac53

It is one field for the kind (setUpAll, teardDownAll, loading, test), as opposed to a boolean for each of those

I was thinking about where the values may overlap. I don't have any concrete cases for package:test, although the "runnable setUpAll" is an example of where it complicates things for the client. Let's say the enum has enum TestKind { setUpAll } and today that means it's not runnable, but in future it becomes runnable - the editor needs to keep track that in v1 setUpAll was not runnable and in v2 setUpAll is runnable, whereas a "runnable" flag is self-explanatory and doesn't require any encoding of what enum values mean in the client that could change over time.

But I think you're right, maybe both make sense here. The kind of a an item (loading marker, setup, teardown, test) is a fairly closed set (and if new ones are added, maybe the client wants to handle that anyway). It seemed a little odd to use a flag for one thing (is this thing runnable?) and an enum for another (is this test something that the user would recognise as something in their code - eg. not a loading placeholder).

Note that I think it is only setUpAll and tearDownAll which we actually surface as separate tests, setUp and tearDown run per test and I think we roll them up into the normal test failure.

Yeah, I discovered that while testing my changes. I think there could be value in surfacing them separately (to make it clearer why a "test" failed), but that's probably a topic for another day (it also may introduce tests-nested-inside-tests which I suspect is a new concept 😄)

Nov 29 '22 16:11 DanTup

Unless we have a known use case today for the enum I think I'd prefer to stick to the single bool.

Nov 30 '22 01:11 natebosch

Having some indicator (whether it's a flag or an enum) to remove the hard-coded checking of "loading ..." would be good too. Today if you create a test that is named like this, it might not be handled correctly.

Nov 30 '22 10:11 DanTup