go
go copied to clipboard
proposal: x/tools/go/analysis: allow Analyzers to customize their behavior if a file is a test file
Proposal Details
Some analyzers need to customize their behavior based on the context in which the code is running, e.g. to restrict certain (inefficient) APIs to only be used in tests.
At the moment analyzers have no way to determine if the file they are analyzing is only used in tests. One way of achieving this today is to check if the file ends in _test.go. This works for the most part. However it fails for libraries that are not tests themselves but can only be used in tests. Some build systems can enforce this. See bazels testonly attribute.
One way to support this would be similar to the version support based on go/types.Info.FileVersions which is a map from *ast.File to the version string. A map in the analysis.Pass from *ast.File to a bollean that says if the file is only used in tests or not would solve this problem.
Note: it is not a property of the package to be analyzed but the individual files because when analyzing a test package it consists of test files and the actual package source files.
cc: @adonovan @timothy-king @findleyr
This feature would first need to be added to x/tools/go/packages, which the multichecker driver uses for package information. So let's expand this proposal to consider the API changes to both packages at once.
(Reopening. Fat fingers closed this.)
3 candidate implementations:
- Add a field
Test map[*ast.File]booltoanalysis.Passandpackages.Package(original proposal) - Add a field
Test func(*ast.File)booltoanalysis.Passandpackages.Package - Add a field
Tests []*ast.Filetoanalysis.Passandpackages.Package
Thoughts:
- 1 and 2 both strike me as user friendly for analyzer writers.
- I have a sneaking suspicion that packages will be slightly more natural to express with 3. Packages can be given a list of the testing files and forwards it along.
- 3 might be slightly inefficient if analyzers do O(|#test files||#files||#analyzers|*|#packages|) to check these, but this is easily to fix with a utility Analyzer if this becomes a problem.
- Options 2 strikes me as not a natural fit for
packages. - I do not see a particularly compelling reason to work over strings instead of
*ast.File(packages has a few places strings appear though).
A utility analyzer is overkill; we should just get the API right. The client needs to know "is this file a test file?" or, more often, "is this node or position within a test file?", even when they don't have the File to hand. The API should support those queries directly.
I suspect the frequency of queries will not be very great: typically once per file (if deciding to skip some files) or once per finding (if deciding whether to suppress a diagnostic). So this suggests a representation like option #3, with a helper function such as:
// InTestFile reports whether the position is within
// (including EOF) a file that is only used during testing.
func (pass *Pass) InTestFile(pos token.Pos) bool {
for _, f := range pass.TestFiles {
if f.FileStart <= pos && pos <= f.FileEnd { return true }
}
return false
}
Clients can build a hash table if they need more frequent access.
I might be missing something but is there a reason this needs to be part of the analysis.Pass at all? Is this to support files that are not part of the package under active analysis? Otherwise it should be enough to have this information in the types.Package or types.Info which is always available for the package under analysis (even for packages that fail type checking it should be possible to fill in this data).
Or going the other direction. Does it need to be part of the types.Package or types.Info at all? Wouldn't it be sufficient to be part of the analysis.Pass only? (It might be hard for analysis drivers to get at this information).
I think what I'm trying to say is: is there a reason to duplicate this information?
I think what I'm trying to say is: is there a reason to duplicate this information?
Anything "input" data exposed by analysis.Pass interface needs to be derivable from packages.Package, since the design of multichecker driver relies on it. So if the Pass needs access to a "test-only file?" predicate, this information must be furnished by packages.Package (and also by go vet, but that's easier because the vet-unitchecker interface isn't a public API).
If types.Package provided this information that would be solution, but all existing creators of types.Package---that is, all clients of types.Checker, plus cmd/compiler + gcimporter, do not provide it.
func (pass *Pass) InTestFile(pos token.Pos) bool
The interface I proposed above assumes that files are parsed (which is the common use-case for go/packages, but isn't actually required), so this operation would be dependent on NeedSyntax. One alternative is to provide the list of names of test-only files, but it seems less useful.
A utility analyzer is overkill; we should just get the API right. The client needs to know "is this file a test file?" or, more often, "is this node or position within a test file?", even when they don't have the File to hand. The API should support those queries directly.
~We went through a very similar line of reasoning when discussing file version information (https://go.dev/issue/62605). I still feel that it is philosophically incorrect to use token.Pos to look up information about the file, because the syntax tree is abstract and any properties of the file are properties of the File node. I think map[*ast.File]bool is correct.~
EDIT: on second thought, I'm not sure. Unlike go/types, for which the input may realistically be a mutated AST, go/packages guarantees an AST produced by go/parser, for which a positional lookup is sound. Do we want to establish a requirement that analyzers operate only on syntax with full and accurate positions?
If types.Package provided this information that would be solution, but all existing creators of types.Package---that is, all clients of types.Checker, plus cmd/compiler + gcimporter, do not provide it.
Maybe I got confused by @timothy-king comment above but are you now suggesting a solution that does not involve adding this information to go/types?
Because if it is added to go/types (e.g. types.Info), that would be sufficient, wouldn't it? It is file the FileVersions so it should be sufficient for the "isTestFile" as well.
Maybe I got confused by @timothy-king comment above but are you now suggesting a solution that does not involve adding this information to go/types?
I wasn't suggesting that go/types be involved. Unlike FileVersions, which is critical information for the type checking algorithm, the "test only" predicate is not relevant to type checking and is purely a property of the build system. So I think this information belongs in go/packages, and that's what I interpreted Tim and Rob to be implying too.
Sorry I misread @timothy-king comment. It refers to packages.Package. I somehow read this as types.Package. Sorry for the confusion.
I think a analysis only solution is appropriate here.
I think a analysis only solution is appropriate here.
See my note in https://github.com/golang/go/issues/65749#issuecomment-1956808531: it has to be in both go/analysis and go/packages.
Do we want to establish a requirement that analyzers operate only on syntax with full and accurate positions?
I am not really sure. Might need its own proposal for the other drivers for analysis.Analyzer out there?
But I feel like we do not need to resolve this now in order to know which files are tests. If we have a field pass.TestFiles, we can always add astutil.Within(pos token.Pos, files []*ast.Files) bool (what Alan wrote just not a method on Pass) without an API commitment. My point is once the field is there and filled in, we can figure out an ergonomic API.
But I feel like we do not need to resolve this now in order to know which files are tests.
Agreed. I prefer not resolving this now, by treating the position lookup as a separate concern. I'd be fine with e.g. packages.Package.TestFiles []string and analysis.Pass.TestFiles map[*ast.File]bool.
Or should we have analysis.Pass.TestFiles []string, completely passing through the equivalent field on packages.Package? The analysis.Pass has OtherFiles and IgnoredFiles, both of which are []string. Why should it be that test files only exist among the set of compiled go files? Presumably there could be a test file that is not compiled.
I'd be happy with lists of strings (filenames), parallel to TestFiles and OtherFiles. That avoids the tricky interaction with the NeedSyntax mode bit.
Ok, it sounds like we're honing in on:
For go/packages:
package packages
type Package {
// TestFiles lists files that are only used for test code, e.g. because they contain tests
// (as in a *_test.go file), or are otherwise marked by the build system as being restricted
// to testing.
TestFiles []string
// ...
}
For go/analysis:
package analysis
type Pass {
TestFiles []string // names of test-only files in this package
// ...
}
Does that look right?
This doesn't look right to me. TestFiles has a well-understood meaning already: *_test.go files. If a build system has the concept of a package that can only be imported by tests, that doesn't make the files "TestFiles", since in general code in TestFiles is not importable at all. It's also weird that Pass.Files is a []*ast.File while Pass.TestFiles is a []string. What kind of string is it? Is it just the basename (like it would be in go/packages) or is it the fully qualified path that you might get from the FileSet?
If we have to add this foreign concept for use with Blaze/Bazel, it seems cleaner to add it directly than to disguise it in a way that ends up being awkward to explain in practice. Specifically, let's add Package.TestOnly bool and Pass.TestOnly bool
TestOnly bool // package is marked as "test-only" in the build system (not possible using go command)
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
_test.go files in non-test packages are sufficiently similar to bazel testonly packages that I think most analysis would want to treat them the same. Each analyzer inferring 'is this file a test?' by combining TestOnly, the *ast.File, and the FileSet seems a bit less natural than the build system telling the analysis which subset of files are tests.
I disagree. Putting them in TestFiles makes it sound like they are _test.go files, and they aren't. I think you are overfitting to this one specific use case. Better to expose the actual concept than the bit you want about that concept today.
@timothy-king & @lfolger, does the TestOnly bool API address your needs?
It is not entirely clear to me if TestOnly would be set to true for tests themselves. If so, it wouldn't work because tests might consist of test and non-test files. If such a package would be marked TestOnly, we would consider all of the embedded library sources as TestOnly.
If TestOnly is only set for blaze/bazel go_library targets with testonly attributte but not for go_test targets, it would work for us.
I don't have any immediate plans to have checkers that use this. I'll defer to @lfolger for whether it meets his needs.
Based on @lfolger's recent message, it sounds like https://github.com/golang/go/issues/65749#issuecomment-1973571611 works for everyone. This would mean adding a TestOnly bool to x/tools/go/packages, but the cmd/go integration would never emit it. Only Bazel/Blaze wrappers would emit it. Do I have that right?
A test only bit like the one described is interesting, but it's slightly worrying that it would only be used by Bazel and Blaze (Blazel?) builds. OTOH, I am wondering what we could do if we were able to pass more build system information through to analyzers. For example, dependency information is very different between Blazel and Go and understanding the dependency information on the build system level would allow us to replace some heuristics with the source of truth data.
What if, instead of handling the case of test only information, we had an extension mechanism that could transport this and other information? What other problems could we fix with that?
It stops being an API if arbitrary data can be passed between Blaze and analyzers. Let's add what we need and document what it means. TestOnly seems like a reasonable step on that path.
Have all remaining concerns about this proposal been addressed?
The proposal is to add
TestOnly bool // package is marked as "test-only" in the build system (not possible using go command)
to both golang.org/x/tools/go/packages.Package and golang.org/x/tools/go/analysis.Pass.
This is a concept in Blaze, Bazel, and other build systems but not the go command.
This would work for our use case.
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
The proposal is to add
TestOnly bool // package is marked as "test-only" in the build system (not possible using go command)
to both golang.org/x/tools/go/packages.Package and golang.org/x/tools/go/analysis.Pass.
This is a concept in Blaze, Bazel, and other build systems but not the go command.