rules_go
nogo: exclude_files should accept anchored regexps
What version of rules_go are you using?
master (31d17212968b472032035a45353a1751f1c0f529)
What version of gazelle are you using?
0.26.0
What version of Bazel are you using?
Build label: 5.2.0
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Tue Jun 7 16:02:26 2022 (1654617746)
Build timestamp: 1654617746
Build timestamp as int: 1654617746
Does this issue reproduce with the latest releases of all the above?
Can't get newer than this.
What operating system and processor architecture are you using?
Linux / amd64
Any other potentially useful information about your toolchain?
Custom CROSSTOOL; llvm_toolchain based on https://github.com/grailbio/bazel-toolchain 0.7.2.
What did you do?
I was configuring nogo to skip external dependencies, using staticcheck with a (generated) config file like:
{"QF1001": {"exclude_files": {"[.]pb[.]go$": "generated","external/": "out of our control"}}, ...
Anyway, I wanted to match on ^external/ and it didn't work, so I cloned rules_go, changed my http_archive to a local_repository pointing at my clone, and began adding some print statements to nogo_main.go:
var ranRegex bool
if include {
	for _, pattern := range config.excludeFiles {
		ranRegex = true
		fmt.Printf("%v on %v %v\n", pattern, filename, []byte(filename))
		if pattern.MatchString(filename) {
			fmt.Printf(" ==> exclude\n")
			include = false
			break
		}
	}
}
if ranRegex && include {
	fmt.Printf(" ==> include\n")
}
I was then surprised to see that the string representation and the byte representation of filename are completely different:
compilepkg: [.]pb[.]go$ on external/com_github_uber_jaeger_client_go/log/logger.go [47 104 111 109 101 47 106 114 111 99 107 119 97 121 47 46 99 97 99 104 101 47 98 97 122 101 108 47 95 98 97 122 101 108 95 106 114 111 99 107 119 97 121 47 97 101 56 48 53 49 51 54 53 50 55 53 51 50 98 57 55 56 98 102 101 50 49 100 52 56 100 51 99 56 98 54 47 115 97 110 100 98 111 120 47 108 105 110 117 120 45 115 97 110 100 98 111 120 47 50 50 52 56 56 47 101 120 101 99 114 111 111 116 47 95 95 109 97 105 110 95 95 47 101 120 116 101 114 110 97 108 47 99 111 109 95 103 105 116 104 117 98 95 117 98 101 114 95 106 97 101 103 101 114 95 99 108 105 101 110 116 95 103 111 47 108 111 103 47 108 111 103 103 101 114 46 103 111]
^external/ on external/com_github_uber_jaeger_client_go/log/logger.go [47 104 111 109 101 47 106 114 111 99 107 119 97 121 47 46 99 97 99 104 101 47 98 97 122 101 108 47 95 98 97 122 101 108 95 106 114 111 99 107 119 97 121 47 97 101 56 48 53 49 51 54 53 50 55 53 51 50 98 57 55 56 98 102 101 50 49 100 52 56 100 51 99 56 98 54 47 115 97 110 100 98 111 120 47 108 105 110 117 120 45 115 97 110 100 98 111 120 47 50 50 52 56 56 47 101 120 101 99 114 111 111 116 47 95 95 109 97 105 110 95 95 47 101 120 116 101 114 110 97 108 47 99 111 109 95 103 105 116 104 117 98 95 117 98 101 114 95 106 97 101 103 101 114 95 99 108 105 101 110 116 95 103 111 47 108 111 103 47 108 111 103 103 101 114 46 103 111]
==> include
So I guess that as a []byte the filename is the full path: [47 104 111 ... decodes to something like /home/jrockway/.... Iterating with for i, r := range filename { fmt.Printf("%d %v\n", i, r) } also yields the full path. This is basically what the regexp engine eventually does, which explains why a match on ^external fails.
Having said that, it's weird that as a string, filename starts with external/.
I dug into go/token, which is where this string comes from, and it's not doing any weird slicing or anything; it's pretty much plain strings from the very start. So I have no idea where these paths are coming from. (I also tested some weird slicing and couldn't reproduce this behavior: x := []byte("foobar"), y := string(x[1:2]), z := []byte(y) prints what you'd expect; z isn't the original slice, it's identical to y.)
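For reference, the slicing experiment described above can be written as a small standalone program. This is only a sketch of that experiment, not code from rules_go; the function name sliceRoundTrip and the extra mutation of x are additions for illustration. It shows that converting between []byte and string copies the data, so a string cannot carry hidden bytes that differ from what it prints.

```go
package main

import "fmt"

// sliceRoundTrip repeats the experiment from the issue text:
// take a sub-slice of a byte slice, convert it to a string, and
// convert that string back to a byte slice.
// (The function name is hypothetical, chosen for this example.)
func sliceRoundTrip() (x []byte, y string, z []byte) {
	x = []byte("foobar")
	y = string(x[1:2]) // copies the single byte 'o' into a new string
	z = []byte(y)      // copies the string's bytes into a new slice

	// Mutating the original slice afterwards does not affect y or z,
	// because both conversions above copied the data.
	x[1] = 'X'
	return x, y, z
}

func main() {
	x, y, z := sliceRoundTrip()
	fmt.Printf("%q %v %q\n", y, z, x) // prints: "o" [111] "fXobar"
}
```

Since y and z always hold the same bytes, a difference between the printed string and the printed byte values would have to be introduced somewhere after formatting, not by the conversion itself.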
Anyway, maybe I totally misunderstand how []byte/string conversion works in Go, but if this is intentional, it blocks matching on ^external, which is annoying. But I think there's a bug in the analysis chain somewhere, rather than a concerted effort to make the anchored regex match fail. The question is... what is the bug.
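To make the anchoring problem concrete, here is a minimal sketch using Go's regexp package. The paths are made up for illustration (the real sandbox path is longer), and the helper name matches is hypothetical. It shows that ^external/ matches a relative path but not an absolute one, and that a pattern such as (^|/)external/ would match both.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether the regular expression pattern matches path.
// (Hypothetical helper for this example; nogo compiles its patterns itself.)
func matches(pattern, path string) bool {
	return regexp.MustCompile(pattern).MatchString(path)
}

func main() {
	// Made-up paths standing in for what the analyzer might see.
	rel := "external/com_github_uber_jaeger_client_go/log/logger.go"
	abs := "/home/user/sandbox/execroot/__main__/" + rel

	fmt.Println(matches(`^external/`, rel))     // true: path starts with external/
	fmt.Println(matches(`^external/`, abs))     // false: path starts with /home/
	fmt.Println(matches(`external/`, abs))      // true: unanchored match
	fmt.Println(matches(`(^|/)external/`, abs)) // true: anchored at a path separator
	fmt.Println(matches(`(^|/)external/`, rel)) // true: anchored at start of string
}
```

Note that (^|/)external/ is only a possible workaround, not a fix: it would also match a directory named external anywhere inside the main repository.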
More versions
This is go 1.18.4; verified with runtime.Version() in nogo_main.go. (That's also my system version of Go, and the configured toolchain.)
golang.org/x/tools might also be relevant:
go_repository(
    name = "org_golang_x_tools",
    importpath = "golang.org/x/tools",
    sum = "h1:loJ25fNOEhSXfHrpoGj91eCUThwdNX6u24rO1xnNteY=",
    version = "v0.1.11",
)
This sounds strange. Are the paths perhaps going through https://github.com/bazelbuild/rules_go/blob/31d17212968b472032035a45353a1751f1c0f529/go/tools/builders/env.go#L342?
Anyway, maybe I totally misunderstand how []byte/string conversion works in Go, but if this is intentional, it blocks matching on ^external, which is annoying. But I think there's a bug in the analysis chain somewhere, rather than a concerted effort to make the anchored regex match fail. The question is... what is the bug.
You can use the Go playground to quickly validate how Go regexps work: https://go.dev/play/p/lDKWy6Ainyp
I think you are running into a different issue though 🤔