buf
buf copied to clipboard
Split parsing and linking
I was facing an issue with different behavior of the linter on different machines (same buf version, tested with 1.4.0 and some older versions).
The linter error that would be skipped on some machines was similar to:
app/v1/file_a.proto:20:40:field app.v1.ext_field: duplicate extension: app.v1beta1.ext_field and app.v1.ext_field are both using tag 12345
After additional investigation I saw that the linter issue is always reproduced when GOMAXPROCS=1
but is skipped when the default is used.
I believe the problem is because of the parallel processing that splits the input into chunks in:
https://github.com/bufbuild/buf/blob/main/private/bufpkg/bufimage/bufimagebuild/builder.go
which uses Parser
from github.com/jhump/protoreflect/desc/protoparse which has a local linker
that holds a map of extensions. If the two input files that contain the same extension tag would happen to be in different chunks processed in parallel the linting problem will not be detected.
Can you provide a reproducible example?
Attaching reproducible sources: buf_1150.tar.gz
Running buf with default config but different GOMAXPROCs:
$ buf --version
1.4.0
# in my env GOMAXPROC is 8
# when GOMAXPROC > number of input files
$ buf lint
app2/v1/ext2.proto:8:25:field app2.v1.bar2: duplicate extension: app1.v1.bar and app2.v1.bar2 are both using tag 126
# similar to the above
# but GOMAXPROC is hardcoded to 1
$ GOMAXPROCS=1 buf lint
app2/v1/ext2.proto:8:25:field app2.v1.bar2: duplicate extension: app1.v1.bar and app2.v1.bar2 are both using tag 126
# when the process has a chance to split the input into different goroutines
# when 1 < GOMAXPROC <= number of input files
$ GOMAXPROCS=3 buf lint
# ^ no error detected
Reproducibility of the bug depends on the logic in getChunkSize()
in private/bufpkg/bufimage/bufimagebuild/builder.go
as well as how the chunks are split. So I used GOMAXPROCS equal to the number of input files/directories, because that makes the bug always reproducible. When GOMAXPROCS is lower than number of input files (typical scenario with decent sized codebases), the occurrence of the bug depends on order of the files on the filesystem.
This is going to take a bit to solve, we're likely going to have to split linking into a separate step. Thanks for the reproduction.
I believe this issue is fixed now in 1.9 due to the protocompile
upgrade in #1463, am I correct?
I checked the demonstration just to see:
$ buf --version
1.9.0
$ GOMAXPROC=3 buf lint
app1/v1/ext1.proto:8:24:extension with tag 126 for message common.v1.Foo already defined at app2/v1/ext2.proto:8:25
cc @jhump
@jchadwick-buf, yes, I can confirm that this is fixed in 1.9.0.
The issue has to do with how parallelism was achieved in earlier versions of buf: the list of all files in the module would be broken up into "chunks". The number of chunks was based on the level of parallelism (e.g. GOMAXPROC
). Each chunk would be compiled independently, all concurrently. That means that symbol and extension number conflicts between two files that were processed in separate chunks would not be observed or reported. Setting GOMAXPROC
to 1 runs all files in a single operation, which is why the collisions are reliably reported in that configuration.
In v1.4.0, an improvement was made that attempted to solve this heuristically: all files in the same folder would always be included in the same "chunk". If the repo is laid out per best practices, where each folder is a separate proto package, then this alleviates the symbol collision issue. But it still had an issue with identifying extension number collisions.
In the new compiler (introduced in v1.9.0), the files are not broken into chunks. Instead, the new compiler natively supports concurrency and handles the entire set of files in a single operation. That way, it reliably reports all collisions even when using a high level of concurrency internally.