
Split parsing and linking

Open • mvitaly opened this issue 2 years ago • 3 comments

I was facing an issue where the linter behaved differently on different machines (same buf version; tested with 1.4.0 and some older versions).

The linter error that would be skipped on some machines was similar to: app/v1/file_a.proto:20:40:field app.v1.ext_field: duplicate extension: app.v1beta1.ext_field and app.v1.ext_field are both using tag 12345

After additional investigation, I saw that the lint error is always reported when GOMAXPROCS=1 but can be skipped when the default is used.

I believe the problem is the parallel processing that splits the input into chunks in https://github.com/bufbuild/buf/blob/main/private/bufpkg/bufimage/bufimagebuild/builder.go. Each chunk uses a Parser from github.com/jhump/protoreflect/desc/protoparse, which has a local linker that holds its own map of extensions. If two input files containing the same extension tag happen to land in different chunks processed in parallel, the lint problem is not detected.
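
As a minimal sketch of the failure mode (not buf's actual builder code; app/v1/file_a.proto is the file from the error above, app/v1beta1/file_b.proto is a hypothetical name for the other file, and the import path is assumed), each chunk gets its own Parser and therefore its own linker state, so neither goroutine ever sees both extensions:

package main

import (
	"fmt"
	"sync"

	"github.com/jhump/protoreflect/desc/protoparse"
)

func main() {
	// Two chunks, each parsed and linked by an independent Parser.
	chunks := [][]string{
		{"app/v1/file_a.proto"},
		{"app/v1beta1/file_b.proto"}, // hypothetical file declaring app.v1beta1.ext_field
	}
	var wg sync.WaitGroup
	for _, chunk := range chunks {
		wg.Add(1)
		go func(files []string) {
			defer wg.Done()
			parser := protoparse.Parser{ImportPaths: []string{"."}} // import path assumed
			if _, err := parser.ParseFiles(files...); err != nil {
				fmt.Println(err)
			}
			// Each Parser tracks extension tags only for the files it links,
			// so the duplicate use of tag 12345 across the two chunks goes unreported.
		}(chunk)
	}
	wg.Wait()
}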

mvitaly commented on May 21, 2022

Can you provide a reproducible example?

bufdev commented on May 24, 2022

Attaching reproducible sources: buf_1150.tar.gz

Running buf with the default config but different GOMAXPROCS values:

$ buf --version
1.4.0

# in my env GOMAXPROCS is 8

# when GOMAXPROCS > number of input files
$ buf lint
app2/v1/ext2.proto:8:25:field app2.v1.bar2: duplicate extension: app1.v1.bar and app2.v1.bar2 are both using tag 126

# similar to the above,
# but GOMAXPROCS is hardcoded to 1
$ GOMAXPROCS=1 buf lint
app2/v1/ext2.proto:8:25:field app2.v1.bar2: duplicate extension: app1.v1.bar and app2.v1.bar2 are both using tag 126

# when the process has a chance to split the input across different goroutines,
# i.e. when 1 < GOMAXPROCS <= number of input files
$ GOMAXPROCS=3 buf lint

# ^ no error detected

Reproducibility of the bug depends on the logic in getChunkSize() in private/bufpkg/bufimage/bufimagebuild/builder.go, as well as on how the chunks are split. I therefore used a GOMAXPROCS value equal to the number of input files/directories, because that makes the bug always reproducible. When GOMAXPROCS is lower than the number of input files (the typical scenario for decently sized codebases), whether the bug occurs depends on the order of the files on the filesystem.
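
As a rough illustration of why this happens (a hypothetical partitioning helper, not buf's actual getChunkSize() logic; common/v1/foo.proto is an assumed name for the file declaring common.v1.Foo):

package main

import "fmt"

// splitChunks partitions the input files into fixed-size chunks.
func splitChunks(files []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var chunks [][]string
	for len(files) > 0 {
		n := size
		if n > len(files) {
			n = len(files)
		}
		chunks = append(chunks, files[:n])
		files = files[n:]
	}
	return chunks
}

func main() {
	// Same chunk size, different enumeration order from the filesystem.
	ordered := []string{"app1/v1/ext1.proto", "app2/v1/ext2.proto", "common/v1/foo.proto"}
	reordered := []string{"app1/v1/ext1.proto", "common/v1/foo.proto", "app2/v1/ext2.proto"}

	// In the first layout the two conflicting extension files share a chunk, so a
	// single linker sees both uses of tag 126; in the second they are split across
	// chunks and the collision goes unnoticed.
	fmt.Println(splitChunks(ordered, 2))
	fmt.Println(splitChunks(reordered, 2))
}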

mvitaly commented on May 26, 2022

This is going to take a bit to solve; we're likely going to have to split linking into a separate step. Thanks for the reproduction.

bufdev commented on May 26, 2022

I believe this issue is now fixed in 1.9 due to the protocompile upgrade in #1463; am I correct?

I checked the attached reproduction just to see:

$ buf --version
1.9.0

$ GOMAXPROCS=3 buf lint
app1/v1/ext1.proto:8:24:extension with tag 126 for message common.v1.Foo already defined at app2/v1/ext2.proto:8:25

cc @jhump

jchadwick-buf commented on Oct 31, 2022

@jchadwick-buf, yes, I can confirm that this is fixed in 1.9.0.

The issue has to do with how parallelism was achieved in earlier versions of buf: the list of all files in the module would be broken up into "chunks". The number of chunks was based on the level of parallelism (e.g. GOMAXPROCS). Each chunk would be compiled independently, all concurrently. That means symbol and extension number conflicts between two files processed in separate chunks would not be observed or reported. Setting GOMAXPROCS to 1 runs all files in a single operation, which is why the collisions are reliably reported in that configuration.

In v1.4.0, an improvement was made that attempted to solve this heuristically: all files in the same folder would always be included in the same "chunk". If the repo is laid out per best practices, where each folder is a separate proto package, this alleviates the symbol collision issue. But it still failed to identify extension number collisions between files in different folders.

In the new compiler (introduced in v1.9.0), the files are not broken into chunks. Instead, the new compiler natively supports concurrency and handles the entire set of files in a single operation. That way, it reliably reports all collisions even when using a high level of concurrency internally.
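
For reference, a minimal sketch of driving the new compiler directly (assuming the github.com/bufbuild/protocompile API, not buf's actual integration code; file names and the import path are taken from the reproduction above):

package main

import (
	"context"
	"fmt"

	"github.com/bufbuild/protocompile"
)

func main() {
	compiler := protocompile.Compiler{
		Resolver: protocompile.WithStandardImports(&protocompile.SourceResolver{
			ImportPaths: []string{"."},
		}),
		// Concurrency is internal to the compiler; all files are still linked
		// against a shared symbol table, so cross-file conflicts are caught
		// regardless of this value.
		MaxParallelism: 4,
	}
	_, err := compiler.Compile(context.Background(),
		"app1/v1/ext1.proto", "app2/v1/ext2.proto")
	if err != nil {
		// The tag 126 extension conflict is reported here even at high parallelism.
		fmt.Println(err)
	}
}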

jhump commented on Oct 31, 2022