protobuf
protobuf copied to clipboard
protoc-gen-go: consider applying M option to prefixes
I was testing out the v2 plugin and got this:
2019/11/26 11:17:11 WARNING: Missing 'go_package' option in "bufbuild/buf/image/v1beta1/image.proto", please specify:
option go_package = "bufbuild/buf/image/v1beta1;bufbuild_buf_image_v1beta1";
A future release of protoc-gen-go will require this be specified.
Can this be discussed? I would strongly argue against requiring long-form go_package
values in the Protobuf schema definition. This ties a Protobuf schema to a specific generated code location, effectively, which is undesirable in many situations. For example, a lot of existing customers generate code on the fly in individual repositories with different base Golang modules, and this will effectively outlaw that (not that doing so is a good decision, but it's a widely used pattern).
This seems very restrictive, and will many a bunch of existing build tools around Protobuf not work. Personally, I've always strongly recommended against using long-form go_package
values because of the multitude of ways that users consume Protobuf definitions.
When one .proto
file imports another, the code generator needs to know the Go package associated with each file.
For example, consider:
// a.proto
syntax = "proto3";
import "b.proto";
message A {
B field_b = 1;
}
// b.proto
syntax = "proto3";
message B {}
When we generate a.pb.go
, we need to know if the types A
and B
are in the same Go package or not. If they are in the same package, a.pb.go
will contain:
type A {
FieldB B
}
But if they are not in the same package, a.pb.go
will contain:
import "example.com/package/b"
type A {
FieldB b.B
}
But how do we tell what the Go package associated with a .proto
file is?
The protobuf language has a concept of a package (e.g., package google.protobuf;
), but protobuf packages don't map directly to Go packages. In particular, the protobuf language permits cycles between packages (but not .proto
files), while the Go language forbids cycles. In addition, we have no way to derive the Go import path from the protobuf package.
Relying on the filename doesn't work, because there's no guarantee that the filesystem layout of the .proto
files matches that of the Go packages. For example, the file google/protobuf/any.proto
corresponds to the Go package "github.com/golang/protobuf/ptypes/any".
Currently, protoc-gen-go
has a bunch of heuristics that it attempts to apply, but they're complicated, difficult to describe, and still get things wrong:
https://go.googlesource.com/protobuf/+/refs/heads/master/compiler/protogen/protogen.go#212
The best way to tell the code generator the Go package associated with a .proto
file is to just tell it.
Yes for sure - I know it's a difficult problem. I was actually on the original issue that added go_package
haha https://github.com/golang/protobuf/issues/139 and back in the day I supported it.
However, I think while the original intention was good, it's had some really bad unintended consequences that effectively have led to many (including myself) not using long-term go_package
values and recommending against them entirely.
The Protobuf schema shouldn't be used to dictate this so strongly, there should be flexibility in generation. However, I recognize that I'm saying this is a problem without proposing a solution. I think we should come up with something better in the long term.
Just as a simple idea, one option that I think covers many use cases is a variation on the M
option (also could be thought of as a variation on the import_prefix
option), of the form Mdir=prefix
, where all .proto
files inside DIR
are prefixed with `PREFIX. For example:
protoc -I proto --go_out=gen/go --go_opt Mproto=github.com/foo/bar/gen/go $(find proto -name '*.proto')`
This doesn't handle the github.com/golang/protobuf/ptypes/any
case, but this is the exception, not the rule, and generally packages should match the directory structure of the Protobuf files as a rule. For those who need to break glass, go_package
is still available.
When it comes to deriving the Go package paths for generated packages, there's primarily two ways to sensibly do it: 1) encode it in the .proto
file (which is what the go_package
option is for), or 2) encode it in the command-line when calling protoc
(which is what the -M
flag is for). Both approaches have advantages and disadvantages.
The intention of the warning is to push the world towards either of those two. Inside Google, we primarily use the -M
approach since we have a mono-repo. IIRC, the warning should not be present if the -M
flag is present. If it is, I don't think that was my intention.
Also, the -M
flag could be improved, but I think that's a separate discussion.
I'll double check if the warning is present with with M
flag or not - I have not tested that. Yes, I primarily use the M
flag as well.
Note that it would be worth having that discussion - the M
flag is a pain to set up on an automated basis, and for large, single protoc
invocations, it can actually hit an issue where the maximum length of a protoc
argument list is 30,000 characters (I think, it's something around 30,000, doing this from memory). So for 1000 files with long paths, it can hit that limit (I've hit it before). The Mdir=prefix
option would get around that.
I'd want to think about it, but Mdir=prefix
sounds reasonable.
I definitely feel the pain of trying to use the M
flag manually. It's really only practical if you've got a build tool like bazel handling it for you.
That would be great. I'm happy to lend a hand if you're open to PRs.
Another related note, but probably worthy of a new issue: I would love to be able to generate assets produced byprotoc-gen-go-grpc
to a different Golang package than the assets produced by protoc-gen-go
, so that the grpc-go dependency is not enforced for packages that do not use gRPC. This might already be done/this might already have been one of the driving factors behind splitting up the plugins, I haven't tested it yet, but if, for example, the protoc-gen-go-grpc
plugin took a package path that it is being generated to (or if it could auto-derive this optimally), and then if the Golang package of the .proto
files that contain the request/response types is different than this package, it did the import...this would be great.
Having this ability to specify the location of the core Golang types from protoc-gen-go
as part of compiler/protogen
would be great for arbitrary Golang protoc plugins as well, i.e. protoc-gen-validate, protoc-gen-twirp, protoc-gen-grpc-gateway
, etc...I have a lot of hopes that compiler/protogen
can be the base for a lot of plugins in the future (which I think is the intention).
A couple thoughts:
-
What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every
.proto
file including well-known ones likegoogle/protobuf/timestamp.proto
.Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.
-
The problem of per-file
M
options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers precedent for this.
Just to revitalize this discussion: where did we land on this?
Requiring go_package
on the same major release, even if documented, is going to break tens of thousands of developers using golang/protobuf today, I don't think that can be done to the golang/protobuf community IMO
A couple thoughts:
1. What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every `.proto` file including well-known ones like `google/protobuf/timestamp.proto`. Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.
I think I'm missing something on prefix of "", do you mean M""=prefix
or Mdir=""
? I think you are referring to the former (correct me if I'm wrong). Otherwise, would dir=prefix
affect google/protobuf/*
files?
I feel like this only comes into play if you had google/protobuf
vendored right? ie I had protoc -I proto --go_out=Mproto=github.com/foo/bar/gen/go:gen/go
and then i.e. proto/google/protobuf/timestamp.proto
existed. I think in this case, it would be on the user to have a separate vendor/proto
or third_party/proto
directory for such files, and then protoc -I vendor/proto -I proto
.
2. The problem of per-file `M` options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers [precedent](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/objectivec/objectivec_generator.cc#L108) for this.
Might be nice, I still think dir=prefix
solves the general case though - I've had to program around this in a former career, heh.
I mean M""=prefix
.
The proto compiler doesn't provide local filesystem paths to protoc-gen-go
. Let's say you run
protoc -Imyprotos --go_opt=M=prefix --go_out=. myprotos/foo.proto
If myprotos/foo.proto
imports google/protobuf/timestamp.proto
, we (protoc-gen-go
) can't tell whether that came from myprotos/google/protobuf/timestamp.proto
, /usr/share/include/google/protobuf/timestamp.proto
, or some other location.
Yea I wasn't thinking clearly, definitely true re: local fs paths (and the bane of my existence).
Some other options:
- Straight up exclude the WKTs for dir=prefix. This might be interesting if combined with
protoc-gen-go
also automatically implicitly setting i.e.Mgoogle/protobuf/timestamp.proto=google.golang.org/protobuf/types/known/timestamppb
(it might be doing this already?). Then, if a user really wants to set a golang path for this (for example, say gogo/protobuf is updated to v2), if they explicitly provideMgoogle/protobuf/timestamp.proto=github.com/my/fork/timestamppb
, then it uses that. -
Mimport_path=prefix
instead, ie if you have-I proto
, andproto/foo
directory, usingfoo
instead ofproto/foo
. This is probably better as a new option though, so there's no confusion.
I whipped up https://go-review.googlesource.com/c/protobuf/+/240238 when sketching out a fix for #1158 a few months ago. @dsnet pointed me over here to this issue. It might not be the full solution being discussed here, but I'm interested if people think it's headed in the right direction, or if we need extensive rework to make sense of things.
An expansion of environment variables would solve the problem:
option go_package = "$PKG/foo/bar";
protoc -e PKG=somedir ...
This mechanism could be used for providing values for other options as well.
By the way, for the time being I solve this problem using an old-school technique:
rm -rf $PROTO_BUILD/*
cp -r $PROTO_ROOT/* $PROTO_BUILD/
find $PROTO_BUILD -name '*.proto' -exec sed -r -i "s@option go_package = \"(.*)\";@option go_package = \"$GITHUB_PKG/\1\";@g" {} \;
Hope that might help someone.
Is there a solution now that is something like the one @bufdev proposed Mdir=prefix
?