protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

protoc-gen-go: consider applying M option to prefixes

Open bufdev opened this issue 5 years ago • 14 comments

I was testing out the v2 plugin and got this:

2019/11/26 11:17:11 WARNING: Missing 'go_package' option in "bufbuild/buf/image/v1beta1/image.proto", please specify:
	option go_package = "bufbuild/buf/image/v1beta1;bufbuild_buf_image_v1beta1";
A future release of protoc-gen-go will require this be specified.

Can this be discussed? I would strongly argue against requiring long-form go_package values in the Protobuf schema definition. This ties a Protobuf schema to a specific generated code location, effectively, which is undesirable in many situations. For example, a lot of existing customers generate code on the fly in individual repositories with different base Golang modules, and this will effectively outlaw that (not that doing so is a good decision, but it's a widely used pattern).

This seems very restrictive, and will many a bunch of existing build tools around Protobuf not work. Personally, I've always strongly recommended against using long-form go_package values because of the multitude of ways that users consume Protobuf definitions.

bufdev avatar Nov 26 '19 16:11 bufdev

When one .proto file imports another, the code generator needs to know the Go package associated with each file.

For example, consider:

// a.proto
syntax = "proto3";
import "b.proto";
message A {
  B field_b = 1;
}
// b.proto
syntax = "proto3";
message B {}

When we generate a.pb.go, we need to know if the types A and B are in the same Go package or not. If they are in the same package, a.pb.go will contain:

type A {
  FieldB B
}

But if they are not in the same package, a.pb.go will contain:

import "example.com/package/b"
type A {
  FieldB b.B
}

But how do we tell what the Go package associated with a .proto file is?

The protobuf language has a concept of a package (e.g., package google.protobuf;), but protobuf packages don't map directly to Go packages. In particular, the protobuf language permits cycles between packages (but not .proto files), while the Go language forbids cycles. In addition, we have no way to derive the Go import path from the protobuf package.

Relying on the filename doesn't work, because there's no guarantee that the filesystem layout of the .proto files matches that of the Go packages. For example, the file google/protobuf/any.proto corresponds to the Go package "github.com/golang/protobuf/ptypes/any".

Currently, protoc-gen-go has a bunch of heuristics that it attempts to apply, but they're complicated, difficult to describe, and still get things wrong: https://go.googlesource.com/protobuf/+/refs/heads/master/compiler/protogen/protogen.go#212

The best way to tell the code generator the Go package associated with a .proto file is to just tell it.

neild avatar Nov 26 '19 16:11 neild

Yes for sure - I know it's a difficult problem. I was actually on the original issue that added go_package haha https://github.com/golang/protobuf/issues/139 and back in the day I supported it.

However, I think while the original intention was good, it's had some really bad unintended consequences that effectively have led to many (including myself) not using long-term go_package values and recommending against them entirely.

The Protobuf schema shouldn't be used to dictate this so strongly, there should be flexibility in generation. However, I recognize that I'm saying this is a problem without proposing a solution. I think we should come up with something better in the long term.

Just as a simple idea, one option that I think covers many use cases is a variation on the M option (also could be thought of as a variation on the import_prefix option), of the form Mdir=prefix, where all .proto files inside DIR are prefixed with `PREFIX. For example:

protoc -I proto --go_out=gen/go --go_opt Mproto=github.com/foo/bar/gen/go $(find proto -name '*.proto')`

This doesn't handle the github.com/golang/protobuf/ptypes/any case, but this is the exception, not the rule, and generally packages should match the directory structure of the Protobuf files as a rule. For those who need to break glass, go_package is still available.

bufdev avatar Nov 26 '19 16:11 bufdev

When it comes to deriving the Go package paths for generated packages, there's primarily two ways to sensibly do it: 1) encode it in the .proto file (which is what the go_package option is for), or 2) encode it in the command-line when calling protoc (which is what the -M flag is for). Both approaches have advantages and disadvantages.

The intention of the warning is to push the world towards either of those two. Inside Google, we primarily use the -M approach since we have a mono-repo. IIRC, the warning should not be present if the -M flag is present. If it is, I don't think that was my intention.

Also, the -M flag could be improved, but I think that's a separate discussion.

dsnet avatar Nov 26 '19 17:11 dsnet

I'll double check if the warning is present with with M flag or not - I have not tested that. Yes, I primarily use the M flag as well.

Note that it would be worth having that discussion - the M flag is a pain to set up on an automated basis, and for large, single protoc invocations, it can actually hit an issue where the maximum length of a protoc argument list is 30,000 characters (I think, it's something around 30,000, doing this from memory). So for 1000 files with long paths, it can hit that limit (I've hit it before). The Mdir=prefix option would get around that.

bufdev avatar Nov 26 '19 17:11 bufdev

I'd want to think about it, but Mdir=prefix sounds reasonable.

I definitely feel the pain of trying to use the M flag manually. It's really only practical if you've got a build tool like bazel handling it for you.

neild avatar Nov 26 '19 17:11 neild

That would be great. I'm happy to lend a hand if you're open to PRs.

Another related note, but probably worthy of a new issue: I would love to be able to generate assets produced byprotoc-gen-go-grpc to a different Golang package than the assets produced by protoc-gen-go, so that the grpc-go dependency is not enforced for packages that do not use gRPC. This might already be done/this might already have been one of the driving factors behind splitting up the plugins, I haven't tested it yet, but if, for example, the protoc-gen-go-grpc plugin took a package path that it is being generated to (or if it could auto-derive this optimally), and then if the Golang package of the .proto files that contain the request/response types is different than this package, it did the import...this would be great.

Having this ability to specify the location of the core Golang types from protoc-gen-go as part of compiler/protogen would be great for arbitrary Golang protoc plugins as well, i.e. protoc-gen-validate, protoc-gen-twirp, protoc-gen-grpc-gateway, etc...I have a lot of hopes that compiler/protogen can be the base for a lot of plugins in the future (which I think is the intention).

bufdev avatar Nov 26 '19 17:11 bufdev

A couple thoughts:

  1. What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every .proto file including well-known ones like google/protobuf/timestamp.proto.

    Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.

  2. The problem of per-file M options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers precedent for this.

neild avatar Mar 23 '20 22:03 neild

Just to revitalize this discussion: where did we land on this?

Requiring go_package on the same major release, even if documented, is going to break tens of thousands of developers using golang/protobuf today, I don't think that can be done to the golang/protobuf community IMO

bufdev avatar May 08 '20 19:05 bufdev

A couple thoughts:

1. What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every `.proto` file including well-known ones like `google/protobuf/timestamp.proto`.
   Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.

I think I'm missing something on prefix of "", do you mean M""=prefix or Mdir=""? I think you are referring to the former (correct me if I'm wrong). Otherwise, would dir=prefix affect google/protobuf/* files?

I feel like this only comes into play if you had google/protobuf vendored right? ie I had protoc -I proto --go_out=Mproto=github.com/foo/bar/gen/go:gen/go and then i.e. proto/google/protobuf/timestamp.proto existed. I think in this case, it would be on the user to have a separate vendor/proto or third_party/proto directory for such files, and then protoc -I vendor/proto -I proto.

2. The problem of per-file `M` options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers [precedent](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/objectivec/objectivec_generator.cc#L108) for this. 

Might be nice, I still think dir=prefix solves the general case though - I've had to program around this in a former career, heh.

bufdev avatar May 08 '20 20:05 bufdev

I mean M""=prefix.

The proto compiler doesn't provide local filesystem paths to protoc-gen-go. Let's say you run

protoc -Imyprotos --go_opt=M=prefix --go_out=. myprotos/foo.proto

If myprotos/foo.proto imports google/protobuf/timestamp.proto, we (protoc-gen-go) can't tell whether that came from myprotos/google/protobuf/timestamp.proto, /usr/share/include/google/protobuf/timestamp.proto, or some other location.

neild avatar May 08 '20 20:05 neild

Yea I wasn't thinking clearly, definitely true re: local fs paths (and the bane of my existence).

Some other options:

  • Straight up exclude the WKTs for dir=prefix. This might be interesting if combined with protoc-gen-go also automatically implicitly setting i.e. Mgoogle/protobuf/timestamp.proto=google.golang.org/protobuf/types/known/timestamppb (it might be doing this already?). Then, if a user really wants to set a golang path for this (for example, say gogo/protobuf is updated to v2), if they explicitly provide Mgoogle/protobuf/timestamp.proto=github.com/my/fork/timestamppb, then it uses that.
  • Mimport_path=prefix instead, ie if you have -I proto, and proto/foo directory, using foo instead of proto/foo. This is probably better as a new option though, so there's no confusion.

bufdev avatar May 08 '20 20:05 bufdev

I whipped up https://go-review.googlesource.com/c/protobuf/+/240238 when sketching out a fix for #1158 a few months ago. @dsnet pointed me over here to this issue. It might not be the full solution being discussed here, but I'm interested if people think it's headed in the right direction, or if we need extensive rework to make sense of things.

dcow avatar Sep 09 '20 03:09 dcow

An expansion of environment variables would solve the problem:

option go_package = "$PKG/foo/bar";
protoc -e PKG=somedir ...

This mechanism could be used for providing values for other options as well.

By the way, for the time being I solve this problem using an old-school technique:

rm -rf $PROTO_BUILD/*
cp -r $PROTO_ROOT/* $PROTO_BUILD/
find $PROTO_BUILD -name '*.proto' -exec sed -r -i "s@option go_package = \"(.*)\";@option go_package = \"$GITHUB_PKG/\1\";@g" {} \;

Hope that might help someone.

AgentCoop avatar Feb 05 '21 09:02 AgentCoop

Is there a solution now that is something like the one @bufdev proposed Mdir=prefix?

Jrenk avatar May 11 '22 17:05 Jrenk