protobuf-es

Provide support for top-level exports

Open smaye81 opened this issue 1 year ago • 7 comments

This is to open discussion about the implementation of top-level exports when generating JS/TS code via the plugin framework.

Problem

Specifically, what we're exploring is that given the following:

package foo;

message Foo {
  string foo_field = 1;
}

package bar;

message Bar {
  int32 bar_field = 1;
}

We would provide the option of generating:

.
├── index.js
├── package.json
└── proto
    ├── foo
    │   ├── foo_pb.d.ts
    │   └── foo_pb.js
    └── bar
        ├── bar_pb.d.ts
        └── bar_pb.js

where index.js looks something like:

export { Foo } from "./proto/foo/foo_pb.js";
export { Bar } from "./proto/bar/bar_pb.js";

This functionality would presumably be opt-in through a plugin option named something like generateIndex.
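
As a rough illustration of what such an option might emit, here is a minimal sketch that renders the index.js contents from a list of generated files. The GeneratedFile shape and the renderIndex function are hypothetical names invented for this example; they are not part of the plugin framework, and the sketch deliberately ignores name clashes.

```typescript
// Hypothetical shape: one entry per generated file and the symbols it exports.
interface GeneratedFile {
  path: string; // relative to the package root, e.g. "proto/foo/foo_pb.js"
  symbols: string[]; // exported names, e.g. ["Foo"]
}

// Render the contents of a top-level index.js that re-exports every symbol.
// Assumption: no two files export the same name.
function renderIndex(files: GeneratedFile[]): string {
  return files
    .filter((f) => f.symbols.length > 0)
    .map((f) => `export { ${f.symbols.join(", ")} } from "./${f.path}";`)
    .join("\n");
}

const indexJs = renderIndex([
  { path: "proto/foo/foo_pb.js", symbols: ["Foo"] },
  { path: "proto/bar/bar_pb.js", symbols: ["Bar"] },
]);
console.log(indexJs);
```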

Benefits

This would provide a few benefits (note that the benefits below would also require associated changes to, or creation of, a package.json file):

  • When using remote packages, it would allow users to write the following import:

    import { Foo } from "@buf/bufbuild_foo.bufbuild_es";
    

    instead of

    import { Foo } from "@buf/bufbuild_foo.bufbuild_es/proto/foo/foo_pb.js"
    
  • It could help facilitate auto-imports in IDEs like VSCode.

  • It would make it a lot easier for users to use tools like Intellisense to see exports.

Difficulties

There are difficulties associated with this, however. The biggest is how we handle name clashes between exported symbols.

Since Protobuf has namespaces, it is perfectly acceptable to have two messages (or services or enums) with the same name in different namespaces. For example, consider:

package foo.v1;

message Foo {
  string foo_field = 1;
}

package foo.v2;

message Foo {
  int32 bar_field = 1;
}

Now the generation of the index.js file becomes much more complicated because this wouldn't work:

export { Foo } from "./proto/foo/v1/foo_pb.js";
export { Foo } from "./proto/foo/v2/foo_pb.js";

So we would have to somehow alias one of these exports to something like:

export { Foo } from "./proto/foo/v1/foo_pb.js";
export { Foo as v2_Foo } from "./proto/foo/v2/foo_pb.js";

Even this solution is not straightforward and is further complicated by the following:

  • File order is mostly predictable (at least with Buf, which stable-sorts files and passes them to the plugin in the same order). However, this is not guaranteed across all codebases and compilers, and it is not part of the plugin contract. So, while probably rare, a completely different export could end up aliased between compilation runs.
  • Even with Buf's predictability, this only works when the strategy option is set to all during generation, so that the plugin sees every file in a single invocation. If we provide this option, it would come with the caveat that strategy must always be set to all, which is not a great developer experience.
  • Users would potentially have no way of knowing after generation that symbols have been aliased, or which ones. So, simply importing Foo from their package may not give them the intended Foo.
  • If this option is allowed, then it would probably also have to be allowed for other plugins, such as Connect-ES so that service definitions would also be exported. This means that both plugins will generate an index.js file and preventing the conflicts / collisions that would ensue will not be easy.
  • Alias conventions could become very ugly. Ideally, we would use the fully-qualified package name to disambiguate across packages, but this could snowball for deeply nested packages. Consider package proto.orders.clients.foo.bar.v1;. This could produce an alias like proto_orders_clients_foo_bar_v1_FooClient.
  • As mentioned above, these changes also require an accompanying package.json to declare types and main/exports.
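
To make the aliasing difficulties concrete, here is a minimal sketch of alias-on-conflict that sorts its input first, so the outcome does not depend on the order in which files reach the plugin. ExportedSymbol, planExports, and renderPlannedIndex are hypothetical names for this example, and the sketch assumes the plugin sees all files in one invocation.

```typescript
// Hypothetical input: a symbol, its Protobuf package, and its generated file.
interface ExportedSymbol {
  name: string; // e.g. "Foo"
  pkg: string; // e.g. "foo.v2"
  path: string; // e.g. "proto/foo/v2/foo_pb.js"
}

interface PlannedExport extends ExportedSymbol {
  exportedAs: string; // the name used in index.js
}

function compare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

function planExports(symbols: ExportedSymbol[]): PlannedExport[] {
  // Sort by package, then name, so the result is independent of input order.
  const sorted = [...symbols].sort(
    (a, b) => compare(a.pkg, b.pkg) || compare(a.name, b.name),
  );
  const taken = new Set<string>();
  return sorted.map((s) => {
    const segments = s.pkg.split(".");
    let exportedAs = s.name;
    // On a clash, prepend package segments from the right until unique.
    for (let i = segments.length - 1; taken.has(exportedAs) && i >= 0; i--) {
      exportedAs = `${segments.slice(i).join("_")}_${s.name}`;
    }
    if (taken.has(exportedAs)) {
      throw new Error(`cannot find a unique alias for ${s.pkg}.${s.name}`);
    }
    taken.add(exportedAs);
    return { ...s, exportedAs };
  });
}

function renderPlannedIndex(plan: PlannedExport[]): string {
  return plan
    .map((p) => {
      const spec = p.exportedAs === p.name ? p.name : `${p.name} as ${p.exportedAs}`;
      return `export { ${spec} } from "./${p.path}";`;
    })
    .join("\n");
}

// The input order is reversed on purpose; foo.v1 still keeps the plain name.
const plan = planExports([
  { name: "Foo", pkg: "foo.v2", path: "proto/foo/v2/foo_pb.js" },
  { name: "Foo", pkg: "foo.v1", path: "proto/foo/v1/foo_pb.js" },
]);
const aliased = plan.filter((p) => p.exportedAs !== p.name);
const indexWithAliases = renderPlannedIndex(plan);
console.log(indexWithAliases);
```

The aliased list could be printed as a warning at generation time, which would partly address the concern that users cannot tell which symbols were renamed. Sorting does not help when the plugin is invoked once per directory, because each invocation only sees part of the input.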

Potential Solutions

Note that all of these solutions only apply when the option to generate top-level exports is true.

  1. Alias all the things all the time, using the fully-qualified package name as the alias.

  2. Only alias when a conflict occurs, and then only alias the conflicting symbol using the smallest possible package path (see the v2_Foo example above).

  3. Throw an error at generation time if a conflict is detected, alerting the user to which types conflict. (This is probably not a great idea, but it is listed here for visibility.)

  4. Only generate top-level exports within a package. This would eliminate the conflict issue altogether as exported symbols within a package are guaranteed not to clash. However, as one might have guessed, this also has its problems:

    • This would need subpath exports defined in package.json, not just a plain exports field since we would end up with multiple entrypoints per path. We would also most likely have to specify types fields as well in this subpath export map.
    • Subpath patterns may or may not be possible here, we would have to test different versions of TypeScript, Node.js, and popular bundlers to get some confidence that it works well enough.
    • This will only work for ESM generated code, not for CJS code that the protocolbuffers/js and grpc/web plugins generate.
    • Since npm artifacts are expected to be stable (i.e. not change without a version bump), we have to introduce this feature in remote packages by bumping the version number in the registry URL (https://buf.build/gen/npm/v1/). If we discover an incompatibility later, we have to bump the registry URL version again for a fix or rollback.
    • All this really provides is that users would no longer need the filename in the import path, which seems like a small return on investment. Users would write
      import { Foo } from "@buf/bufbuild_foo.bufbuild_es/proto/foo"
      
      instead of
      import { Foo } from "@buf/bufbuild_foo.bufbuild_es/proto/foo/foo_pb.js"
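
For the per-package approach in option 4, a subpath exports map might be built along these lines. This is only a sketch: buildSubpathExports is a hypothetical helper, and it assumes each package directory gets its own index.js and index.d.ts, which the proposal does not spell out. Whether a given version of TypeScript, Node.js, or a bundler resolves such a map is exactly what would need testing.

```typescript
// One conditional-exports entry: "types" for TypeScript, "import" for ESM.
interface SubpathEntry {
  types: string;
  import: string;
}

// Hypothetical helper: packageDirs are directories relative to the package
// root, each assumed to contain an index.js and an index.d.ts.
function buildSubpathExports(packageDirs: string[]): Record<string, SubpathEntry> {
  const exportsMap: Record<string, SubpathEntry> = {};
  for (const dir of [...packageDirs].sort()) {
    exportsMap[`./${dir}`] = {
      types: `./${dir}/index.d.ts`,
      import: `./${dir}/index.js`,
    };
  }
  return exportsMap;
}

const subpathExports = buildSubpathExports(["proto/foo", "proto/bar"]);
// This fragment would be merged into the generated package.json.
console.log(JSON.stringify({ exports: subpathExports }, null, 2));
```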
      

smaye81, Apr 11 '23 20:04

I've just been exploring this kind of issue; it doesn't look like it's received much attention.

We currently have a package generation process for both protobuf.js and the protoc-gen versions that, post generation, walks the directory tree and generates index files that look similar to:

.
├── index.js
├── package.json
└── proto
    ├── foo
    │   ├── foo_pb.d.ts
    │   ├── foo_pb.js
    │   └── index.js
    └── bar
        ├── bar_pb.d.ts
        ├── bar_pb.js
        └── index.js

The 'leaf' index.js files look like:

module.exports = {
  bar_pb: require("./bar_pb"),
};

and the folder index.js files (one for each level of the package namespace) look like:

module.exports = {
  foo: require("./foo"),
  bar: require("./bar"),
};

(or some mix in between, plus their corresponding d.ts files).

This results in being able to do

import { foo, bar } from 'package';
something(foo.foo_pb.Foo);
something(bar.bar_pb.Bar);

The addition of the 'filename+_pb' here always kind of bugged me (I thought it would be nice to be able to use foo.Foo here), so I started exploring the bufbuild/protogen package. The catch is that buf itself (and maybe protoc) executes the generator at a per-package-name/directory level, so backing into the common 'upper level' packages to create the indexes isn't really possible except with some post-processor. Perhaps there's something in between that could be used.

In any case, a post-process script that walks the structure seems to be the answer right now. For what it's worth, I haven't found any other language generator that produces packaging artifacts or provides 'meta' like this, either.
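
A minimal sketch of that kind of post-processing step, written as a pure function over file paths so it can run without touching the file system. This is not the actual script described above; buildIndexFiles is a hypothetical name, the paths are assumed to be relative to the directory being indexed, and file and folder names are assumed to be valid JavaScript identifiers.

```typescript
// Given generated files such as "foo/foo_pb.js", return a map from each
// index file path to its CommonJS contents.
function buildIndexFiles(files: string[]): Map<string, string> {
  // directory -> names to re-export from that directory ("" is the root)
  const entries = new Map<string, Set<string>>();
  const add = (dir: string, name: string): void => {
    if (!entries.has(dir)) entries.set(dir, new Set<string>());
    entries.get(dir)!.add(name);
  };

  for (const file of files) {
    const parts = file.replace(/\.js$/, "").split("/");
    // Register the module in its own directory, then each directory in its parent.
    for (let i = parts.length - 1; i >= 0; i--) {
      add(parts.slice(0, i).join("/"), parts[i]!);
    }
  }

  const result = new Map<string, string>();
  entries.forEach((names, dir) => {
    const lines = Array.from(names).map((n) => `  ${n}: require("./${n}"),`);
    const indexPath = dir === "" ? "index.js" : `${dir}/index.js`;
    result.set(indexPath, `module.exports = {\n${lines.join("\n")}\n};`);
  });
  return result;
}

const indexFiles = buildIndexFiles(["foo/foo_pb.js", "bar/bar_pb.js"]);
indexFiles.forEach((contents, path) => console.log(`// ${path}\n${contents}\n`));
```

A real script would collect the file list by walking the output directory and then write each entry to disk, along with the matching d.ts files.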

onyxraven, Dec 08 '23 17:12

Hi,

Without going into the detail and depth of the original proposal, and only referring to the issue we are having, my vote is to:

1. Alias all the things all the time, using the fully-qualified package name as the alias.

I'll give a simple example where we were forced to change message names, in order to avoid import clashes.

package valueobjects;

// Message representing postal address value object
message PostalAddress {
  string street = 1;
  string city = 2;
  ...
}

package resources;

import "valueobjects.proto";

// Message representing postal address entity
message PostalAddress {
  string id = 1;
  valueobjects.PostalAddress postal_address = 2;
  ...
}

When compiled, we get:

import { PostalAddress } from "../../../../valueobjects_pb.js";

/**
 * Postal address message.
 *
 * @generated from message resources.v1.PostalAddress
 */
export const PostalAddress = proto3.makeMessageType(
  "resources.v1.PostalAddress",
  () => [
    { no: 1, name: "id", kind: "scalar", T: 9 /* ScalarType.STRING */ },
    { no: 2, name: "postal_address", kind: "message", T: PostalAddress },
  ],
);

which fails with the error:

Identifier 'PostalAddress' has already been declared

The above messages compile just fine in Go.

Having to change message names to avoid import/export name clashes for JS feels penalising.

Thank you for opening this conversation.

SpareShade, Jan 21 '24 08:01

The PostalAddress name clash error seems like an unrelated bug that should have its own GH issue and should be fixed. The import should be aliased in the compiled code.

jcready, Jan 21 '24 11:01

@SpareShade, echoing what jcready said: This sounds like a bug, regardless of top-level exports. Can you open an issue?

timostamm, Jan 22 '24 10:01

@SpareShade, the collision is fixed by #688 in v1.7.2.

timostamm, Feb 01 '24 12:02

Thanks for the fix @timostamm !

Back to the original ask,

I have the start of a protoc plugin (using the bufbuild toolkit) that can generate the indexes with exports. I'm working through corporate approval to move it to our open-source org, and will follow up here. It does require adding strategy: all to the plugin in buf. Currently it only creates the .ts files (the automatic generation of js/dts was not working, so I still need to build the functions/templates to create those).

onyxraven, Feb 01 '24 18:02

thank you @timostamm & @jcready ... my apologies for being slow/polluting the thread

SpareShade, Feb 03 '24 01:02