protobuf icon indicating copy to clipboard operation
protobuf copied to clipboard

ruby codegen: support generation of rbs files

Open HoneyryderChuck opened this issue 1 year ago • 46 comments

this introduces support for a new protoc option, --rbs_out, which points to a directory where ruby type definition files, defined in the RBS format, are stored.

rbs is the type signature syntax blessed by the ruby core team, used by static analysis tools such as steep, which integrates with VS Code, the irb console (for features such as autocompletion). and typeprof.

It relies on type definitions written into .rbs files.

The protobuf library already exposes type definitions in gem_rbs_collection, which is used to source type definitions for libraries which do not want, or can't maintain type definitions themselves.

(protobuf could arguably import these into this repository, lmk if you're interested).

This should fix gaps such as better IDE integration, such as the ones described here.

The plan is to do roughly the same type of integration as was done for .pyi annotations, which also write to separate files: add the cli option and the rbs generator.

protobuf classes in ruby rely on dynamic attribution of the base class, which makes what I'm trying to achieve a bit difficult. Ideally the type hierarchy could be specified statically in the ruby source code.

class Bar < AbstractMessage
class Bar <
Google::Protobuf::DescriptorPool.generated_pool.lookup("Bar").msgclass

HoneyryderChuck avatar Jan 29 '24 17:01 HoneyryderChuck

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Jan 29 '24 17:01 google-cla[bot]

@acozzette can I ask for your input on directions here, given your thoughts described in https://github.com/protocolbuffers/protobuf/issues/9495 ?

HoneyryderChuck avatar Jan 29 '24 17:01 HoneyryderChuck

@HoneyryderChuck This looks like a great direction to me, but I will let @haberman review this since he is much more knowledgeable about Ruby than I am.

acozzette avatar Feb 06 '24 21:02 acozzette

@haberman any feedback? waiting for comments about the direction, before coming back to this.

HoneyryderChuck avatar Mar 27 '24 12:03 HoneyryderChuck

Sorry for the delay. I think this is a good direction, and I appreciate the small and incremental PR. I think we should move forward with this.

It looks like the CLA check is failing.

haberman avatar May 06 '24 22:05 haberman

thx for the feedback @haberman . Did you have a look at https://github.com/protocolbuffers/protobuf/pull/15881 , which may be already the endgame?

HoneyryderChuck avatar May 09 '24 08:05 HoneyryderChuck

https://github.com/protocolbuffers/protobuf/pull/15881 has a much greater scope, and I think we'd need to individually evaluate its many parts.

I'm convinced that we want a .rbs code generator that has static types for all of the generated APIs. I'm less certain about:

  • Adding .rbs files for our public core APIs
  • Adding .rbs files for our internal core APIs
  • Adding .rbs files for our unit tests
  • Adding Steep type checking to our build

haberman avatar May 09 '24 20:05 haberman

makes sense 👍 will deal with CLA and move on from there.

HoneyryderChuck avatar May 10 '24 09:05 HoneyryderChuck

@haberman went with cherry-picking @qnighy 's commit, and removing the parts not related with sole generation of rbs files on target protos. There was a lot done there that I'd struggle to replicate with the same quality, as I'm still fighting with the build system to generate binaries I can test with locally.

FWIW RBS definitions to protobuf itself are already available here; while it's possible to have the definitions within this repo itself as per the other MR, it's a choice of the maintainers.

HoneyryderChuck avatar May 10 '24 15:05 HoneyryderChuck

Honestly, from the experience working on the PR, I agree that it is probably too early to introduce a typechecker to an existing large project like google-protobuf, which naturally have idioms incompatible with the current type system.

That said, I think it is better to bundle the public type definition here.

The reason: firstly, just similarly to other ecosystems like TypeScript, type definition files are not mere derivative of the runtime code; writing type definitions inherently involves additional design decision. For example, whether to give a type an alias or not, and its name, depends on the author of the type definitions.

Secondly, in google-protobuf, the runtime and the generated code are expected to work in cooperation, and this would also be true in type definitions as well. Therefore, design decision made in the runtime type definition naturally depends on the decisions in the generator.

So, to consistently maintain type definitions, I think it should bundle the one for the runtime as well.

qnighy avatar May 13 '24 11:05 qnighy

Those arguments make sense to me, but we also have to balance the benefits against the maintenance cost. The primary motivation for introducing type signatures for generated code is to make generated interfaces more discoverable and to help IDEs, because the Ruby generated code is currently very opaque, especially since we merged https://github.com/protocolbuffers/protobuf/pull/12462.

The public APIs from the core runtime will also have this issue to some extent, since many interfaces are implemented in C and are probably not discoverable to IDEs. We have Rubydoc embedded in comments, like this example, but that won't help IDEs: https://github.com/protocolbuffers/protobuf/blob/f2d8c2bdd86f34b953c7bae9075daa7648025d3c/ruby/ext/google/protobuf_c/defs.c#L146-L152

But once we're talking about internal-only APIs (like everything in ffi/) and especially unit tests, I think we start to get diminishing returns. The cost is having to write everything twice (once in .rb and once in .rbs), using an unfamiliar language/tool that nobody on the team currently knows, and we don't get any benefits unless we perform type checking in our own project. I think the maintenance burden would outweigh the benefit.

Steep describes itself as "Gradual Typing", which I assume means that type signatures are not required to be complete. Anything that has type signatures is checked, but everything else can continue to work even if it's not typed.

I think the sweet spot would be if we could generate .rbs for generated code, and possibly for the public interfaces in the core runtime, and add just enough type checking in our CI tests to ensure that those type signatures are correct with respect to the actual code. But we wouldn't bother with typing the internal-only interfaces or unit tests.

How does that sound?

haberman avatar May 13 '24 13:05 haberman

@haberman that sounds reasonable, but I don't think we need to do this as part of this PR. FWIW anyone forcibly generating rbs defs for their protobufs will very likely know about rbs-collection already, and get typedefs for core runtime.

There'd still be value in adding typedefs to core runtime; I could see steep integration in CI via the added rbs-based suite catching basic bugs, and could be gradually adopted by the core team (or not).

HoneyryderChuck avatar May 13 '24 14:05 HoneyryderChuck

I totally agree. If anything, I'm arguing to limit the scope of each individual PR and make things as incremental as possible.

haberman avatar May 14 '24 19:05 haberman

@haberman thx for the feedback. Would you mind adding the safe for tests label, so we can continue moving forward?

HoneyryderChuck avatar May 14 '24 21:05 HoneyryderChuck

I ran the tests, looks like the build failed.

haberman avatar May 15 '24 17:05 haberman

@haberman is it possible for the bot not to remove the "safe for tests" label?

HoneyryderChuck avatar May 17 '24 13:05 HoneyryderChuck

@haberman is it possible for the bot not to remove the "safe for tests" label?

Unfortunately not, the purpose of the tag is to visually verify each commit to ensure that our CI infrastructure isn't being exploited either for resources (eg. bitcoin mining) or to exfiltrate repository secrets.

haberman avatar May 29 '24 20:05 haberman

@haberman would you mind having a look? The CI failures seem unrelated to this change (windows bazel build failing on objective C / java, and some security check around workflow files).

HoneyryderChuck avatar Jun 03 '24 09:06 HoneyryderChuck

@HoneyryderChuck can you rebase and we'll try the tests again?

googleberg avatar Sep 14 '24 04:09 googleberg

@googleberg done 👍

HoneyryderChuck avatar Sep 17 '24 10:09 HoneyryderChuck

Could we add a disclaimer in the generated .rbs files?

# This RBS interface is provided for convenience, on a best-effort basis.
# The library is the definitive source for the API contract; if the RBS file
# and the library's behavior differ, the library behavior is authoritative.
# We welcome fixes to change the RBS file to match.

That will reduce the risk of accepting this change. We don't want people to take the RBS definition as definitive, and expect the library to change to match.

haberman avatar Oct 11 '24 21:10 haberman

done

HoneyryderChuck avatar Oct 14 '24 15:10 HoneyryderChuck

I'm not seeing this warning in the .rbs files or in the code generator. Am I missing something?

haberman avatar Oct 14 '24 16:10 haberman

@haberman the change was in the rbs_generator.cc file (which shows a "large diff, click to expand" message). I did forget to rerun the generator on the test files, fixed that already. Wondering whether the "generating the stubs" part shouldn't be part of the CI to avoid that.

HoneyryderChuck avatar Oct 15 '24 07:10 HoneyryderChuck