rust
rust copied to clipboard
External protos
As mentioned in #272 I believe this will simplify the current release/build process for these bindings and using an external crate for this means some of the fields currently serialized as byte vectors can be interacted with by users with less hassle or internal change. As well as greater ability to extend the bindings without reliance on changes in this crate.
Also, I've generated a whole bunch of protos you don't even need yet so it's easy peasy to just add another feature and get more stuff bought into this crate whereas personally I found before interacting with tensorflow-proto-codegen was difficult to get working.
I'm aware relying on an external crate for the proto implementations may be seen as somewhat of a risk so I'm willing to add you as a maintainer/contributor @adamcrume :smile:
All the failures seem to be of the form:
/tmp/rustdoctestC1cNkE/rust_out: error while loading shared libraries: libtensorflow.so.1: cannot open shared object file: No such file or directory
Is there an issue with CI?
The doc tests have been broken for a while because of https://github.com/rust-lang/cargo/issues/8531; that's what the shared library errors in Travis are.
What are the benefits of putting the protos in a separate crate, instead of keeping them in the tensorflow crate and just making them public API? I do have some concerns around coordinating releases.
I'll have to think about it more when I've had more sleep.
I would prefer the proto types became public and the deprecated fields were all removed for a minor version bump :eyes: it would be nice if there was a minimum number of versions a deprecated field lived for before deletion.
Well a lot of the proto types aren't as yet publicly exposed the only fields you can access them from are a Vec<u8> or buffer. Without exposing them it means users need to generate the same proto code themselves (which is faff), or start grabbing private internals of these bindings and adding them into their own code. Which is something I've seen people have to do to get certain things works.
My hope with this is that there's less pressure to update the bindings to get certain functionality, and users can easily get the Vec<u8> into compatible types, and if needs be go to using things in the tensorflow-sys crate with the proto crate for functionality not yet in the tensorflow crate. Some of the gaps between releases have been semi-painful to work around at my own job where we're using these bindings so minimising that pain without needing to demand you work on this more. I suppose another option could be finding more trusted contributors?
Also, I got here because I was attempting to add a proto in another PR I was working on and working out all the protos I needed to add to the existing build script for dependencies was a massive pain, as well as working out to run the codegen itself (the docs didn't quite help, I eventually worked it out from something like a CI script). My method generates code for all the protos, works out the dependencies and sets up feature gates for every proto file you'd want to add including it's dependencies. So then for any new protos in the future it's just a case of enabling a feature which is definitely easier :eyes:
Another option could be to move my code inline to this repo, and then give you publish access to the protobufs crate I published. And then everything is here
I agree that this project should move toward using actual protobuf types and not just Vec<u8>. I know that manually working with protobuf codegen is a headache for most users who want to use protos. I just have concerns about the details and how we get from here to there.
I don't think we want to get rid of the methods which return Vec<u8>, because some users may already be using other protobuf libraries. In fact, this is the main reason why the project hasn't committed to using a particular protobuf library in its API, yet: there hasn't been a clear winner in terms of which protobuf crate to use. protobuf is pretty popular, and it's what we're using internally and seems to work well, but some users may be using prost or something else.
One question I had was whether it made sense for the generated protos to go into the tensorflow crate or a separate one. One benefit in separating them is that code which needs TensorFlow protos but not TensorFlow itself (e.g. something generating Example protos or reading output) could use the proto crate without the tensorflow-sys crate, so that could be a reasonable argument for keeping them separate.
I'd be a bit more comfortable if we feature-gate the use of proto types in the API for now (i.e. a single feature named something like protobuf_protos), at least for a few releases. I'd also be more comfortable if the code were in this Git repo (even if in a separate crate), and if I had publish access on the proto crate. My concern is that if something happens and you're not able to release the proto crate, then the tensorflow crate wouldn't be able to use new protos/fields, and switching to using generated proto code in a different crate would be a breaking change.
By the way, if there is any documentation which is unclear, please file an issue or submit a PR. We should fix bad or incomplete docs.
I'm also definitely interested in having more contributors to the project, especially for the Windows build. I always need help with the Windows build. :(
I don't think we want to get rid of the methods which return Vec
, because some users may already be using other protobuf libraries. In fact, this is the main reason why the project hasn't committed to using a particular protobuf library in its API, yet: there hasn't been a clear winner in terms of which protobuf crate to use. protobuf is pretty popular, and it's what we're using internally and seems to work well, but some users may be using prost or something else.
Hmm I can adapt my crate to generate using different protobuf crates and feature gate each one. I use prost in other projects just because tonic uses it so would be nice to cut down dependencies. I can probably get it to work so you can just do it like features=["prost"]/features=["protobuf"] and then if neither are supplied the Vec<u8> is supplied to the user 🤔
My concern is that if something happens and you're not able to release the proto crate, then the tensorflow crate wouldn't be able to use new protos/fields, and switching to using generated proto code in a different crate would be a breaking change.
That's why I suggested making you a contributor to the repo/crates.io as well. That would give you push/merge/publish rights 😄. With that in mind would you still want it inside this repo, it might end up being more substantially sized with other protobuf options included.
Either way, I'll start working on changes to my proto crate to prototype other protobuf generation crates 👍
https://github.com/xd009642/tensorflow-protos-rs/tree/prost_support if you want to have a look at prototype prost support here. However, prost generates the files differently, instead of a file for each proto it's a file for each include directory. This means I can't generate the feature crates and a lot more will be pulled in as a result.
I did notice there was a tensorflow_grpc proto file so I guess for a fuller support I should considering generating the grpc service definitions as well if also adding prost support...