Support grpcweb trailers encoded in the message
As the Fetch API does not expose HTTP trailers to the JavaScript runtime, grpcweb mandates that trailers are included in the message payload: the most-significant bit of the leading byte (flags) is set to 1, followed by a length-prefixed block of text that follows the same formatting as normal headers.
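For concreteness, here is a minimal sketch of that framing (module and function names are illustrative only, not code from this changeset):

```elixir
defmodule GrpcWebTrailers do
  # Encode trailers as a grpc-web frame: a flag byte with the most-significant
  # bit set (0x80), a 4-byte big-endian length, then "name: value\r\n" pairs
  # formatted just like ordinary headers.
  def encode(trailers) when is_list(trailers) do
    block =
      trailers
      |> Enum.map(fn {name, value} -> [name, ": ", value, "\r\n"] end)
      |> IO.iodata_to_binary()

    <<0x80, byte_size(block)::32, block::binary>>
  end
end

# GrpcWebTrailers.encode([{"grpc-status", "0"}, {"grpc-message", ""}])
```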
Most extant grpcweb libraries written in JS/TS are lenient about this and will happily forgo receiving trailers. However, some are stricter and REQUIRE trailers (the buf.build Connect libraries are an example of this).
With this changeset:
GRPC.Server follows the spec when sending protos over grpcweb, allowing errors and other custom trailers to be sent in a way that is visible to the client.
GRPC.Message recognizes trailers and parses them appropriately: it extracts messages from partial buffers using the length-prefix bytes (which were previously quietly ignored, which could also let buffers malformed by e.g. network problems sneak through), it respects the trailers flag, and it returns appropriate data in each of these cases.
The GRPC client now also works with embedded trailers.
Overhead for non-grpcweb traffic should be negligible, as the new code paths are hidden behind grpcweb checks, while the additional binary checks are placed in front of the error paths (so errors may be marginally slower to reach, but the happy paths should be untouched).
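As a rough illustration of the dispatch described above (a sketch of the framing from the grpcweb spec, not the actual GRPC.Message code):

```elixir
defmodule GrpcWebFrame do
  # A frame whose leading flag byte has its most-significant bit set carries
  # the embedded trailer block rather than a protobuf message.
  def parse(<<1::1, _flags_rest::7, len::32, block::binary-size(len), rest::binary>>),
    do: {:trailers, block, rest}

  # Most-significant bit clear: an ordinary length-prefixed message.
  def parse(<<0::1, _flags_rest::7, len::32, message::binary-size(len), rest::binary>>),
    do: {:message, message, rest}

  # Not enough bytes for a complete frame yet (e.g. a partial network read).
  def parse(partial) when is_binary(partial), do: {:incomplete, partial}
end
```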
This has been tested with both Google's own grpc-web library and buf.build's connect/connect-web libraries, against a real-world API served by elixir-grpc's grpc library.
This does need more testing (what doesn't!), and there are some decisions made in the details of the code that could be discussed.
Hi @aseigo, thank you for this PR.
I need to be very careful here because I'm currently working on these APIs to incorporate a new adapter for the server, and therefore I need to evaluate the least effort merge method to proceed. Give me some time to analyze this.
I think it's worthwhile to run the benchmark against this PR as well and measure how much it adds to the hot path. Perhaps an optional feature, as Paulo suggested, would be interesting to ensure optimal performance in all cases, but I haven't evaluated this in depth; it's just a suggestion.
That said, I will take a closer look, as well as the other PRs in the queue this week, and I will get back to you soon with my opinions.
Thank you again, and we'll talk soon.
I need to be very careful here because I'm currently working on these APIs to incorporate a new adapter for the server, and
Oh, nice! Out of curiosity: will it replace the current cowboy-based adapter, or is it using another web stack altogether (e.g. bandit)?
I did find the current adapter modules a bit of a maze as they call each other back and forth, so knowing there's some work happening there is really nice to hear!
I explain more here https://github.com/elixir-grpc/grpc/pull/482
I did find the current adapter modules a bit of a maze as they call each other back and forth, so knowing there's some work happening there is really nice to hear!
I don't know if I'm doing a much better job hahaha... let me know your opinion. 😄
Benchmarks incoming!
With this PR:
Total requests: 100000
Total time: 24.54 seconds
Requests per second: 4075.05
Average latency: 0.245 ms
Total requests: 100000
Total time: 24.876 seconds
Requests per second: 4019.95
Average latency: 0.249 ms
Total requests: 100000
Total time: 24.461 seconds
Requests per second: 4088.14
Average latency: 0.245 ms
From upstream master branch:
Total requests: 100000
Total time: 24.748 seconds
Requests per second: 4040.76
Average latency: 0.247 ms
Total requests: 100000
Total time: 24.49 seconds
Requests per second: 4083.22
Average latency: 0.245 ms
Total requests: 100000
Total time: 24.708 seconds
Requests per second: 4047.28
Average latency: 0.247 ms
This is quite repeatable, with the times between different runs being within ~1% of each other.
Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.
I explain more here https://github.com/elixir-grpc/grpc/pull/482
Wow, that is a crazy amount of work, but it's clearly paying off! I haven't tested the PR (yet!) but have skimmed the code (and read more interesting parts with a bit more care), and so far it looks really nice. A bit unfortunate to have to implement an http2 stack, but I can see how it makes sense in this case, given how this is an absolutely core part of this framework of libraries.
I'm a big fan of thousand_island, always impressed by the performance of it given it is straight Elixir code. <3
In any case, I can see how merging in the (frankly annoying) grpcweb trailers support alongside your work can be a bit of a chore. Happy to see this go in in whatever order makes sense to you. IMHO the new adapter has clear priority, given that it stands to provide a significant performance improvement and would be the foundation for "features" needed by e.g. grpcweb.
Also, I feel like grpc_server could use a new test or test change too.
A small related comment:
The existing tests for GRPCWeb exercise these code paths and do catch when they fail. I can add a few more tests for variations (preferably once we've decided on the final shape of things so as to test actual code that may be merged :) ), but I was actually able to use the existing tests to drive this towards a working state.
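For what it's worth, one of those variation tests could look roughly like this (a hypothetical ExUnit sketch, not one of the existing tests):

```elixir
defmodule GrpcWebTrailerFrameTest do
  use ExUnit.Case, async: true

  test "a frame with the most-significant flag bit set is treated as trailers" do
    block = "grpc-status: 0\r\n"
    frame = <<0x80, byte_size(block)::32, block::binary>>

    # The leading bit distinguishes an embedded trailer block from a message.
    assert <<1::1, _flags_rest::7, len::32, payload::binary-size(len)>> = frame
    assert payload == block
  end
end
```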
In fact, once the tests were passing, it all Just Worked(tm) on the first try with the buf.build Connect libraries.
Kudos to everyone who's worked on them as they made my effort here a lot easier!
Great! So the decision is whether we want to accept "incorrect" payloads regardless when grpcweb-formatted data is sent outside that scope. I'm ok with just keeping the single code path.
That's a good question.
It would probably mean having to add another param to GRPC.Message.get_message/1 (e.g. the codec), and in practice this should only appear when the source is a server in grpc_web mode, which means it would be another check that does nothing in practice ... so personally I don't think even more API changes are really worth it?
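Just to make that concrete, a hypothetical sketch of what such a parameter might look like (deliberately not part of this PR; the :grpc_web atom is only a placeholder for the real codec value):

```elixir
defmodule MessageWithCodecSketch do
  # Hypothetical get_message/2: only treat a set flag bit as an embedded
  # trailer frame when the codec says the request arrived as grpc-web.
  def get_message(<<1::1, _::7, len::32, block::binary-size(len), _::binary>>, :grpc_web),
    do: {:trailers, block}

  # Any other codec: plain length-prefixed message handling.
  def get_message(<<_flags, len::32, message::binary-size(len), _::binary>>, _codec),
    do: {:message, message}
end
```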
Yeah, I'm leaning towards that as well. @sleipnir do you agree?
As long as there are tests that validate the behavior of a grpc server receiving grpc-web requests without failing, that is, as long as the behaviors are validated, then it's fine to proceed with a single approach.
@aseigo Thanks for the PR, excellent work.
I'm just still checking which PR will go into the main branch first.
Working on another fix for GRPC errors (a general problem, not related to grpcweb), I refactored this a small amount, pulling the grpcweb trailer handling into its own function. Same code, no functional changes, just a bit cleaner and easier to work with. Hopefully it will also make it clearer what would need to be done in the new adapter?
Cheers!
Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get envoy onto developer environments.
Appreciate the hard work you guys put in, presumably for free 🙏; please don't take this as a "hurry up" message.
Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get envoy onto developer environments.
I don't know when it will be merged exactly (that's someone else's decision), but I have some work to do on this tomorrow (a pair of tests are not passing), and then it should hopefully be ready.
If the pressing need is for development envs, you can always include my fork with this branch as a dependency in the short term until this does get merged. That is what I am currently doing, in fact (also working with the buf.build connect libs)
Do you have a very rough timeframe for when this is likely to be merged? I need to make some architectural decisions for a product using the buf grpc library and need to know whether I'll need to get envoy onto developer environments.
Appreciate the hard work you guys put in, presumably for free 🙏; please don't take this as a "hurry up" message.
Once the tests pass, we will merge it.
@aseigo, let me know if I can help with anything else. I'll handle merging this later in the PR for the new adapter.
If the PR is itself ready, you can always use a git dependency in Mix: {:grpc, github: "aseigo", branch: "feature/grpcweb-trailers-in-message"} or :ref instead of :branch if you want to point to a specific commit hash.
This would work while we don't merge, but rest assured that this will be merged eventually. We just have a few other things external to this PR going on that we need to sort out before merging this.
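For example, in mix.exs (assuming the fork keeps the upstream repository name under aseigo/grpc; adjust the path to the actual fork location):

```elixir
defp deps do
  [
    # Track the PR branch until the change lands upstream and is released.
    # "aseigo/grpc" is an assumed fork path; Mix's :github option expects "owner/repo".
    {:grpc, github: "aseigo/grpc", branch: "feature/grpcweb-trailers-in-message"}
    # Or pin an exact commit instead:
    # {:grpc, github: "aseigo/grpc", ref: "<commit sha>"}
  ]
end
```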
let me know if I can help with anything else
Thanks, I appreciate it! I think it's all good, though ... I just need to work out those last two tests, which I will do tomorrow. :)
If the PR is itself ready, you can always use a git dependency in Mix: {:grpc, github: "aseigo", branch: "feature/grpcweb-trailers-in-message"} or :ref instead of :branch if you want to point to a specific commit hash.
Yep, that's what I've been doing; I was just wondering about the package because the rest of my team is very new to Elixir and I don't trust them to understand how git deps work.
Strangely, this gives me a bunch of dependency errors when I compile deps, while everything seems to work just fine if I manually add the package subdirs. Almost certainly a layer 8 issue on my end, and if things merge soon I won't put any thought into it.
Thanks for the quick replies :)
Ok, all tests are passing, and the errors appear correctly with both native and grpcweb clients for me, including the improvements @sleipnir pushed the other day to make the {:error, ...} tuple return style work.
This is ready for a final review! :)
Thank you @aseigo. @polvalente and I will discuss the best way to proceed. Thank you again for your contribution.