
Implementation of Context based method for Channel

Open AlexisMontagne opened this issue 6 years ago • 24 comments

Related to #266

Implement a second set of RPC methods on the Channel that take a context.Context as their first argument.

AlexisMontagne avatar Feb 22 '19 18:02 AlexisMontagne

This is great. To bikeshed the names a bit: I think names like QosContext make sense because they follow the standard library's convention for context-aware I/O, like Dial and DialContext.

With the async nature of client or server channel closure, cancelling the protocol is prone to subtle bugs. Please add some test coverage for the context cancellation cases.

streadway avatar Feb 25 '19 09:02 streadway

@michaelklishin @streadway thanks for the feedback. I wanted to make sure the implementation would be ok with you.

I will add some tests and documentation for the new public methods.

To avoid out-of-sequence received messages, I cancel the current channel if an RPC call is cancelled. Do you think that is the right thing to do?

AlexisMontagne avatar Feb 25 '19 14:02 AlexisMontagne

OK, if this follows an existing naming convention then we should use those names.

@AlexisMontagne I'm not sure I understand the last question of yours, sorry.

michaelklishin avatar Feb 25 '19 16:02 michaelklishin

@michaelklishin sorry, I submitted my comment by mistake.

I meant: in order to deal with cancelled RPC calls, should we close the channel or keep it open? cf. https://github.com/streadway/amqp/pull/389/files#diff-861e77702f49a71d32be4b3c7600818fR185

AlexisMontagne avatar Feb 25 '19 17:02 AlexisMontagne

I meant: in order to deal with cancelled RPC calls, should we close the channel or keep it open?

My first reaction is to keep the channel open. The AMQP channel lifecycle is complicated and involves both client and server resources. If the context methods only relate to RPC round trips rather than server resources, it would be easier to document.

streadway avatar Feb 25 '19 17:02 streadway

When looking at error values, can you suggest a way of promoting a client side context.Context.Err() into an *amqp.Error?

streadway avatar Feb 25 '19 17:02 streadway

@streadway you can find a standardized *amqp.Error here: https://github.com/streadway/amqp/pull/389/commits/95dafc3f19f4cdbda1fe2ba4b756dc00808258dd

I meant: in order to deal with cancelled RPC calls, should we close the channel or keep it open?

My first reaction is to keep the channel open. The AMQP channel lifecycle is complicated and involves both client and server resources. If the context methods only relate to RPC round trips rather than server resources, it would be easier to document.

If we keep the channel open, I'm worried that late replies could cause issues.

Consider the following case:

T1: Channel X sends a frame for RPC call Y
T2: The context for RPC call Y is cancelled
T3: Channel X sends a frame for RPC call Z
T4: The dispatcher forwards the response for RPC call Y to the rpc channel

Is there any way to correlate request and response frames (a sequence ID?) so that frames received for cancelled calls can be discarded?

AlexisMontagne avatar Feb 25 '19 19:02 AlexisMontagne

@AlexisMontagne other clients address this by not allowing channel sharing between threads/routines/etc. There are no sequence identifiers in the protocol, only response order guarantees for synchronous methods.

Note that sharing a channel for concurrent publishing, for example, is inherently unsafe because a published message is 2 or 3 frames on the wire and you can end up with incorrect frame interleaving.

While closing the channel might be a decent option, since there is no way to undo a sent frame, the above makes me wonder whether

  • Context support in this case would really achieve the goals it achieves in other libraries
  • Concurrent access to a channel should be advertised as supported (it may be safe in some cases for consumers but, as with any mutable state, eliminating sharing avoids a lot of hard-to-dig-out issues)

Keeping a channel open and documenting known limitations would be closer to what virtually every other RabbitMQ client does.

WDYT?

michaelklishin avatar Feb 25 '19 22:02 michaelklishin

Note that sharing a channel for concurrent publishing, for example, is inherently unsafe because a published message is 2 or 3 frames on the wire and you can end up with incorrect frame interleaving.

This library is concurrency-safe for all methods, including publishing: a channel-specific mutex is held until all of a publish's frames are sent, before any other frames go out.
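The mutex discipline described here can be sketched with a simplified model (the frame and writer types below stand in for the library's real writer code):

```go
package main

import (
	"fmt"
	"sync"
)

// frame stands in for one AMQP wire frame.
type frame struct{ kind string }

// channelWriter models a channel whose outgoing frames share one wire.
type channelWriter struct {
	mu   sync.Mutex
	wire []frame // stands in for the connection's outgoing stream
}

// publish holds the channel mutex until every frame of one message
// (method, header, body...) is written, so concurrent publishers
// cannot interleave their frames.
func (w *channelWriter) publish(bodyFrames int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.wire = append(w.wire, frame{"method"}, frame{"header"})
	for i := 0; i < bodyFrames; i++ {
		w.wire = append(w.wire, frame{"body"})
	}
}

func main() {
	w := &channelWriter{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); w.publish(1) }()
	}
	wg.Wait()
	// Verify no interleaving: frames must appear in strict
	// method, header, body order for each message.
	ok := true
	for i := 0; i < len(w.wire); i += 3 {
		if w.wire[i].kind != "method" || w.wire[i+1].kind != "header" || w.wire[i+2].kind != "body" {
			ok = false
		}
	}
	fmt.Println(ok) // true
}
```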

streadway avatar Feb 26 '19 08:02 streadway

  • Context support in this case would really achieve the goals it might in other libraries

The primary use case for the context package is to represent a distributed, cancelable transaction.

It sounds like this is a discussion of "does a cancelled context propagate to the server or not?"

If we were to close a channel on cancellation during any RPC, would we also close the connection should we not be able to open a channel (RPC on channel id 0)?

Let's start with discarding the in-flight reply after cancellation to keep the channel open, and let the application determine how to propagate a cancellation to the server based on what the application considers a transaction.

streadway avatar Feb 26 '19 08:02 streadway

Then it's up to the client to filter out the responses it cannot, or no longer needs to, handle. The Java client and Bunny do exactly that: they ignore responses they can detect as stale, i.e. mismatching the pending continuation (pending operation).

The only scenario where one cannot open a channel is if you run into the channel_max limit (2047 in modern RabbitMQ versions by default) and that is pretty rare and usually indicates a channel leak. Closing a connection in that case would still be pretty disruptive and unexpected :/

michaelklishin avatar Feb 26 '19 09:02 michaelklishin

@streadway I updated the pull request according to your last message, and added a test to ensure that messages received after a call is cancelled are dropped.

AlexisMontagne avatar Feb 26 '19 17:02 AlexisMontagne

@streadway what do you think of https://github.com/streadway/amqp/pull/389#discussion_r261221451?

AlexisMontagne avatar Mar 04 '19 16:03 AlexisMontagne

@streadway I think I got what you mean. Tell me what you think of the last commit.

Are we really sure the server will always reply to the client? Will it never give up on replying to any message?

All my reasoning was based on the assumption that the server could be flaky and skip replies.

AlexisMontagne avatar Mar 04 '19 19:03 AlexisMontagne

AMQP requires TCP, which provides transport reliability. The heartbeater will close the connection on read timeout, which will surface an error should we experience any message loss.

I've found RabbitMQ to be very reliable; it does not drop replies or show flaky behavior, so we can count on every protocol request receiving a reply.

A counter is one approach, but it assumes that contexts are cancelled in the order their requests were sent, which I don't think we can depend on. Consider the message exchange:

req(a)
req(b)
req(c)
req(d)
cancel(a) 0->1
cancel(c) 1->2
reply(a) - dropped 2->1
reply(b) - dropped 1->0
reply(c) - dispatch(b)
reply(d) - dispatch(d)
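One sketch of bookkeeping that avoids the counter pitfall above, relying only on the guarantee that replies to synchronous methods arrive in request order: pending calls sit in a FIFO queue and each entry carries its own cancelled flag, so cancelling a and c out of order still drops exactly replies a and c. The types and names are illustrative:

```go
package main

import "fmt"

// pending tracks one outstanding RPC and whether it was cancelled.
type pending struct {
	name      string
	cancelled bool
}

// dispatcher keeps pending calls in request order, matching the
// protocol's in-order reply guarantee for synchronous methods.
type dispatcher struct {
	queue []*pending
}

func (d *dispatcher) request(name string) *pending {
	p := &pending{name: name}
	d.queue = append(d.queue, p)
	return p
}

// onReply pops the oldest pending call; a cancelled one is dropped
// instead of dispatched.
func (d *dispatcher) onReply() (name string, dispatched bool) {
	p := d.queue[0]
	d.queue = d.queue[1:]
	return p.name, !p.cancelled
}

func main() {
	d := &dispatcher{}
	a := d.request("a")
	d.request("b")
	c := d.request("c")
	d.request("d")
	a.cancelled = true // cancel(a)
	c.cancelled = true // cancel(c), out of order w.r.t. replies

	for i := 0; i < 4; i++ {
		if name, ok := d.onReply(); ok {
			fmt.Println("dispatch", name)
		}
	}
	// Prints "dispatch b" then "dispatch d": only the live calls
	// are dispatched, unlike the counter in the exchange above.
}
```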

streadway avatar Mar 05 '19 08:03 streadway

@streadway for some reason I was thinking a mutex surrounded every call() invocation 🤔.

I updated my PR to handle the use case you described (and added a test to ensure it).

AlexisMontagne avatar Mar 05 '19 18:03 AlexisMontagne

@streadway it should be good now. I wrapped the "sending" part of the call method in a mutex.

It also makes frame sending thread-safe. Previously, https://github.com/streadway/amqp/blob/master/channel.go#L219 was not guarded by any Channel-wide mutex.

@michaelklishin The added tests should now pass with the -race option.

AlexisMontagne avatar Mar 11 '19 19:03 AlexisMontagne

@AlexisMontagne thank you for fixing the problematic test.

michaelklishin avatar Mar 12 '19 00:03 michaelklishin

@michaelklishin I did not write any of this documentation; I reused the documentation already in the code for the context-free methods.

AlexisMontagne avatar Mar 12 '19 13:03 AlexisMontagne

@AlexisMontagne OK, thanks. I will revisit that together with the new functions then.

michaelklishin avatar Mar 12 '19 15:03 michaelklishin

@streadway any opinion on this final version?

michaelklishin avatar Mar 15 '19 00:03 michaelklishin

I admire the amount of effort you've put into the docs and perhaps 50 to 65% of the docs can be retained. Unfortunately the docs are not always correct and reinvent RabbitMQ doc guides where it's not really necessary. I'd recommend linking to the relevant RabbitMQ guides more. If something is missing in the guides (e.g. a guide on bindings besides the tutorials) I'd be happy to work on that. This will benefit all RabbitMQ users regardless of the client.

@michaelklishin This would be great! The method docs were written in 2012 and tried to strike the balance between sharing enough context about AMQP principles, RabbitMQ extensions/constraints and Client library choices. I'd very much welcome documentation review and improvements from the RabbitMQ team!

streadway avatar Mar 15 '19 10:03 streadway

@streadway and I discussed this on a Hangout. Besides splitting this PR into a few (@AlexisMontagne thank you for your truly exemplary patience with us!), we'd like to revisit the internal concurrency model used by this client to both simplify Context adoption and make it closer to other clients. It has a chance of improving CPU resource utilisation on some workloads, too ;)

@streadway will file a new issue about that, but basically operation dispatch should be synchronised per channel and not over the entire connection. Channels are assumed to be "independent entities" in the protocol. Publishing on a shared [protocol] channel is not supported, so other clients guarantee total ordering on a channel, not across channels. A single channel in the server never dispatches its operations concurrently either. This would make Context support more straightforward.

michaelklishin avatar Mar 15 '19 14:03 michaelklishin

Hey folks,

I'm posting this on behalf of the core team.

As you have noticed, this client hasn't seen much activity recently. Many users are unhappy about that, and we fully recognize that it's a popular library that should be maintained more actively. There are also many community members whose contributed pull requests haven't been merged, for various reasons.

Because this client has a long tradition of "no breaking public API changes", certain reasonable changes will likely never be accepted. This is frustrating to those who have put time and effort into trying to improve this library.

We would like to thank @streadway for developing this client and maintaining it for a decade; that's a remarkable contribution to the RabbitMQ ecosystem. We think now is a good time to get more contributors involved.

Team RabbitMQ has adopted a "hard fork" of this client in order to give the community a place to evolve the API. Several RabbitMQ core team members will participate but we think it very much should be a community-driven effort.

What do we mean by "hard fork" and what does it mean for you? The entire history of the project is retained in the new repository but it is not a GitHub fork by design. The license remains the same 2-clause BSD. The contribution process won't change much (except that we hope to review and accept PRs reasonably quickly).

What does change is that this new fork will accept reasonable breaking API changes according to Semantic Versioning (or at least our understanding of it). At the moment the API is identical to that of streadway/amqp but the package name is different. We will begin reviewing PRs and merging them if they make sense in the upcoming weeks.

If your PR hasn't been accepted or reviewed, you are welcome to re-submit it for rabbitmq/amqp091-go. RabbitMQ core team members will evaluate the PRs currently open for streadway/amqp as time allows, and pull those that don't have any conflicts. We cannot promise that every PR would be accepted but at least we are open to changing the API going forward.

Note that it is a high season for holidays in some parts of the world, so we may be slower to respond in the next few weeks but otherwise, we are eager to review as many currently open PRs as practically possible soon.

Thank you for using RabbitMQ and contributing to this client. On behalf of the RabbitMQ core team, @chunyilyu and @michaelklishin.

michaelklishin avatar Jun 09 '21 12:06 michaelklishin