
Sender and Receiver amqp:decode-error ...

Open psparago opened this issue 4 years ago • 4 comments

I have two components that use Amazon MQ as their common AMQP provider. The components run on separate Linux CentOS EC2 instances (however, I was able to reproduce this issue once using Apache ActiveMQ with all components on my local CentOS VM, i.e. without Amazon MQ).

One component is written in .NET Core using the amqpnetlite client library. The other component is written in Go using vcabbage/amqp (latest). The components interact via AMQP queues: each component listens on its own receiver queue for messages sent by the other.

Both components work as expected, but after a seemingly random amount of time (for example, 11 minutes in my latest test), the Go component begins logging both receive and send errors. Once this happens, the Go component must be restarted. This error has never occurred in the .NET Core component.

The errors I'm seeing look like this (for example on the sending side):

error sending message to:client-base-0, error: *Error{Condition: amqp:decode-error, Description: Could not decode AMQP frame: hex: 0000016a02000000005314d000000013000000045201522fa008000000000000000043005373d00000003e0000000ca12466303961313362612d333031662d343531612d383831642d6364636265393331616163394040a1076f6e652d77617940a100404040404043005375a0fc7b226f223a302c226d223a22636f6e74657874222c2274223a22636f6e74726f6c222c2270223a7b22706d223a2231222c227063223a2231222c227275223a2231222c227365727665725f76657273696f6e223a2239392e302e302e31353733303635303234227d2c227263223a22626173652d30222c22736964223a2272632d736572766572403137322e33312e33362e313736222c2269223a2266303961313362612d333031662d343531612d383831642d636463626539333161616339222c226f736964223a22222c22636964223a22222c227274223a226f6e652d776179222c22726b223a22222c226f77223a747275652c2276223a327d, Info: map[]}

The hex digits appear to be identical in every error.

This issue is blocking a very high-priority project, so I would appreciate any assistance. I'm happy to post code etc. if that will help.

psparago • Nov 15 '19 21:11

The error message is being produced by ActiveMQ. There's a good chance ActiveMQ is sending it while closing the connection. Once that happens, pretty much any operation on the connection and its sessions and links will return the same error (perhaps the logic should be adjusted to make it clear that the error originates from a broker-initiated close).

You should be able to mitigate the impact by recreating the connection from scratch when an error occurs. This is a good idea in general, since the library has no built-in reconnection logic and something like a network interruption would also cause problems.

The hex in the error message appears to be a valid transfer frame.

Header: {Size:362 DataOffset:2 FrameType:0 Channel:0}
Body: Transfer{Handle: 1, DeliveryID: 47, DeliveryTag: "\x00\x00\x00\x00\x00\x00\x00\x00", MessageFormat: 0, Settled: false, More: false, ReceiverSettleMode: <nil>, State: <nil>, Resume: false, Aborted: false, Batchable: false, Payload [size]: 327}
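For anyone who wants to check that decode by hand, here is a small Go sketch that reads the first bytes of the hex dump. It only handles the encoding seen in this particular frame (a small-ulong descriptor followed by a list32), so treat it as an illustration of the AMQP 1.0 frame layout rather than a general decoder; `frameHeader`, `parseHeader`, `performative`, and `describe` are hypothetical names. It recovers the same header, the descriptor code 0x14 (the transfer performative), and the 327-byte payload size.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// frameHeader mirrors the 8-byte AMQP 1.0 frame header.
type frameHeader struct {
	Size       uint32 // total frame size in bytes, including this header
	DataOffset uint8  // header length in 4-byte words (2 means 8 bytes)
	FrameType  uint8  // 0 = AMQP frame
	Channel    uint16
}

func parseHeader(b []byte) (frameHeader, error) {
	if len(b) < 8 {
		return frameHeader{}, errors.New("short frame header")
	}
	return frameHeader{
		Size:       binary.BigEndian.Uint32(b[0:4]),
		DataOffset: b[4],
		FrameType:  b[5],
		Channel:    binary.BigEndian.Uint16(b[6:8]),
	}, nil
}

// performative handles only 0x00 0x53 <code> (small-ulong descriptor)
// followed by a list32 (0xd0) whose 4-byte size counts the bytes after it.
// It returns the descriptor code and the performative's encoded length.
func performative(body []byte) (byte, int, error) {
	if len(body) < 8 || body[0] != 0x00 || body[1] != 0x53 || body[3] != 0xd0 {
		return 0, 0, errors.New("unsupported performative encoding")
	}
	listSize := binary.BigEndian.Uint32(body[4:8])
	return body[2], 3 + 1 + 4 + int(listSize), nil
}

// describe returns the header, performative code and payload size.
func describe(hexFrame string) (frameHeader, byte, int, error) {
	raw, err := hex.DecodeString(hexFrame)
	if err != nil {
		return frameHeader{}, 0, 0, err
	}
	h, err := parseHeader(raw)
	if err != nil {
		return frameHeader{}, 0, 0, err
	}
	off := int(h.DataOffset) * 4
	if off > len(raw) {
		return frameHeader{}, 0, 0, errors.New("data offset past end of input")
	}
	code, n, err := performative(raw[off:])
	if err != nil {
		return frameHeader{}, 0, 0, err
	}
	return h, code, int(h.Size) - off - n, nil
}

func main() {
	// Prefix of the hex dump from the decode-error message above.
	h, code, payload, err := describe("0000016a02000000005314d000000013000000045201522fa008")
	if err != nil {
		panic(err)
	}
	fmt.Printf("Header: %+v\n", h)
	// In AMQP 1.0, descriptor code 0x14 identifies the transfer performative.
	fmt.Printf("performative code 0x%02x, payload %d bytes\n", code, payload)
}
```

Running it prints `Header: {Size:362 DataOffset:2 FrameType:0 Channel:0}` and `performative code 0x14, payload 327 bytes`, i.e. 362 total bytes minus the 8-byte header minus 27 bytes of transfer performative.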

Since the error isn't very specific, this may be difficult to track down. I think ActiveMQ logs a stack trace when errors like this happen; that's likely the best bet for determining what it has an issue with in the frame.

If you'd like to share the Go code I can take a look to see if there's any issue there, but I'm not optimistic it'll lead to a resolution.

vcabbage • Nov 15 '19 23:11

Thank you very, very much for the speedy reply. It is very much appreciated!

I have some experience with AMQP 0.9 and ActiveMQ JMS, but I am brand new to AMQP 1.0, so if you wouldn't mind taking a look at my code, I'd be very grateful. There's really not much to it, since I'm using AMQP essentially as a shared memory provider in an HA environment.

I've attached sanitized code; because of confidentiality requirements, it is not buildable. Also, I have not yet made the changes suggested in the response above.

Once again, I am very grateful for your help. Thank you!

amqp1sample.txt

psparago • Nov 16 '19 13:11

You're welcome.

A couple potentially relevant notes:

  • It seems there is a potential for data races on the client, session, and receiver fields. The first thing I would suggest is building and running with the race detector enabled (-race flag). If you haven't used it before, be aware that it will significantly increase CPU and memory utilization and slow down the program.
  • While some AMQP errors are scoped to a link (sender/receiver) or session, many are fatal to the connection and will require a reconnect. The most conservative approach is to reconnect on any AMQP error. (You already mentioned that you haven't made the previously suggested changes; I just wanted to add a little more context.)

vcabbage • Nov 16 '19 15:11

Once again, thank you very much for your time. I will implement your suggestions.

psparago • Nov 16 '19 18:11