
[AMQ-8398] Fix conversion of 4-byte unicode characters between JMS and STOMP

Open azotcsit opened this issue 1 year ago • 4 comments

Jira ticket: https://issues.apache.org/jira/browse/AMQ-8398

Overview

The problem occurs when a STOMP producer and a JMS consumer, or a TCP producer and a STOMP consumer, are in use. The STOMP and TCP protocols treat 4-byte unicode characters differently: STOMP uses native Java logic (e.g. https://github.com/apache/activemq/blob/main/activemq-stomp/src/main/java/org/apache/activemq/transport/stomp/StompFrame.java#L79) whereas TCP uses custom logic. This PR adjusts the logic and adds support for 4-byte characters.

Here is a code snippet that demonstrates that the current custom conversion logic is incorrect:

import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.util.Arrays;
import org.apache.activemq.util.MarshallingSupport;

String str = "!\u00AE\u0C69\uD83D\uDE42"; // '!', '®', '౩', '🙂': 1-, 2-, 3-, and 4-byte UTF-8 characters
ByteArrayOutputStream baos = new ByteArrayOutputStream();
DataOutput dataOut = new DataOutputStream(baos);
MarshallingSupport.writeUTF8(dataOut, str);
byte[] resultBytes = baos.toByteArray();
System.out.println(Arrays.toString(resultBytes));

Here is the actual output: [0, 0, 0, 12, 33, -62, -82, -32, -79, -87, -19, -96, -67, -19, -71, -126] and the correct output: [0, 0, 0, 10, 33, -62, -82, -32, -79, -87, -16, -97, -103, -126].
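
For context, the 12-byte "actual output" above is "modified UTF-8": each UTF-16 surrogate char of the emoji is encoded as its own 3-byte sequence (ED A0 BD ED B9 82) instead of one 4-byte sequence (F0 9F 99 82). The JDK's own `DataOutputStream.writeUTF` produces the same surrogate-based encoding (with a 2-byte rather than 4-byte length prefix), so the difference can be reproduced with nothing but the standard library:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Compare {
    public static void main(String[] args) throws IOException {
        String smiley = "\uD83D\uDE42"; // U+1F642 (🙂): one code point, two Java chars

        // Standard UTF-8: a single 4-byte sequence F0 9F 99 82.
        byte[] standard = smiley.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(standard)); // [-16, -97, -103, -126]

        // Modified UTF-8 (DataOutputStream.writeUTF): each surrogate char
        // becomes its own 3-byte sequence ED A0 BD / ED B9 82, preceded
        // here by writeUTF's 2-byte length prefix.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        new DataOutputStream(baos).writeUTF(smiley);
        System.out.println(Arrays.toString(baos.toByteArray()));
        // [0, 6, -19, -96, -67, -19, -71, -126]
    }
}
```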

Local Testing

Publisher (Python STOMP)

import stomp
sync_conn = stomp.Connection(host_and_ports=[('localhost', 61613)], auto_content_length=False)
sync_conn.connect(username='admin', passcode='admin', wait=True)
sync_conn.send(body="🙃🙂", destination='test')
sync_conn.disconnect()

Consumer before fix

./bin/activemq consumer --destination queue://test --messageCount 1                      
INFO: Loading '/Users/alekseizotov/opt/apache-activemq-5.16.3//bin/env'
INFO: Using java '/Users/alekseizotov/opt/jdk/current/Contents/Home/bin/java'
Extensions classpath:
  [/Users/alekseizotov/opt/apache-activemq-5.16.3/lib,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/camel,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/optional,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/web,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/extra]
ACTIVEMQ_HOME: /Users/alekseizotov/opt/apache-activemq-5.16.3
ACTIVEMQ_BASE: /Users/alekseizotov/opt/apache-activemq-5.16.3
ACTIVEMQ_CONF: /Users/alekseizotov/opt/apache-activemq-5.16.3/conf
ACTIVEMQ_DATA: /Users/alekseizotov/opt/apache-activemq-5.16.3/data
 INFO | Connecting to URL: failover://tcp://localhost:61616 as user: null
 INFO | Consuming queue://test
 INFO | Sleeping between receives 0 ms
 INFO | Running 1 parallel threads
 INFO | Successfully connected to tcp://localhost:61616
 INFO | consumer-1 wait until 1 messages are consumed
javax.jms.JMSException: java.io.UTFDataFormatException
  at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:72)
  at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:104)
  at org.apache.activemq.command.ActiveMQTextMessage.getText(ActiveMQTextMessage.java:84)
  at org.apache.activemq.util.ConsumerThread.run(ConsumerThread.java:64)
Caused by: java.io.UTFDataFormatException
  at org.apache.activemq.util.MarshallingSupport.convertUTF8WithBuf(MarshallingSupport.java:389)
  at org.apache.activemq.util.MarshallingSupport.readUTF8(MarshallingSupport.java:358)
  at org.apache.activemq.command.ActiveMQTextMessage.decodeContent(ActiveMQTextMessage.java:101)
  ... 2 more
 INFO | consumer-1 Consumed: 0 messages
 INFO | consumer-1 Consumer thread finished

Consumer after fix

./bin/activemq consumer --destination queue://test --messageCount 1
INFO: Loading '/Users/alekseizotov/opt/apache-activemq-5.16.3//bin/env'
INFO: Using java '/Users/alekseizotov/opt/jdk/current/Contents/Home/bin/java'
Extensions classpath:
  [/Users/alekseizotov/opt/apache-activemq-5.16.3/lib,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/camel,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/optional,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/web,/Users/alekseizotov/opt/apache-activemq-5.16.3/lib/extra]
ACTIVEMQ_HOME: /Users/alekseizotov/opt/apache-activemq-5.16.3
ACTIVEMQ_BASE: /Users/alekseizotov/opt/apache-activemq-5.16.3
ACTIVEMQ_CONF: /Users/alekseizotov/opt/apache-activemq-5.16.3/conf
ACTIVEMQ_DATA: /Users/alekseizotov/opt/apache-activemq-5.16.3/data
 INFO | Connecting to URL: failover://tcp://localhost:61616 as user: null
 INFO | Consuming queue://test
 INFO | Sleeping between receives 0 ms
 INFO | Running 1 parallel threads
 INFO | Successfully connected to tcp://localhost:61616
 INFO | consumer-1 wait until 1 messages are consumed
 INFO | consumer-1 Received 🙃🙂
 INFO | consumer-1 Consumed: 1 messages
 INFO | consumer-1 Consumer thread finished

azotcsit avatar Aug 20 '24 06:08 azotcsit

@tabish121 - Do you have any thoughts on this PR (or know who else would know?)

I'm not too familiar with the UTF-8 encoding stuff or the history of why we custom-rolled the encoding. Obviously changing anything with the encoding utility is a high-risk change, so I want to make sure we don't break things (i.e. make sure we can still read old data stored in KahaDB, as I think it uses the same utilities).

I'm open to either using the changes here or the JDK implementation if we can get that to work (like I quickly added a test in my branch https://github.com/cshannon/activemq/commit/e68021d33bbc1396d7e72083ee17fd00b620e374, although it's not quite working all the way, as I noted). The main thing is making sure it's correct and we don't introduce a bug.

cshannon avatar Aug 21 '24 16:08 cshannon

> @tabish121 - Do you have any thoughts on this PR (or know who else would know?)

> I'm not too familiar with the UTF-8 encoding stuff or the history of why we custom-rolled the encoding. Obviously changing anything with the encoding utility is a high-risk change, so I want to make sure we don't break things (i.e. make sure we can still read old data stored in KahaDB, as I think it uses the same utilities).

I think one reason a custom version was created is that there isn't any tooling in the JDK to get the encoded size without actually encoding, which requires a buffer sized to the expected output encoding size; the code was created so the OpenWire encoder doesn't have to guess (possibly more than once). We do something similar in the AMQP codec in proton-j for this same reason. I think the broker uses this API everywhere an OpenWire message has its strings encoded or decoded, so it should be safe, but I guess you might need to audit that to be certain.

At a glance the changes seem okay, although it has been a while since I did this sort of thing.

> I'm open to either using the changes here or the JDK implementation if we can get that to work (like I quickly added a test in my branch cshannon@e68021d, although it's not quite working all the way, as I noted). The main thing is making sure it's correct and we don't introduce a bug.

tabish121 avatar Aug 21 '24 16:08 tabish121

I don't think my branch trying to use the JDK will necessarily work, based on what @tabish121 pointed out for encoding, and on the decoding side there seem to be compatibility issues (probably the same problem seen with the STOMP conversion).

@azotcsit - can you explain a bit about your changes and how they behave in terms of compatibility with things encoded with the existing/old encoder? Specifically for 4-byte unicode, because I'm sure it works fine with 3-byte. This will probably need some tests written to demonstrate the compatibility as well.

I tried testing writing a string with 4-byte unicode (the same one in the unit test in this PR) with the old/existing encoder and then decoding it using the JDK UTF-8 decoder in my branch, and it breaks. What's interesting is that things decode fine with the modified decode method in this PR. What's even more interesting to me is that even though the existing code is only supposed to handle 3-byte unicode, I tried testing writing/reading the 4-byte unicode string and it worked (so I guess there's a bug or some other special handling in the custom code).

So I'm curious what the big differences are between the JDK version and the modified version in this PR. It's critical to make sure things will be able to communicate across versions when sending 4-byte unicode. For example, any old messages in a data store would still need to be decodable after an upgrade, and another example could be old clients with the old encoding logic sending messages with 4-byte unicode to a broker that has been updated.

cshannon avatar Aug 21 '24 18:08 cshannon

Oh and something I should point out for anyone that looked at my branch, I noticed there's an unused allocated buffer that needs to be removed in DataByteArrayOutputStream that has been there since 2017

Looks like it was introduced as part of https://issues.apache.org/jira/browse/AMQ-6651

cshannon avatar Aug 21 '24 18:08 cshannon

Thanks to everyone for the comments! They were very helpful!

@cshannon

> So something needs fixing but I was just curious if using the built-in JDK support might be a better option for maintainability going forward.

Thanks for pointing this out! I checked the https://issues.apache.org/jira/browse/AMQ-8122 ticket and totally agree with the points mentioned there. So I rewrote the existing test and split it into separate scenarios for 1-, 2-, 3-, and 4-byte characters.

> I'm open to either using the changes here or the JDK implementation if we can get that to work (like I quickly added a test in my branch https://github.com/cshannon/activemq/commit/e68021d33bbc1396d7e72083ee17fd00b620e374, although it's not quite working all the way, as I noted). The main thing is making sure it's correct and we don't introduce a bug.

I totally support moving to the standard Java utilities/implementation. The only reason I did not do that initially is that I thought we wanted to minimize the amount of changes. But since the community seems open to getting rid of the custom implementation, I'd be glad to migrate to the standard one.

@tabish121

> We do something similar in the AMQP codec in proton-j for this same reason.

Could you please share a link to the corresponding code so I can explore it? I'm not sure it is necessary, but I might find something that was missed in this PR.

> which requires a buffer sized to the expected output encoding size so to allow the openwire encoder to not have to guess (possibly more than once) the code was created.

I did not find a standard method for calculating the encoded size, so I kept the custom implementation for that. The rest I moved to standard methods. It should still be compatible with all existing protocols.
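
For readers unfamiliar with the size calculation being discussed: the UTF-8 encoded length can be computed without encoding by walking the string's code points. This is only an illustrative sketch (not ActiveMQ's actual implementation, which also has to account for its modified surrogate handling):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {
    // Compute the standard UTF-8 encoded length of a string without encoding it.
    static int encodedUtf8Length(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) len += 1;         // ASCII
            else if (cp < 0x800) len += 2;   // U+0080..U+07FF
            else if (cp < 0x10000) len += 3; // rest of the BMP
            else len += 4;                   // supplementary planes (emoji etc.)
            i += Character.charCount(cp);    // surrogate pairs advance by 2 chars
        }
        return len;
    }

    public static void main(String[] args) {
        String s = "!\u00AE\u0C69\uD83D\uDE42"; // 1 + 2 + 3 + 4 bytes
        System.out.println(encodedUtf8Length(s));                      // 10
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10
    }
}
```

This lets an encoder allocate an exactly-sized buffer up front instead of guessing, which is the motivation tabish121 describes for the custom code.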

@cshannon

> I tried testing writing a string with 4 byte unicode (the same one in the unit test in this PR) with the old/existing encoder and then decoding it using the JDK UTF-8 decoder in my branch and it breaks. What's interesting is things decode fine with the modified method in this PR to decode.

It is expected that the JDK UTF-8 decoder fails, as the existing encoder does not support 4-byte characters. But the modified logic in the first commit should also fail to read data written by the existing encoder. Could you please double-check this? I believe it did not work for me.

> What's even more interesting to me is that even though the existing code is only supposed to handle 3 byte unicode, I tried testing writing/reading the 4 byte unicode string and it worked (which I guess there's a bug or some other special handling in the custom code)

Yes, this is tricky! If the old/existing logic is used for both writing and reading, then it works even for 4-byte characters. In fact, the 3-byte logic serializes 4-byte data incorrectly, but since it reads the data back (and expects those incorrect bytes), it works.
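
This self-consistency can be reproduced with the JDK's own writeUTF/readUTF pair, which uses the same surrogate-per-3-bytes "modified UTF-8" scheme (a sketch, not ActiveMQ's actual code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8RoundTrip {
    public static void main(String[] args) throws IOException {
        String original = "\uD83D\uDE43\uD83D\uDE42"; // 🙃🙂

        // Encode with modified UTF-8: each surrogate char becomes 3 bytes.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        new DataOutputStream(baos).writeUTF(original);
        byte[] encoded = baos.toByteArray();

        // Reading back with the MATCHING decoder works: the "incorrect"
        // bytes are exactly what this decoder expects.
        String roundTripped = new DataInputStream(
                new ByteArrayInputStream(encoded)).readUTF();
        System.out.println(original.equals(roundTripped)); // true

        // But a standard UTF-8 decoder rejects the surrogate sequences
        // and substitutes U+FFFD replacement characters.
        byte[] payload = Arrays.copyOfRange(encoded, 2, encoded.length); // skip length prefix
        String standardDecoded = new String(payload, StandardCharsets.UTF_8);
        System.out.println(standardDecoded.contains("\uFFFD")); // true
    }
}
```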

> So I'm curious what the big differences are between the JDK version and the modified version in this PR.

Ideally, there should be no difference. In fact, I changed the custom logic to the standard JDK methods and everything seems to be working the same way.

> It's critical to make sure things will be able to communicate across versions if sending 4 byte unicode. For example, any old messages in a data store would need to be able to be decoded still after upgrade and another example could be old clients with the old encoding logic sending messages with 4 byte unicode to a broker that has been updated.

This is a great point! I think the current changes are not backward-compatible. I'll run a few rounds of backward-compatibility testing to confirm whether there is a problem. Most probably it will require developing some additional logic. Could you please point me to some code where compatibility-related logic already exists? I'd like to understand how this type of problem is handled currently.


Summary for the second commit:

  • replaced custom serialization/deserialization logic with standard JDK methods
  • improved DataByteArrayInputStreamTest.testNonAscii() as per https://issues.apache.org/jira/browse/AMQ-8122

Next steps:

  • test current logic for compatibility between JMS and STOMP protocols
  • test compatibility for old client connected to new server
  • test compatibility for new client connected to old server
  • test compatibility for persisted messages after upgrade
  • further development based on testing results and PR comments

azotcsit avatar Aug 26 '24 05:08 azotcsit

@azotcsit - Here is the link to the protonj code where they have a custom impl https://github.com/apache/qpid-proton-j/blob/main/proton-j/src/main/java/org/apache/qpid/proton/codec/WritableBuffer.java

I think the idea is that you don't know the length using the built-in JDK version, so this gets the length to allocate the buffer size first. I believe protonj uses the JDK for decoding, though, even if it uses a custom impl for encoding.

@tabish121 - does that sound right?

cshannon avatar Aug 26 '24 22:08 cshannon

> @azotcsit - Here is the link to the protonj code where they have a custom impl https://github.com/apache/qpid-proton-j/blob/main/proton-j/src/main/java/org/apache/qpid/proton/codec/WritableBuffer.java

> I think the idea is you don't know the length using the built-in JDK version so this gets the length to allocate the buffer size first. I believe protonj uses the JDK though for decoding even if using a custom impl for encoding.

> @tabish121 - does that sound right?

Sounds right. The codec doesn't want to incur the overhead of writing into buffers and later copying them into the resulting output buffer, so it does the encode itself directly, but it uses the JDK to decode the string, providing a slice that tells the JDK which bytes comprise the encoded string.

tabish121 avatar Aug 26 '24 22:08 tabish121

I haven't had a chance to look at all the changes yet, but I wanted to respond to a few comments:

> Yes, this is tricky! If the old/existing logic is used for both writing and reading, then it works even for 4-byte characters. In fact, the 3-byte logic serializes 4-byte data incorrectly, but since it reads the data back (and expects those incorrect bytes), it works.

This is pretty much what I was assuming: the current code works because the decoding handles the incorrect format; otherwise I didn't see how it could be working.

> This is a great point! I think the current changes are not backward-compatible. I'll run a few rounds of backward-compatibility testing to confirm whether there is a problem. Most probably it will require developing some additional logic. Could you please point me to some code where compatibility-related logic already exists? I'd like to understand how this type of problem is handled currently.

I'm not sure we really have anything that handles compatibility for something like this, because it's a bug and not by design. For OpenWire, the version is negotiated, so old versions of clients/brokers can communicate by using the same version.

To handle it, I thought of three quick ways. All seem pretty terrible, but I figured I'd list them anyway:

  1. We just fix it, apply it only to a major version, and say it's incompatible. This is not really practical because one of the core parts of OpenWire is that old clients should still be able to communicate, so I don't think this is a good idea.
  2. We provide a way to make the fix optional, maybe a flag to fall back to the old logic. This also seems like a bad idea: we'd either have to make the default the old buggy way (and no one would change it), or, if the default is the new fixed way, there's a risk of breaking if someone sends data that needs 4-byte encoding, which may or may not be rare depending on the use case.
  3. We fix it and then try to handle errors automatically and fall back. For example, if deserialization blows up, maybe we try the old logic. I feel like this might work only sometimes because it requires an exception to be thrown; I assume things could fail in such a way that there is no error, just a wrong result, which is undetectable and much worse. It also requires keeping around all the old logic, which isn't great.
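
For illustration, option 3 could look roughly like the following. This is purely a sketch with a hypothetical method name, using the JDK's strict UTF-8 decoder and readUTF as stand-ins for the "new" and "legacy" ActiveMQ decoders; as noted above, the approach only catches failures that actually throw:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FallbackDecode {
    // Hypothetical: try the fixed (standard UTF-8) decoder first and fall
    // back to the legacy modified-UTF-8 decoder when the bytes are malformed.
    static String decodeWithFallback(byte[] payload) throws IOException {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(payload))
                    .toString();
        } catch (CharacterCodingException e) {
            // Legacy path: readUTF expects a 2-byte length prefix, so prepend one.
            ByteBuffer framed = ByteBuffer.allocate(payload.length + 2);
            framed.putShort((short) payload.length).put(payload);
            return new DataInputStream(
                    new ByteArrayInputStream(framed.array())).readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String smiley = "\uD83D\uDE42";
        // Standard UTF-8 bytes decode on the first try...
        System.out.println(decodeWithFallback(
                smiley.getBytes(StandardCharsets.UTF_8)).equals(smiley)); // true
        // ...while legacy modified-UTF-8 bytes trigger the fallback.
        byte[] legacy = {-19, -96, -67, -19, -71, -126}; // ED A0 BD ED B9 82
        System.out.println(decodeWithFallback(legacy).equals(smiley)); // true
    }
}
```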

About the only thing I can think of that might work and be reliable would be to generate a new OpenWire version, and for the new version the marshaller would just use the fixed UTF-8 encoding/decoding. This works for the data store as well, since KahaDB also tracks the OpenWire version used to store messages. This method still requires keeping around the old broken logic, but at least if we negotiated the new version we would know the client supports it. Although as I type this, it makes me wonder about other non-Java clients: I'm wondering if the C++ or .NET clients (for example) are currently broken if they already handle UTF-8 encoding properly.

cshannon avatar Aug 26 '24 22:08 cshannon

@cshannon a couple of brainstorms from your ideas:

What about a blend of a few things?

  1. WireFormat flag to track the behavior change for older versions: legacyUTF8EncodingEnabled = true (default in 6.x )

  2. New OpenWire version bump that implements the new change by default

  3. In ActiveMQ 7.x, we either drop the old OpenWire protocol versions (or move them to openwire-legacy to allow users to optionally add them back in), and/or toggle the legacyUTF8EncodingEnabled to 'false' by default

An advantage of using a new OpenWire version is that the non-Java clients could add the specific test when adding support for the new rev.

mattrpav avatar Aug 26 '24 23:08 mattrpav

@mattrpav - I think something like that with OpenWire may work but still need to figure out the exact details.

I checked the C++ and NMS clients and they have the same logic/bug, so another advantage of tying the fix to a new OpenWire version is that those clients will not break, since we would keep the existing broken logic for old OpenWire.

https://github.com/apache/activemq-cpp/blob/d6f76ede90d21b7ee2f0b5d4648e440e66d63003/activemq-cpp/src/main/decaf/io/DataOutputStream.cpp#L272-L339

https://github.com/apache/activemq-nms-api/blob/93eacbe7a258518cd5f12642c62d28aed0d6d1bc/src/nms-api/Util/EndianBinaryWriter.cs#L200

Something else as I was looking at openwire that looks fishy and may suffer the same issue is this: https://github.com/apache/activemq/blob/e45ee4aae5b49986750d5a2feb171f36cf432fca/activemq-client/src/main/java/org/apache/activemq/openwire/v12/BaseDataStreamMarshaller.java#L308-L342

So we may need to fix the marshallers. There's also the issue that we need to use the new OpenWire project, as the current code generation doesn't work unless you use JDK 8 and the 5.x version (which we could do and cherry-pick). The new OpenWire project may not be quite ready yet and also has common marshalling code for all versions, so that might be a problem if we fix the bug in BaseDataStreamMarshaller in that project for the old versions.

cshannon avatar Aug 27 '24 23:08 cshannon

@azotcsit and @mattrpav and @tabish121 -

So I'm starting to think, what if we are thinking about this backwards?

The entire ecosystem around ActiveMQ, including the C++, NMS, and Java clients and the broker, uses the same modified UTF-8 encoding for strings. The encoding/decoding is of course not standard UTF-8 but a modified version, because it only uses 3-byte sequences and doesn't handle 4-byte encoding the "standard" UTF-8 way.

But what if that is ok? Everything works just fine with the modified encoding as long as you use the same encoding and decoding logic and don't mix them like STOMP does (i.e. don't use the JDK encoder that uses 4 bytes). I verified this by testing the provided example with the emojis. It works just fine with the current encoder and decoder.

So you could actually argue that the bug here is not with the modified encoding/decoding itself as there's nothing saying we can't do it that way.

Going back to it being backwards... what if the bug is actually in STOMP? We've been assuming the bug is the encoder on the OpenWire side, but the more I think about it, the ACTUAL bug seems to be that STOMP uses the JDK encoder/decoder and not the modified version. I think the MUCH simpler solution here is to leave everything as it is, since it works, and update the STOMP protocol code that uses the JDK's standard UTF-8 encoding to instead use the modified encoding utility, which should solve the entire issue.

Unless the modified encoder/decoder would actually result in not being able to decode correctly (which seems not to be the case as long as they match), I think it's fine to keep it and just fix STOMP.

Thoughts?

cshannon avatar Aug 27 '24 23:08 cshannon

@cshannon good catch on the tight marshaller. Moving the encoding logic to a UTF8Util or similar is probably a good step to catch these locations and centralize them. We could then include the legacy handling as an overloaded method.

Draft steps:

  1. Create new OpenWire version with support for full length utf8 strings
  2. Update existing OpenWire code generation out-of-tree using JDK 8 and ActiveMQ 5
  3. Update WireFormat to support the optional flag to perform legacy utf8 handling

Documentation notes:

For applications that need both old and new handling with the new OpenWire version in the same broker, they would need to configure separate transport connectors: one with the legacy utf8 encoding configured via the wireformat config flag, and another that supports the new option. Older OpenWire versions just 'get' the old utf8 string handling.

mattrpav avatar Aug 27 '24 23:08 mattrpav

To illustrate my last point about fixing stomp, here is a quick proof of concept in a branch that just modifies the Stomp frame to decode using the write utility (we would need to modify the parts of the code that encode elsewhere).

https://github.com/cshannon/activemq/commit/2885b713cd58ceb7c2543629112e809bef0a3490

Edit: The translation is done in the JmsFrameTranslator and LegacyFrameTranslator classes, so those would need to be analyzed to see what needs to be done.

cshannon avatar Aug 27 '24 23:08 cshannon

One last thought on changing STOMP... I haven't looked at the exact STOMP translator in detail but I should point out that we would need to make sure we use the correct encoding depending on which direction things are going. A normal STOMP client would use regular UTF-8 encoding.

So I'm wondering if this would cause extra overhead; it needs to be investigated. When sending messages to a STOMP client, we would need to decode using the ActiveMQ modified logic and then re-encode using the JDK UTF-8 version. And when receiving from a client, we would need to decode using the standard JDK UTF-8 version and re-encode back into bytes using the modified ActiveMQ logic, so this could add an extra processing step depending on how it works. But we may not care for STOMP anyway.

cshannon avatar Aug 28 '24 00:08 cshannon

I took a look a bit more today and I think I identified the source of the bugs:

Bug with Stomp -> AMQ: https://github.com/apache/activemq/blob/e45ee4aae5b49986750d5a2feb171f36cf432fca/activemq-stomp/src/main/java/org/apache/activemq/transport/stomp/LegacyFrameTranslator.java#L54-L65

Edit: AMQ -> Stomp is broken in one case and works in another. It works when the message is compressed, as that path already translates back to text, but is broken when not compressed.

In my proof of concept, I changed getBody() in StompFrame as I just tried something quickly, but that is not the right spot to change it. By the time we set data on a StompFrame, we should make sure it is properly UTF-8 encoded, so the original getBody() method is fine.

The actual issue is just making sure that we are always doing the following when converting:

Stomp Frame -> decode UTF-8 to String -> encode with modified UTF-8 -> set on ActiveMQ message
ActiveMQ message -> decode modified UTF-8 to String -> encode with standard UTF-8 -> set on Stomp frame

In the example above, it's wrong because it takes a shortcut (probably for performance) by reusing the already-existing encoding when creating the AMQ message from a STOMP send and setting the content body. To fix it, we need to make sure we always go back to a String first and re-encode.
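
The two conversion directions above can be sketched roughly like this, using the JDK's writeUTF/readUTF as a stand-in for ActiveMQ's modified-UTF-8 marshalling (the method names are hypothetical, not the actual translator API):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StompAmqConversion {
    // STOMP frame body -> ActiveMQ message content:
    // decode standard UTF-8, then re-encode as modified UTF-8.
    static byte[] stompToAmq(byte[] stompBody) throws IOException {
        String text = new String(stompBody, StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        new DataOutputStream(baos).writeUTF(text);
        return baos.toByteArray();
    }

    // ActiveMQ message content -> STOMP frame body:
    // decode modified UTF-8, then re-encode as standard UTF-8.
    static byte[] amqToStomp(byte[] amqContent) throws IOException {
        String text = new DataInputStream(
                new ByteArrayInputStream(amqContent)).readUTF();
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "\uD83D\uDE43\uD83D\uDE42"; // 🙃🙂
        byte[] amq = stompToAmq(original.getBytes(StandardCharsets.UTF_8));
        String back = new String(amqToStomp(amq), StandardCharsets.UTF_8);
        System.out.println(original.equals(back)); // true
    }
}
```

The point is that both directions always pass through a String, rather than copying the already-encoded bytes across protocols.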

We could still potentially try to change the encoding for future OpenWire versions, but I think it's tricky because the OpenWire objects themselves, like ActiveMQTextMessage, handle the encoding/decoding of the data by calling off to the utility. So I'm not sure how that would work, because the objects are not tied to a specific OpenWire version; that's independent of the encoding, so they would not know which version of the encoder to use.

cshannon avatar Aug 28 '24 13:08 cshannon

I opened up a PR which I think is a better solution, at least in the short term. https://github.com/apache/activemq/pull/1290

I think if we want to change the encoding format to standard UTF-8, that could still happen, but it should be done as a separate issue and would need to be part of an OpenWire version upgrade. It's a much larger effort, so it may not be worth it.

cshannon avatar Aug 28 '24 17:08 cshannon

AMQ-8398 has been resolved and closed after merging #1290

If you still want to pursue this, then a new Jira should be opened, as this is a much more complex issue. But honestly, I'm not really inclined to make any changes at this point unless we can show there's an issue with the encoding/decoding, because of how widespread the existing logic is.

cshannon avatar Aug 28 '24 19:08 cshannon

Thanks @cshannon!

I gave a round of testing with your fix. It seems to be working fine:

./bin/activemq consumer --destination queue://test --messageCount 1                                    
INFO: Loading '/Users/alekseizotov/opt/apache-activemq-5.18.5//bin/env'
INFO: Using java '/Users/alekseizotov/opt/jdk/current/Contents/Home/bin/java'
ACTIVEMQ_HOME: /Users/alekseizotov/opt/apache-activemq-5.18.5
ACTIVEMQ_BASE: /Users/alekseizotov/opt/apache-activemq-5.18.5
ACTIVEMQ_CONF: /Users/alekseizotov/opt/apache-activemq-5.18.5/conf
ACTIVEMQ_DATA: /Users/alekseizotov/opt/apache-activemq-5.18.5/data
 INFO | Connecting to URL: failover://tcp://localhost:61616 as user: null
 INFO | Consuming queue://test
 INFO | Sleeping between receives 0 ms
 INFO | Running 1 parallel threads
 INFO | Successfully connected to tcp://localhost:61616
 INFO | consumer-1 wait until 1 messages are consumed
 INFO | consumer-1 Received 🙃🙂
 INFO | consumer-1 Consumed: 1 messages
 INFO | consumer-1 Consumer thread finished

The current UTF-8 encoding/decoding is, strictly speaking, incorrect; however, since it is consistent and encapsulated on the server side (not exposed to clients), I do not see any real harm in keeping it. So I totally support the idea of the https://github.com/apache/activemq/pull/1290 fix. Thank you for doing that!

azotcsit avatar Aug 31 '24 16:08 azotcsit

> The current UTF-8 encoding/decoding is, strictly speaking, incorrect; however, since it is consistent and encapsulated on the server side (not exposed to clients), I do not see any real harm in keeping it. So I totally support the idea of the #1290 fix. Thank you for doing that!

Yeah, it's definitely incorrect in terms of proper UTF-8 encoding, but it still works fine, as pointed out, since it's encapsulated, and changing it would be a major breaking change that would affect every OpenWire client released over the last however many (probably 20+) years.

I'm not against changing the encoding, but it has to be done in a major version with a new OpenWire version, so it might be something to target for AMQ 7.x. I guess it could be done earlier if we create a new OpenWire version, but it seems better to target a major version since it's working fine as it is.

The super annoying thing is that we will have to keep support for the existing non-standard/incorrect encoding no matter what in order to support older OpenWire clients if we do decide to change it.

cshannon avatar Sep 03 '24 14:09 cshannon