ACE_TAO icon indicating copy to clipboard operation
ACE_TAO copied to clipboard

Problems with fragmented GIOP messages

Open j18ter opened this issue 3 years ago • 9 comments

Version

6.5.11/2.5.11

Host machine and operating system

CentOS 7 Linux 3.10.0-1127.13.1.el7.x86_64 on a generic PC with Intel i7 CPU.

Target machine and operating system (if different from host)

Same

Compiler name and version (including patch level)

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

The $ACE_ROOT/ace/config.h file

#define ACE_HAS_IPV6 #include <ace/config-linux.h>

The $ACE_ROOT/include/makeinclude/platform_macros.GNU file

include $(ACE_ROOT)/include/makeinclude/platform_linux.GNU

Contents of $ACE_ROOT/bin/MakeProjectCreator/config/default.features

This file does not seem to exist.

AREA/CLASS/EXAMPLE AFFECTED:

A simple variation of the echo example, which returns a string sequence that is slightly bigger than 4MB. The server is started with command line parameters -ORBMaxMessageSize 1048576 to cause GIOP fragmentation.

The problem effects:

It affects execution. In this example the client receives incorrect values, in our more complex application the client encounters CORBA::MARSHAL or CORBA::COMM_FAILURE exceptions.

Synopsis

Returning big sequences from operations can cause client exceptions or incorrect values.

Description

Initially seen with a total returned size of approximately 8 MB, and client got CORBA::MARSHAL, The ``-ORBMaxMessageSize" parameter was not used in this case. Using it to force fragmentation causes simple examples to fail.

Repeat by

A complete example program with MPC and make files can be cloned from Github with this command:

git clone https://github.com/NetAcquire/tao_fragmentation_bug.git

Note: Run the included run_tests.pl script to see the failure. This passes -ORBMaxMessageSize 1048576 to the server. Without this parameter the test succeeds.

Sample fix/ workaround

Tried to debug, but got lost in the GIOP parsing code.

j18ter avatar Aug 18 '20 22:08 j18ter

Thanks for reporting this issue and making a reproducer. Could you attach a client and server logfile with ORB logging 10?

This is very likely a complex issue and it if very unlikely someone has spare time to work on this for free. When you need someone to look at this in detail consider hiring one of the commercial support companies, like Remedy IT for which I work. See https://www.dre.vanderbilt.edu/~schmidt/commercial-support.html for an overview of the companies who provide support

jwillemsen avatar Aug 19 '20 07:08 jwillemsen

Could you make a PR for the new test to place it under https://github.com/DOCGroup/ACE_TAO/tree/master/TAO/tests/GIOP_Fragments? Don't add generated files and generated makefiles

jwillemsen avatar Aug 19 '20 11:08 jwillemsen

Client and server log files are attached. I compressed these because the server log file includes hex dumps of the complete 4 MB reply. At the very end of server_log.txt one can see that the second element of the string sequence with the value "Hello World" ends up in a separate GIOP message/fragment.

server_log.txt.gz client_log.txt.gz

Thanks for reporting this issue and making a reproducer. Could you attach a client and server logfile with ORB logging 10?

This is very likely a complex issue and it if very unlikely someone has spare time to work on this for free. When you need someone to look at this in detail consider hiring one of the commercial support companies, like Remedy IT for which I work. See https://www.dre.vanderbilt.edu/~schmidt/commercial-support.html for an overview of the companies who provide support

j18ter avatar Aug 19 '20 21:08 j18ter

Could you make a PR for the new test to place it under https://github.com/DOCGroup/ACE_TAO/tree/master/TAO/tests/GIOP_Fragments? Don't add generated files and generated makefiles

Done (Pull request #1205). This may need some massaging by somebody who is more familiar with the TAO tree and build system than I am.

j18ter avatar Aug 19 '20 22:08 j18ter

TAO does not seem to break strings across fragments, and in fact, it does not even break up an element of a sequence. This may be the problem. For example, the reproducing test configures a maximum message size of 1 MB, but the first element of the sequence, a 4 MB string, is transmitted as a single GIOP message. This is not only defeating the purpose of fragmentation, but worse, it causes the following problems. Starting with GIOP 1.2, all fragments except the last must be a multiple of 8 bytes in size, padded if necessary. In the test, at the end of the first element of the sequence, there are 7 more bytes to the next 8-byte boundary. According to CDR formatting rules, the 4-byte size field of the next string (second element of the sequence) should now follow. Instead, TAO inserts 7 padding bytes, and starts the next sequence element in a new fragment. The client has no way to know that the 7 bytes, which follow the first sequence element, are padding bytes. It looks for the 4-byte size field of the second element, suitable aligned at a 4-byte boundary, and finds the length to be zero. As far as the client is concerned, it has reached the end of the reply. Depending on the data type of the sequence elements, the consequences can be more serious. Instead of returned the wrong data, the client may encounter inconsistent formatting and throw CORBA::MARSHAL exceptions. If my interpretation of the GIOP standard is correct, TAO must fragment more aggressively. It must use as much of the available space at the end of each fragment as possible, otherwise the client cannot unambiguously determine how many padding bytes there are. It must fragment at the boundary of primitive data types. This includes braking-up strings. If it fragments only at higher levels, e.g., leaving sequence elements unbroken, the padding becomes ambiguous to clients.

j18ter avatar Aug 21 '20 03:08 j18ter

As far as I remember sending GIOP fragmentation support was added as part of the sendfile support and at that moment only a higher level fragmentation support was implemented.

jwillemsen avatar Aug 21 '20 07:08 jwillemsen

Further investigation suggests, that strings specifically care causing issues. Composite data types generated by tao_idl will recursively use the primitive marshaling routines, and as long as these fragment correctly, the composites will too. I have a fix for simple strings (type char without code set translators that might change the length). This fix breaks-up strings, if necessary, to abide by the configured maximum message size, and more importantly, avoids confusing clients with unexpected padding. For more complicated cases, including wide character strings and when code set translators are engaged, I may be able to fall back to a less ambitious solution, which avoids the padding ambiguity by never terminating a fragment before the string, and otherwise using the old approach of not breaking-up the string itself either. Clients should work fine with this, but it can lead to very big fragments. This will have to wait until next week.

j18ter avatar Aug 22 '20 08:08 j18ter

Please extend the unit test you have with various IDL constructs that you are using for testing.

jwillemsen avatar Aug 22 '20 10:08 jwillemsen

Please extend the unit test you have with various IDL constructs that you are using for testing.

Not sure I will have time for this. It turns out that this fragmentation issue was not what caused our problem. Basically, I'm volunteering a fix for what is clearly a bug, but since it is not affecting our application I have limited time to spend on this.

j18ter avatar Sep 12 '20 02:09 j18ter