DynamixelSDK icon indicating copy to clipboard operation
DynamixelSDK copied to clipboard

Memory usage and heap/stack corruption doing TX messages.

Open KurtE opened this issue 5 years ago • 10 comments

Sorry again not following the issue template.

I am running into a few issues/problems with updating the OpenCM 9.04 code base to current release code here. Also have questions about memory usage.

In particular I am having issue with OpenCM with the function:

void Protocol2PacketHandler::addStuffing(uint8_t *packet)
{
  int i = 0, index = 0;
  int packet_length_in = DXL_MAKEWORD(packet[PKT_LENGTH_L], packet[PKT_LENGTH_H]);
  int packet_length_out = packet_length_in;
  uint8_t temp[TXPACKET_MAX_LEN] = {0};
...

As you see the temp variable is defined on the stack as TXPACKET_MAX_LEN, which right now you have defined as 4K on all boards. Before it was reduced to 2K for OpenCM board... With it as 4K, my calls like:

packetHandler2->write1ByteTxRx(portHandler, i, DXL_X_TORQUE_ENABLE, 0x1, &error);

Are failing with timeout, as the output message was corrupted. Probably stack ran into heap and ...

So at a minimum need to put the 2K back on OpenCM. I still don't like this function working this way. Things that I wonder about here:

a) 2K or 4K is a lot of memory on limited memory boards. So would like to minimize this. at minimum maybe have two functions. One that simply scans to see if the data contains the ff ff fd issue or not and only call stuffing if it does.

b) if addStuffing finds the ff ff fd pattern in the data, it calls realloc. The txPacket method is described in the E-Manual so it could be called by someone who did not allocate their message on the heap, what happens if you then call realloc on their buffer?

c) I really wish that we could minimize the use of the heap. That is I would not expect calls like: write1ByteTxRx likewise 2Bytes 4 bytes and all of the read functions, to not need to allocate buffers. Maybe have a default buffer with a size large enough for these functions be used (including room for any stuffing) and only if the data to be output exceeds this do we allocate a new buffer. Obviously would need to then check after the message is sent that only free the buffer if it is not our static buffer.

d) Also wondering about other places that might call txPacket, if the caller has already setup the data to handle the ff ff fd pattern. That is if the data includes: ff ff fd fd How do you know if that is the actual data or if the caller had already handled the ff ff fd and added their own padding.

Hope some of this makes sense?

KurtE avatar Nov 11 '18 14:11 KurtE

@kijongGil and @OpusK,

As I mentioned in the first posting here. I tried running this current code on the OpenCM9.04 and none of my writes to servos worked. I finally traced it memory corruption.

For the fun of it I started hacking my way through the code, to see if I could do some of the above. Still WIP, up at: https://github.com/KurtE/OpenCM9.04/tree/Dynamixel_SDK-memory_issues Note: This is based on the still pending non-blocking Serial branch... So may need to separate that out.

With the above code, I tried to setup the functions like: write1ByteTxRx to have their own txpacket on the stack, and then call off do the work. To do this I added a helper function that all of these call to do the implementation...

Likewise for the read1byteTxRx like functions. Where again I know how many bytes should be in the RX packet and have a stack item for that. In the Protocol2 mode, added extra space for padding in the read4Bytes... functions as could hit the ff ff fd case. Would not hit this in 1 or 2 byte cases.

Again with the OpenCM9.04 case where there is only something like 20K of memory, which is used for all global variables including things like SerialX TX and RX buffers, plus user variables, plus Stack plus Heap.

So for example with the current released code, when the user does something like:

int Protocol2PacketHandler::read1ByteTxRx(PortHandler *port, uint8_t id, uint16_t address, uint8_t *data, uint8_t *error)

Which calls readTxRx - Which then calls malloc of 4K of memory for the rxpacket of which we are only expecting the packet to take 12 bytes note: the TX packet is on the stack and only take 14 bytes

This then calls txRxPacket which then calls txPacket, which then calls addStuffing, which as I mentioned earlier than has a stack variable of 4K.

So this simple function is taking over 8K of memory between stack and heap when it needs about 26 bytes plus overhead...


With my updated stuff mentioned earlier in this post, I did some of the add stuffing directly in a few of the methods before it calls off to txPacket - but there are issues with this.

That is for example you call txPacket from the workbench and you assume it will do the stuffing. So I need to resolve this. May try to do the add stuffing a different way. I already did in the functions I did that call this. That is do it as a two step process. First do a scan of the bytes to see if and how may issues are in the user buffer (Maybe it should also detect that the issues were resolved by caller, that is detect ff ff fd fd ... Then if there are things to fix and hopefully we detect that we need to resize the buffer (realloc?), We do that and then do the fix of the buffer in place. That is you start off at the end of the old size and copy the bytes down to the end of the new size detecting that we hit the ff ff fd and insert the fd... Should not be difficult and don't need that extra big buffer...

But again this still leaves some issues in the code like: a) not all buffers passed in are from heap, but again we should already have resolved those.

b) What happens if realloc decides to return a different memory location than was passed in? Already have this problem. We update all of the stuff, in addStuffing to the new address, and promptly return to txPacket who will try to write the data out from the old address of txPacket and potentially when the code completes and unwinds we will try to call free on the old memory location...

c) none of the code detects if malloc fails. So it will simply write to NULL pointer.

d) Side note: if you call something like: read1ByteTxRx with id=255, it will call the function readTxRx which will malloc 4K then test if id > BROADCAST_ID and fail leaving the 4K memory...


With my WIP code changes, again I try to size an RX buffer for the size of the RX buffer expected. But what happens if the buffer returned is for some reason bigger than this? There is nothing in rxPacket that knows the size of your actual RX buffer, so likely in this case you would overwrite the end of the buffer. Maybe this is why in the functions like read1byteTxRX it allocates a max size buffer (currently 4K)... Which takes care of this case.

However the functions like: write1ByteTxRx call writeTxRx, which has a stack variable for the RX buffer assuming no parameters will be returned. So this issue is already there.


Probably enough for this comment.

Let me know what you think and if you wish for me to continue along this path?

Thanks Kurt

KurtE avatar Nov 12 '18 15:11 KurtE

@kijongGil @OpusK

On the realloc case, I am wondering if it would make sense if we could remove the realloc, and expect that the caller would add the 1/3 size to their allocations, which either is on the heap or on the stack ... Would remove some of the issues mentioned earlier, but would probably require changes to the Workbench as well...

FYI - here is a WIP version of the addStuffing that does not take any extra memory. It compiles but I have not tested it yet:

void Protocol2PacketHandler::addStuffing(uint8_t *packet)
{
  uint16_t packet_length_in = DXL_MAKEWORD(packet[PKT_LENGTH_L], packet[PKT_LENGTH_H]);
  uint16_t packet_length_out = packet_length_in;
  if (packet_length_in < 5) return; // less CRC so if not at least 3 data bytes won't be issue

  uint16_t length_before_crc = packet_length_in - 2;
  for (uint16_t i = 0; i < length_before_crc; i++)  // except CRC
  {
    if (packet[i+PKT_INSTRUCTION] == 0xFD && packet[i+PKT_INSTRUCTION-1] == 0xFF && packet[i+PKT_INSTRUCTION-2] == 0xFF)
    {
      if ((i == (length_before_crc-1)) || (packet[i+PKT_INSTRUCTION+1] != 0xFD)){
        packet_length_out++;  // check to make sure it was not already stuffed. 
      }
    }
  }  
  if (packet_length_in == packet_length_out) return;  // No new padding added.
  // BUGBUG:: using realloc which could move to new address... ?
  packet = (uint8_t *)realloc(packet, packet_length_out + 7);
  if (packet == NULL)   return; // we have problems!

  // now lets move the data down - won't handle ff ff fd fd here as if any pre stuffed expect fullly done
  uint8_t *packet_stuff_ptr = &packet[packet_length_out + 6];
  uint8_t *packet_pre_stuff_ptr =  &packet[packet_length_in + 6];
  while(packet_stuff_ptr != packet_pre_stuff_ptr)
  {
    if ((packet_pre_stuff_ptr[0] == 0xFD) && (packet_pre_stuff_ptr[-1] == 0xFF) && (packet_pre_stuff_ptr[-2] == 0xFF))
    {
      *packet_stuff_ptr-- = 0xFD; // stuff out new byte;
    }
    *packet_stuff_ptr-- = *packet_pre_stuff_ptr--;  // copy the byte down
  }
  return;
}

KurtE avatar Nov 12 '18 16:11 KurtE

Hi, @KurtE :) Sorry, I am on business trip until this week. I'm busy with other things until this year. So, I am going to update DynamixelSDK in earnest next year. I am always thankful for your contribution.

kijongGil avatar Nov 13 '18 01:11 kijongGil

Hi @kijongGil (and @OpusK ) and others...

I totally understand that you are busy with other things, and probably don't currently have very much time, to work on any of this this year. But wondering about your thoughts for maybe taking in a few Pull Requests in for the SDK, for support for OpenCM, probably OpenCR... Probably involves PRs to all three github projects.

Currently if someone installs the current code in this project in their Arduino/Library folder, it will override the one installed by OpenCM Arduino IDE, and at least my test cases will fail in strange ways as the heap and stack will collide.

As I have mentioned in the OpenCM Issue: https://github.com/ROBOTIS-GIT/OpenCM9.04/issues/54 I have adapted one of my test programs to try to see how much dynamic memory is used. It prints out an estimate (I think off by 4 bytes) of how much space is used by global variables. Then after each of my commands in this test program it prints out estimates of how much heap space is allocated and lowest memory used by the stack and then prints out the difference between the two, to again give an estimate of how much RAM we have left.

Test program up at: https://github.com/KurtE/Open_CM_CR_Arduino_Sketches/blob/master/AX12_Test_OpenCM/AX12_Test_OpenCM.ino

Again if I run my test on the OpenCM9.04 which does a scan of Serial 1 and Serial3 both protocol 1 and protocol 2 for servos, and then issue one command: h (for hold servos - torque enable).

If I run it with current released OpenCM9.04 Arduino builds (SDK not updated), you see:

CM9.04 Servo Test program
Estimated global data size: 12824
starting Heap info: start: 20003218 current: 20003454
Start Stack info: end: 20005000 current: 20004fc8
...
  Begin Protocol 2: 2.00
    1, Model:4241 - 2267
    2, Model:4242 - 2503
    3, Model:4243 - 2048
...
Cmd: h
PrintServo Voltage called
Heap ptr: 20003ea0  Usage: 3208
Stack Max: 20004660, usage: 2464
Estimated unused memory: 1984

So if you use the Released Dynamixel SDK which when merged into OpenCR branch removed the #if that restricted MAX packet read write to 2k instead of 4K. And with current code it therefore needs at least 4K more memory that 1984 won't work...

With my current updated code, which is maybe more than you wish to take at this time: https://github.com/KurtE/OpenCM9.04/tree/dynamixel_sdk_update_memory_reduction

If I rerun that same test


CM9.04 Servo Test program
Estimated global data size: 12376
starting Heap info: start: 20003058 current: 20003294
Start Stack info: end: 20005000 current: 20004fc8
...
  Begin Protocol 2: 2.00
    1, Model:4241 - 2267
    2, Model:4242 - 2503
    3, Model:4243 - 2048
...
Cmd: h
PrintServo Voltage called
Heap ptr: 200033d4  Usage: 892
Stack Max: 20004c10, usage: 1008
Estimated unused memory: 6204

You can see there are some pretty good improvements. But there are still things to go over with this, things like I try not to use the heap for buffers when you do read or write of 1, 2, 4 bytes... Also use calculated size for return packets instead of MAX...

Question is if you don't have time to review this (I have not yet issued PR), should I instead create a simpler PR request that gets us closer.

Things like: a) Put the test for OPENCM904 in (back in the case of OpenCR) to limit MAX in protocol2_packet_handler.cpp

// Before DynamixelSDK release, it needs to apply because of OpenCM904
#if defined(__OPENCM904__)
#define TXPACKET_MAX_LEN    (2*1024)
#define RXPACKET_MAX_LEN    (2*1024)
#else
#define TXPACKET_MAX_LEN    (4*1024)
#define RXPACKET_MAX_LEN    (4*1024)
#endif

b) Update addStuffing - to not need a MAX packet variable on the stack. You can do the update in place. That is you can do two passes. The first is to count how many places you need to insert the 0xFD, then you realloc (which is very problematic) the buffer to the new size.
You can then work from the end of the original size, and copy the data down to the new end location, and when you find a problematic 0xff 0xff 0xfd, you insert the new 0xfd. When your copy from and copy to pointers are the same you know you are done... This saves the 4K or 2K on the stack

c) unsigned short Protocol2PacketHandler::updateCRC - you have the crc_table defined inside of this function, so currently it will allocate 512 bytes on the stack and copy a set of data the compiler put into program space into the stack data. By simply marking the table as static const, This function no longer allocates 512 bytes on the stack and do the memory copy. So both Stack size and code size decreases.

Probably several other things, but those are the easy to find low hanging fruit and are all in one file...

Thoughts?

KurtE avatar Nov 18 '18 16:11 KurtE

@kijongGil, @OpusK

Actually part of the previous posting is incorrect. That is currently you don't have a stand alone Arduino library as part of this project (Wish you did), Also wish you allowed 3rd party boards to be supported... But I asked that earlier in a different issue.

But thought I might try doing some minimal changes like mentioned. I thought I would then try it out on OpenCR board.

First run current develop branch: Same test program.


CM9.04 Servo Test program
Estimated global data size: 49280
starting Heap info: start: 2001c080 current: 2001d000
Start Stack info: end: 20050000 current: 2004ffcc
...
  Begin Protocol 2: 2.00
    2, Model:4242 - 2502
    3, Model:4243 - 2048
  Done
...
Cmd: h
PrintServo Voltage called
Servo: 2 Voltage in 10ths: 122
Heap ptr: 2001f000  Usage: 12160
Stack Max: 2004eee8, usage: 4376
Estimated unused memory: 196328

With these minimal changes on OpenCR
The changes are not as drastic. The initial stuff is the same: But, the stack usage is less:

PrintServo Voltage called
Servo: 2 Voltage in 10ths: 122
Heap ptr: 2001f000  Usage: 12160
Stack Max: 2004fbb8, usage: 1096
Estimated unused memory: 199608

It saves 3380 stack space. No change to heap... Could make more drastic changes again like not have RX allocate 4K buffer, but instead size of actual message plus room for stuffing.

EDIT Note: I issued a PR in the OpenCR project for these changes. Also for the fun of it, I tried a simple change for RX buffers. But instead of always allocating 4K buffer for the RX, I calculate the size needed, and if it is less than 512, I allocate 512 (size for Protocol1 RX MAX)...

int Protocol2PacketHandler::readTxRx(PortHandler *port, uint8_t id, uint16_t address, uint16_t length, uint8_t *data, uint8_t *error)
{
  int result                  = COMM_TX_FAIL;

  uint8_t txpacket[14]        = {0};

  uint16_t rx_length = length + 11 + (length/3);
  if (rx_length < RXPACKET_MIN_LEN) rx_length = RXPACKET_MIN_LEN;

  uint8_t *rxpacket           = (uint8_t *)malloc(rx_length);
  if (!rxpacket) return result;
  //(length + 11 + (length/3));  // (length/3): consider stuffing
...

I then reran the test and you can see the differences...

Cmd: h
PrintServo Voltage called
Heap ptr: 2001d000  Usage: 3968
Stack Max: 2004fbb8, usage: 1096
Estimated unused memory: 207800

Which saved 8K heap size (which is more than I expected... I have not checked in that change,as not sure what you would think.

KurtE avatar Nov 18 '18 19:11 KurtE

Hi, @KurtE.

Thanks for your contribution. Because you changed only one file, A seems to be easier to test.

Now that he's in a very busy situation and he's out on a business trip, I'll explain it well when he comes back.

OpusK avatar Nov 19 '18 00:11 OpusK

Please check this issue in the final version SDK. (v3.7.0)

ROBOTIS-zerom avatar Apr 03 '19 07:04 ROBOTIS-zerom

Hi @OpusK and @ROBOTIS-zerom

I did a quick sync up of the develop branch of this library and I think some parts are addressed in that I believe the addStuffing function no longer has a large buffer on stack, which is great.

But Personally I wish simple operations like Read or Write 1 or 2 bytes would not require memory allocations (like malloc or realloc).

I am not as hard core on this as some people who refuse to use any code on small processors that use a heap... They want to see exactly how much memory their applications will use at compile time.

KurtE avatar Apr 03 '19 14:04 KurtE

Hello everyone, I am using C++ for running MX-28AT(2.0) motors with a U2D2 controller. I am using position control mode to run motors. When am dynamically allocating size for an array while using sync read- write am getting buffer overrun warning and group sync is giving me a heap corruption error. I am a beginner in coding, Can you please tell me how can I overcome this issue.

Raj-Kiran127 avatar Dec 21 '21 16:12 Raj-Kiran127

Hi @Raj-Kiran127

It seems your issue is irrelevant to this thread. I'll create a reference to a new issue thread #543 so that further discussion can be made. Thanks.

ROBOTIS-Will avatar Dec 22 '21 08:12 ROBOTIS-Will