arduino_uip

Fix errata12 - Workarounds/Fixes for freezing/stalling/data issues

Open TMRh20 opened this issue 9 years ago • 6 comments

Hi, I've built a related library (RF24Ethernet) based on this library, and I seem to have discovered some issues that are more prominent on a lossy radio network but also occur here. I haven't fully tested every scenario, but I have done extensive testing using an example I created, which downloads large amounts of data from a web server.

That example (HttpClientTest.ino) replicates the issues noted below and shows the effect of these changes when they are reverted.

Changes:

  1. I found that changing the defines from 512 to #define UIP_CONF_TCP_MSS 511 and #define UIP_CONF_RECEIVE_WINDOW 511 results in much smoother flow of data when receiving from some web servers.
  2. I found that changing the define from 5 to #define UIP_SOCKET_NUMPACKETS 1 resulted in no data loss, even after 70MB+ of received data, where a few bytes or more of the data would be invalid/corrupt otherwise. It would seem to indicate a problem with handling multiple packets per socket.
  3. Per RF24Ethernet testing, I added a simple restart mechanism, to re-open the TCP window at timed intervals if no data has been sent or received on an open connection. This seems to help large downloads from stalling out.
  4. Set the #define MAX_FRAMELEN equal to MSS + headers + CRC so the hardware will drop any payloads larger than the TCP Max Segment Size before they are passed to software.
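As a rough illustration of the four changes above, here is a minimal sketch. The three macro values come from the list; the header and CRC sizes are my assumption (Ethernet II, IPv4 and TCP with no options), and restartDue() is a hypothetical helper name for the timed restart check, not something taken from the library.

```cpp
#include <stdint.h>

// Values quoted in the list above (the defaults were 512, 512 and 5).
#define UIP_CONF_TCP_MSS        511
#define UIP_CONF_RECEIVE_WINDOW 511
#define UIP_SOCKET_NUMPACKETS   1

// ASSUMPTION: header sizes for Ethernet II, IPv4 and TCP without options.
constexpr uint16_t kEthHeaderLen = 14;  // dst MAC + src MAC + EtherType
constexpr uint16_t kIpHeaderLen  = 20;  // IPv4 header, no options
constexpr uint16_t kTcpHeaderLen = 20;  // TCP header, no options
constexpr uint16_t kEthCrcLen    = 4;   // frame check sequence

// Change 4: largest frame the controller should accept (MSS + headers + CRC).
#define MAX_FRAMELEN \
    (UIP_CONF_TCP_MSS + kEthHeaderLen + kIpHeaderLen + kTcpHeaderLen + kEthCrcLen)

// 511 + 14 + 20 + 20 + 4 = 569 under the assumptions above.
static_assert(MAX_FRAMELEN == 569, "frame length arithmetic changed");

// Change 3 (hypothetical helper): true once intervalMs has elapsed since the
// last byte was sent or received. Unsigned subtraction keeps the comparison
// correct when a millisecond counter wraps around.
inline bool restartDue(uint32_t nowMs, uint32_t lastActivityMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - lastActivityMs) >= intervalMs;
}
```

On a real device, nowMs would typically come from millis(), and a true result would be the cue to re-open the TCP window on the idle connection.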

Notes:

  1. For testing, I used a large LiPo battery pack, measuring just under 3.7 V, to ensure an unquestionably stable power supply.
  2. The ENC28J60 module was used in combination with an Arduino Mega2560 to provide lots of RAM for testing purposes
  3. At the time of this writing, and of successful testing (70MB+ received without errors detected), I am using a standard 220uF capacitor close to the network module on a breadboard, but I am unsure whether it is needed.
  4. Testing was conducted using the included HttpClientTest.ino example, connecting to a local web server (Raspberry Pi), hosting the included nums2.txt file, which contains 100,004 characters (bytes) of data for testing.
  5. Using the default configuration along with either the master or fix_errata12 branch appears to cause errors.
  6. Using this configuration with the master branch seems to reduce/remove stalling, but appears to still result in data loss/corruption.
  7. The combination of these changes along with the changes in the fix_errata12 branch seems to result in 0 errors over a long period of testing.
  8. Edit to add: The RPi was pinging the Arduino at an interval of 0.5 seconds during the testing as well, which seemed to increase the number of errors when errors occurred.

I haven't been able to identify the exact cause of some of the symptoms/issues I've noticed when using this example, but these changes seem to work around them all so far.

TMRh20, Mar 10 '15 13:03

Great, thanks for this! I'm glad people are still improving this nice library. I've been using fix_errata12 for ~1410 hours now on a project without any issues, but that project does not have to handle a lot of data. I'll implement your changes asap.

GunterO, Mar 10 '15 13:03

@GunterO Thanks for testing out these changes, I'm always doing different things, so can rarely keep my devices running that long. Maybe I just need more devices!

TMRh20, Mar 10 '15 13:03

I am still fighting with an issue that may or may not be known. I've gone through many of the posted issues, and haven't found anything specific to this, so...

The issue can easily be reproduced by using apachebench to load-test the ENC28J60 device running the InteractiveServer.ino example I've included. If I run the command ab -n 30 -c3 http://10.10.1.58/ and continuously interrupt and restart it, the memory pool continuously gets used up, as connections are opened and then dropped before completion. The issue doesn't really seem to occur if connections are opened and closed properly.

In the mempool.cpp file, I've added the following lines to the freeBlock() function:

  Serial.print("FHandle: ");
  Serial.println(handle); 

This displays the memhandle of the current blocks being freed. Over time, there are blocks that are not freed, so are no longer usable, and the available pool will run out rather quickly, if the procedure using apachebench is continued for any length of time.
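To make the unfreed blocks easier to spot than by reading the serial log, something like the following could sit alongside that print. It is only a sketch: HandleTracker, onAlloc(), onFree() and kMaxHandles are hypothetical names, and it assumes handles are small integers that fit in a uint8_t.

```cpp
#include <stddef.h>
#include <stdint.h>

// ASSUMPTION: handles are small integers below this limit.
constexpr size_t kMaxHandles = 32;

// Hypothetical tracker, not part of the library: remembers which handles
// have been allocated but not yet freed.
struct HandleTracker {
    bool inUse[kMaxHandles] = {};

    // Call where a block is allocated.
    void onAlloc(uint8_t handle) {
        if (handle < kMaxHandles) inUse[handle] = true;
    }

    // Call where a block is freed (next to the Serial.print above).
    void onFree(uint8_t handle) {
        if (handle < kMaxHandles) inUse[handle] = false;
    }

    // Number of handles currently allocated and never freed.
    size_t outstanding() const {
        size_t n = 0;
        for (size_t i = 0; i < kMaxHandles; ++i) {
            if (inUse[i]) ++n;
        }
        return n;
    }
};
```

If outstanding() keeps growing while the apachebench run is repeatedly interrupted, the entries still set in inUse are the blocks that were never released.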

This doesn't appear to be a huge issue, but memory errors do seem to crop up over time, and they will eventually cause the device to 'hang'. The hardware continues responding, but the software cannot allocate any free blocks, so everything hangs.

TMRh20, Mar 14 '15 18:03

This has been a long time in coming, but I was hesitant to even look at this code + UIP again lol. I added some changes regarding errata 12, 13 and 15, which concern late collisions etc. that cause problems up to and including hanging of the internal TX logic.

My testing has shown complete reliability so far, and I've been hammering at it with multiple connections, ICMP packets, etc., and the memory buffers are being cleared enough for things to keep on working regardless.

But there's a catch...

This only works properly if bytes are written individually. For example, instead of sending a string with client.write(buf,len); the string would need to be broken down or handled as an array, and sent via a loop and something like client.write(buf[i]);
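A minimal sketch of that byte-by-byte loop, written so it can also be compiled off-device. writeBytewise() and RecordingClient are hypothetical names; the only thing assumed about the real client is a write(uint8_t) call that returns the number of bytes accepted.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a client object, used only so the helper can be
// exercised without hardware. It stores up to kCapacity bytes.
struct RecordingClient {
    static constexpr size_t kCapacity = 8;
    uint8_t received[kCapacity] = {};
    size_t count = 0;

    // Accepts one byte; returns 1 on success, 0 when the buffer is full.
    size_t write(uint8_t b) {
        if (count >= kCapacity) return 0;
        received[count++] = b;
        return 1;
    }
};

// Sends buf one byte at a time instead of calling client.write(buf, len).
// Stops at the first byte the client refuses and returns how many were sent.
template <typename ClientT>
size_t writeBytewise(ClientT& client, const uint8_t* buf, size_t len) {
    size_t sent = 0;
    for (size_t i = 0; i < len; ++i) {
        if (client.write(buf[i]) != 1) break;
        ++sent;
    }
    return sent;
}
```

Returning the count lets the caller notice a short write and retry from buf + sent rather than assuming the whole string went out.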

If anybody is interested, these changes can be downloaded here

TMRh20, Aug 12 '15 11:08

Thank you for sharing. It's hard for those of us who don't have the expertise to contribute but still want to use this code to make changes like this. I'll take a look at it in the next couple of days. In my use cases I really need hard-wired solutions over wireless, and the ENC28J60 is in a really good price range for my project compared with the fancier WIZNET ones.

bholmes451, Aug 12 '15 15:08

*It's hard for those of us who don't have the expertise to contribute but still want to use this code to make changes like this.*

@bholmes451 Haha, I don't really have the expertise either, and it looks like there are a number of problems with these changes, so kind of back to the drawing board...

TMRh20, Aug 13 '15 04:08