Ethernet
Ethernet copied to clipboard
Socket close function not working in 2.0.0
In order to prevent frozen sockets to lockup network connections, this code here used to work until version 1.1.2 (taken from http://forum.arduino.cc/index.php?topic=291958.msg2040300#msg2040300):
#include <utility/w5100.h>
#include <utility/socket.h>
byte socketStat[MAX_SOCK_NUM];
unsigned long connectTime[MAX_SOCK_NUM];
void loop() {
if(Serial.available()) {
if(Serial.read() == 'r') ShowSockStatus();
}
checkSockStatus();
// rest of your loop code
}
// add these functions
void ShowSockStatus()
{
for (int i = 0; i < MAX_SOCK_NUM; i++) {
Serial.print(F("Socket#"));
Serial.print(i);
uint8_t s = W5100.readSnSR(i);
socketStat[i] = s;
Serial.print(F(":0x"));
Serial.print(s,16);
Serial.print(F(" "));
Serial.print(W5100.readSnPORT(i));
Serial.print(F(" D:"));
uint8_t dip[4];
W5100.readSnDIPR(i, dip);
for (int j=0; j<4; j++) {
Serial.print(dip[j],10);
if (j<3) Serial.print(".");
}
Serial.print(F("("));
Serial.print(W5100.readSnDPORT(i));
Serial.println(F(")"));
}
}
void checkSockStatus()
{
unsigned long thisTime = millis();
for (int i = 0; i < MAX_SOCK_NUM; i++) {
uint8_t s = W5100.readSnSR(i);
if((s == 0x17) || (s == 0x1C)) {
if(thisTime - connectTime[i] > 30000UL) {
Serial.print(F("\r\nSocket frozen: "));
Serial.println(i);
close(i);
}
}
else connectTime[i] = thisTime;
socketStat[i] = W5100.readSnSR(i);
}
}
After adjusting the includes for 2.0.0, the "close(i)" still won't compile because the function is no longer provided for. When I take the 1.1.2 code of "close", the Arduino freezes whereas the same script with library 1.1.2 works perfectly fine.
Is there any way to fix this or at least work around it (without downgrading from version 2.0.0)?
A change from 1.x.x to 2.x.x is a major update. It is normal that code work after upgrade. You have to look into the code. Maybe the function is just renamed.
For more information about versioning read: https://semver.org/
MAJOR version when you make incompatible API changes,
Obviously this is a major update and that's why I asked whether this function will be implemented in the foreseeable future.
I did check the code, the function has not (to my knowledge) just been renamed, but is not provided for. In 1.1.2 it was just a few lines of code and these don't work under 2.0.0, resulting in a freeze.
Closing a socket is not an everyday function, I agree, but also not one that is really extraordinary, so I won't be the only one missing it.
Have you tried EthernetClass::socketClose() or EthernetClass::socketDisconnect()? Both are in socket.cpp. socketCDisconnect() closes the socket by sending the TCP peer a FIN packet and waiting for it to respond with an ACK packet. socketClose() immediately closes the socket without informing the peer..
Ah, I didn't realize the name change - it was just "close()" in version 1.1.2. However, it doesn't make a difference whether I use socketClose or socketDisconnect - both just consist of three lines of code:
SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
W5100.execCmdSn(s, Sock_CLOSE); // socketDisconnect uses Sock_DISCON instead
SPI.endTransaction();
When I remove the second line, the Arduino does not get stuck. If I let it in - either with Sock_CLOSE or Sock_DISCON, it crashes. As I said, with version 1.1.2 it works fine.
One other thing that I noticed: When I tell the Arduino to list all open sockets, I get eight possible sockets, although my W5100-based shield should only support 4, and this is also the number of entries I get when compiling with 1.1.2. I don't know if this is part of the problem, but the last four sockets always contain a non-existing IP number: Socket#0:0x16 80 D:192.168.1.34(55760) Socket#1:0x17 49153 D:192.168.1.5(2323) Socket#2:0x16 80 D:192.168.1.34(55760) Socket#3:0x14 80 D:192.168.1.20(33132) Socket#4:0x1 511 D:25.105.144.192(43009) Socket#5:0x1 511 D:25.105.144.192(43009) Socket#6:0x1 511 D:25.105.144.192(43009) Socket#7:0x1 511 D:25.105.144.192(43009)
What could be the reason that closing or disconnecting a given socket directly causes the program to hang in 2.0.0 and do as it should in 1.1.2?
Interesting. When I first built for my W5500-based shield with this library, I got 4 sockets instead of the 8 it supports. I ended up hard-coding the number in Ethernet.h (see below).
// Configure the maximum number of sockets to support. W5100 chips can have // up to 4 sockets. W5200 & W5500 can have up to 8 sockets. Several bytes // of RAM are used for each socket. Reducing the maximum can save RAM, but // you are limited to fewer simultaneous connections. //#if defined(RAMEND) && defined(RAMSTART) && ((RAMEND - RAMSTART) <= 2048) //#define MAX_SOCK_NUM 4 //#else #define MAX_SOCK_NUM 8 //#endif
You could do the same, only set it to 4. Perhaps your issue is due to the library's attempt to communicate with registers associated with the 4 sockets that don't exist on your shield.
Ok, that might be pointing in the right direction, because EthernetClass::socketBegin has these lines right at the beginning:
#if MAX_SOCK_NUM > 4 if (chip == 51) maxindex = 4; // W5100 chip never supports more than 4 sockets #endif
I just checked, my W5100 reports back as '51', and since MAX_SOCK_NUM is nevertheless '8', these lines of code seem to work around this.
However, even if I limit my search for hanging sockets (the same way actually as socketBegin does it as a last resort in its closemakesocket section) to four, the closeing of any socket still leads to a freeze.
In my experience, when my Arduino "freezes" it's "gone to Coventry", by which I mean the default interrupt handler, because a bug in my code has stomped something on the stack resulting in a function return to an invalid address.
Do you have a hardware debugger you can use to see what's going on? If not, add Serial.print(s); in the socketClose & socketDisconnect functions just before the execCmdSn (maybe followed by a Serial.flush() or a delay to give it time to print before the freeze) to see if you're accessing registers for sockets 4-7. Trying to write to non-existent addresses usually makes bad things happen.
I figured out what caused the "freeze":
When you close a socket, W5100::execCmdSn is called. This function first sends the given command (here: SOCK_CLOSE or SOCK_DISCON) and then waits for the command to complete by doing a
while (readSnCR(s)) ;
As I said, I'm using this as a way to free up sockets in case all four sockets are in use. However, if all sockets somehow are still marked as 0x17 (ESTABLISHED), this while loop will never stop, even if no actual data is being transmitted.
When I change the function by replacing the while loop with a short delay, the Arduino does not freeze. However, the connection also does not get closed.
Does this mean there isn't a way to forcefully close a socket connection?
That check for !W5100.SnCR() is waiting for the command register to be zeroed, which means the 5100 has started processing it. It's not your problem [you're mixing it up with SnSR(s)], though I did change it like this so the library gives the W5x00 done time to work between I/O operations:
do { delayMicroseconds(500); } while (!W5100.SnCR(s));
On Fri, Jan 18, 2019, 6:57 PM fredlcore [email protected] wrote:
I figured out what caused the "freeze": When you close a socket, W5100::execCmdSn is called. This function first sends the given command (here: SOCK_CLOSE or SOCK_DISCON) and then waits for the command to complete by doing a while (readSnCR(s)) ; As I said, I'm using this as a way to free up sockets in case all four sockets are in use. However, if all sockets somehow are still marked as 0x17 (ESTABLISHED), this while loop will never stop, even if no actual data is being transmitted. When I change the function by replacing the while loop with a short delay, the Arduino does not freeze. However, the connection also does not get closed. Does this mean there isn't a way to forcefully close a socket connection?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455724983, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmII6GloAdAkeEJXB3hBcbaHJ99R8WBks5vEl9kgaJpZM4ZLUX4 .
Sorry, I'm confused, the function I'm talking about is this:
void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; }
How and where should I apply your changes?
I'm away from my office and working from memory. I meant readSnCR(s).
On Fri, Jan 18, 2019, 7:09 PM fredlcore [email protected] wrote:
Sorry, I'm confused, the function I'm talking about is this: void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; } How and where should I apply your changes?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455726828, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmIIw8cAZT0m6SX-c7EbFzpIzFuk8T_ks5vEmIxgaJpZM4ZLUX4 .
Just add a delay between reads of SnCR
On Fri, Jan 18, 2019, 7:09 PM fredlcore [email protected] wrote:
Sorry, I'm confused, the function I'm talking about is this: void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete while (readSnCR(s)) ; } How and where should I apply your changes?
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arduino-libraries/Ethernet/issues/82#issuecomment-455726828, or mute the thread https://github.com/notifications/unsubscribe-auth/AlmIIw8cAZT0m6SX-c7EbFzpIzFuk8T_ks5vEmIxgaJpZM4ZLUX4 .
Oh, then I appreciate your efforts even more :)... I'll give it a try tomorrow, it's 1:30am here already, need to get some sleep ;)...
Here you go:
void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete //while (readSnCR(s)) ; do { delayMicroseconds(100); } while (readSnCR(s)); }
I'm not sure this has anything to do with your problem, but it's a bad idea to flood a peripheral with I/O requests while waiting for it to execute a command. It resolved some EthernetClient.connect() issues I was experiencing; perhaps it will help with yours.
If you don't have a hardware debugger, I'd put Serial.print() statements in W5100.execCmdSn() and socketDisconnect() or socketClose() [whichever you're using] and try to determine where your code gets lost. I use both regularly with my W5500 shield and have no problems with them, so I'd focus your search in your code. If you want to post fragments that exhibit the problem I'll be happy to provide a 2nd set of eyes.
Thanks, this is how I did it, basically the same as you suggested, with two log outputs on serial added:
void W5100Class::execCmdSn(SOCKET s, SockCMD _cmd) { // Send command to socket writeSnCR(s, _cmd); // Wait for command to complete // while (readSnCR(s)) ; do { Serial.println("Waiting..."); delayMicroseconds(100); } while (readSnCR(s)); Serial.println("Done."); }
Since this function is called from many others, I can see that it works normally (one or few times "Waiting..." and then "Done.") in all calls except the ones where I want to forcefully close a socket with this code if the socket is in status 0x17 (ESTABLISHED):
SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
W5100.execCmdSn(s, Sock_CLOSE);
SPI.endTransaction();
Here it gets stuck in a seemingly endless loop printing "Waiting...". I don't understand exactly what readSnCR is doing with the socket or what exactly it is waiting for, but apparently readSnCR remains always true if a socket in status 0x17 is given a SOCK_CLOSE or SOCK_DISCON command.
Since it does seem to work if other commands are issued (other than SOCK_CLOSE on a 0x17-status socket, I'm not sure where and how to look further. And I can't find the code for readSnCR anywhere so I could compare Ethernet library version 1.1.2 with 2.0.0 in this respect, because it is working with the older library without a problem.
So if you have any other ideas where and how to address this issue, I'd be really grateful!
Yeah, that puzzled me too when I first looked for it. Someone went far out of their way to write difficult-to-read code.
readSnCR is a macro invocation. The macro is in W5100.h:
#define __SOCKET_REGISTER8(name, address) \
static inline void write##name(SOCKET _s, uint8_t _data) { \
writeSn(_s, address, _data); \
} \
static inline uint8_t read##name(SOCKET _s) { \
return readSn(_s, address); \
}
The Socket n Command Register (SnCR) is defined as follows, also in W5100.h:
__SOCKET_REGISTER8(SnCR, 0x0001) // Command
Together, these give us:
static inline void writeSnCR(SOCKET _s, uint8_t _data) {
writeSn(_s, address, _data);
}
and
static inline uint8_t readSnCR(SOCKET _s) {
return readSn(_s, address);
}
readSn is an inline function defined in W5100.h:
static inline uint8_t readSn(SOCKET s, uint16_t addr) {
return read(CH_BASE() + s * CH_SIZE + addr);
}
Since CH_BASE() returns 0x400 for the W5100 chip and CH_SIZE = 0x0100, readSnCR(s) resolves to:
read(0x400+s*0x100 + 0x0001);
You can find the read() function in W5100.cpp.
Thus,
W5100.execCmdSn(s, Sock_CLOSE);
writes the Sock_CLOSE command, [0x10, defined in W5100.h] to the command register for the nth socket. When the W5100 sees this it automatically clears SnCR as part of processing the command.
The W5100 datasheet can be downloaded at https://www.wiznet.io/wp-content/uploads/wiznethome/Chip/W5100/Document/W5100_Datasheet_v1.2.7.pdf. Page 27 explains how Socket commands work, and section 3.2 (page 16) shows how registers are mapped to memory addresses. You may want to double-check addresses and command values found in W5100.h; I found a few minor discrepancies between the addresses for the W5500 and the defines in W5100.h that prevented setting connection timeouts.
Wow, thanks, this is really extensive and valuable information! I was thinking of first looking into potential differences between the way versions 1.1.2 and 2.0.0 address this matter, only to find out that now after testing it again with 1.1.2, it also gets stuck in this loop when an ESTABLISHED connection is to be CLOSEd, again after all the other calls to this function are completed without any problems. I'll have to see if I just omit closing these status-0x17 sockets or if I manage to find any other reason for this behaviour. In any case, thanks again for your very helpful support.
A status of 0x17 means a TCP connection has been established; the courteous thing to do would be to call sockDisconnect to send a FIN packet. I do this all the time with my W5500 shield using this library. Unless your shield is damaged, I can't imagine any scenario other than a software bug that could cause this. Were you able to confirm that you never try to communicate with a register for a socket number higher than 3?
I have changed SOCK_CLOSE to SOCK_DISCON now, and limited the disconnect of a socket to one with status 0x14 now (and no longer 0x17 and 0x14 where it so far had only "caught" sockets with status 0x17). But the same problem occurs also with those sockets in 0x14 state, both in 1.1.2 and 2.0.0. Since MAX_NUM_SOCKETS is correctly set to 4 in 1.1.2, the socket register should be correct. In 2.0.0 I have additionally limited the socket number to 4, to make sure this does not go above this limit. I will try a different Ethernet shield and hope that the trouble comes from this, otherwise I have no clue how to move forward from here...
If you're not constrained to use a W5100 shield, you may want to consider WizNet's own W5500 shield. I've found that it's less expensive than competing W5500-based products and its SPI bus supports higher clock rates.
Just had an idea; are you calling socketClose() directly in your code, or is it being called when your code calls EthernetClient::stop()? If the latter, are you calling setConnectionTimeout() anywhere? If so, what value are you setting the timeout to?
I don't think that I can call socketClose() directly, becuse it's a private function, isn't it?
That's why I took that code and used it directly, as stated somewhere above:
SPI.beginTransaction(SPI_ETHERNET_SETTINGS); W5100.execCmdSn(s, Sock_CLOSE); // Sock_DISCON also has no posiive effect SPI.endTransaction();
It seems that EthernetClient::stop() seems to work fine because when monitoring commands sent to the execCmdSn function, there are a lot of Sock_CLOSE coming in, as normally a number of connections are getting closed with every http request etc. But if there is a way to call socketClose() directly and it would make a difference from using the above code instead, I'd be happy to hear about it.
Just wanted to chime in for a moment, say I'm watching this thread, even though I won't be able to put engineering time into this library until late-summer time frame.
I know this is asking a lot, but if possible, please consider making test cases for this problem. Ideally, a test case looks like a complete program which can be copied & pasted into the Arduno IDE and run on various boards with different ethernet shields. Whoever runs this needs to also know what to do for the other end of the communication. Whether the remote side should be on the same LAN (low latency) or elsewhere on the Internet (higher packet latency) may also be relevant. The easier it is for anyone to run the test case and reproduce the problem, the better the odds it will get fixed, and future versions of the library won't regress.
EthernetClient::stop() calls Ethernet.socketDisconnect(), then (apparently unnecessarily) calls Ethernet.socketClose() if the socket status hasn't changed to SnSR::CLOSED within the number of milliseconds specified by the library's _timeout variable (settable via SetConnectionTimeout()). I say it's not necessary because the W5500 datasheet notes on page 48 that:
Regardless of ‘TCP server’ or ‘TCP client’, the DISCON command processes the disconnect-process (‘Active close’ or ‘Passive close’).
Active close: it transmits disconnect-request(FIN packet) to the
connected peer
Passive close: When FIN packet is received from peer, a FIN packet is replied back to the peer.
When the disconnect-process is successful (that is, FIN/ACK packet is received successfully), Sn_SR is changed to SOCK_CLOSED. Otherwise, TCPTO occurs (Sn_IR(3)=‘1)= and then Sn_SR is changed
to SOCK_CLOSED.
It seems to me that a DISCON command will ALWAYS result in the socket status being set to closed whether an ACK is received from the other end or not.
I asked whether you call SetConnectionTimeout() because setting it to zero could cause EthernetClient::stop() to hang forever if the socket status never changes to CLOSED for some reason because millis() - start < _timeout will never be true if _timeout = 0. Perhaps SetConnectionTimeout() should not allow a setting of 0, though my reading of the W5500 datasheet suggests this should never happen. I haven't checked the W5100 and W5200 datasheets to see if they behave the same though.
I was just looking at this part of your code:
{
unsigned long thisTime = millis();
for (int i = 0; i < MAX_SOCK_NUM; i++) {
uint8_t s = W5100.readSnSR(i);
if((s == 0x17) || (s == 0x1C)) {
if(thisTime - connectTime[i] > 30000UL) {
Serial.print(F("\r\nSocket frozen: "));
Serial.println(i);
close(i);
}
}
else connectTime[i] = thisTime;
socketStat[i] = W5100.readSnSR(i);
}
}
and realized you don't have your register reads bracketed by SPI begin & end, which sets the CS line.
SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
stat = W5100.readSnSR(s);
SPI.endTransaction();
I suspect the W5100 never gets your read requests.
No, I do - as I wrote above, I have substituted the close (i) with
SPI.beginTransaction(SPI_ETHERNET_SETTINGS);
W5100.execCmdSn(s, Sock_CLOSE); // Sock_DISCON also has no posiive effect
SPI.endTransaction();
As far as I understood things, execCmdSn should then do readSnSR, shouldn't it?
No. execCmdSn only reads Sn_CR, which the W5x00 sets to zero once it notices the command.
My point was that you need SPI.beginTransaction and SPI.endTransaction to bracket your register reads because SPI.beginTransaction sets the CS line that tells the W5x00 you are talking to it, and SPI.endTransaction clears it so the SPI bus can be used by other devices again.
Ah, ok, got it, will try that...
Ok, just tried that, same result, still hangs when waiting for while (readSnCR(s)) to "complete"...
...and regarding the _timeout variable: I haven't set it anywhere in the code using SetConnectionTimeout(), so I assume that shouldn't be a problem either...