Sharp7 icon indicating copy to clipboard operation
Sharp7 copied to clipboard

Flushing data from TCP after read error

Open gambgia opened this issue 5 years ago • 12 comments

Hi, I'm testing a connection with a S71500 PLC via VPN. I'm reading different DB Areas from different DB. Everything run fine. Sometime, with the Internet traffic, one of the read fails with Tcp Read Data error and usually some data are still to be read (maybe data arrive later). The Flushing included in method WaitForData is performed before the next read, but if TCPSocket.Available is less than the next read data size, the flush will not perform. Is it right ? In this case I will receive, within the next read, wrong data. Thank you

gambgia avatar Feb 14 '19 14:02 gambgia

Hi! Thank you for reporting this issue!

Can you please tell me if or how you can reproduce this issue?

fbarresi avatar Mar 08 '19 18:03 fbarresi

Hi, I think I'm able to reproduce the issue. I can give you some more details. The PLC actually, controls an automatic machine. My application is a supervisor/controller of that machine. The app performs a frequent polling reads of different DB areas. I'm using S7MultiVar reader to read small db areas (due to negotiated PDU size limit, but faster than DBRead) and single read of bigger areas (using DBRead). When I start a new instance of my application and connect it to the plant using a VPN connection sometime the reads return errors. Maybe on S7Multivar read or DBRead. In this moments I'm reproducing the situation. I have some errors on DBRead (Error 0x00000005 TCP : Error receiving Data). If I stop execution on the error and I perform a loop to flush TCP until receive timeout is expired, I see something like:

Flushing loop start SizeAvail: 487 Flushing... SizeAvail: 0 Flushing loop end End Flushing

Without this procedure, at the next polling, I will read wrong data. For example, if I'm right, you will flush 7 bytes with the RecvIsoPacket() and then the size of the next read. If the next read is of 100 bytes, you will flush 107 bytes total. and the other 380 bytes are received within the next read that in my example is a read on another DB. This means wrong data. In this moment I have implemented a workaround with a flushing loop of TCPSocket.Available bytes but only in caso of read error. Thank you.

gambgia avatar Mar 11 '19 10:03 gambgia

Hi! Can you please share the workaround you developed? I would try to integrate it in the library. Thank you!

fbarresi avatar Mar 17 '19 21:03 fbarresi

Hi, My workaround is a simple copy of your WaitForData (MsgSocket.cs) made public and renamed to "public int Flush(int Size)". The Timeout parameter is replaced with _ReadTimeout. Then I added a method in S7Client.cs: public int Flush(int Size) { return Socket.Flush(Size); }

Everytime I have a errTCPDataReceive I will call the Flush asking to flush 2048 bytes:

byte[] buffer = new byte[mds.DataStoreLength]; DbRead(mds.DataStoreNumber, mds.DataStoreOffset, mds.DataStoreLength, ref buffer); mds.DataStoreBuffer = buffer; if (_operationResult.IsOk) { OnPollingTelegramsCompleted(new PollingCompletedEventArgs(mds, _uniqueId)); } else { System.Diagnostics.Debug.WriteLine("DataStoreTelegramsList - Error: " + operationResult.ErrorMessage); _client.Flush(2048); }

You should not be able to call another tcp read while the flushing loop is running.

When the error is in the S7Multivar read, usually the problem is that I'm requesting 3 DB blocks but in the result I have only one data block. The other datablock arrives later and with the workaround I throw all away. It's better to have less data but right, instead of more data but wrong (In a polling scenario).

Obviously the changes should be re-engineered. Thank you

gambgia avatar Mar 18 '19 14:03 gambgia

Close due to inactivity

fbarresi avatar Jun 02 '20 18:06 fbarresi

I am having the same problem. Apart from @gambgia solution, just closing (Close()) the socket when the connection fails ( LastError = S7Consts.errTCPDataReceive;) seems to help. The following is an example, not a solution:

#public int Receive(byte[] Buffer, int Start, int Size)
        {

            int BytesRead = 0;
            LastError = WaitForData(Size, _ReadTimeout);
            if (LastError == 0)
            {
                try
                {
                    BytesRead = TCPSocket.Receive(Buffer, Start, Size, SocketFlags.None);
                }
                catch
                {
                    LastError = S7Consts.errTCPDataReceive;
                }
                if (BytesRead == 0) // Connection Reset by the peer
                {
                    LastError = S7Consts.errTCPDataReceive;
                    Close();
                }
            }
  // CLOSING THE SOCKET ON TCP ERROR:
		    if (LastError == 5)
		    {
		        Close();
                    }
            return LastError;
        }

Any ideas on how this can be solved? Would just closing the socket work or are there any side effects?

Thanks.

josbri avatar Jan 13 '21 11:01 josbri

Hi! Please apologize my late reply. No, Closing the socket would no cause any side effects.

Do you use the sharp7 with parallel requests? It seems to me, that this errors are caused by faulty data, but the only scenario I can imagine for that is that the plc would send data to the wrong request.

fbarresi avatar Jan 23 '21 19:01 fbarresi

Hi, I apologize for the late reply as well. We are not doing any parallell requests, but the problem seems to go away when only using one DB block. So far it looks exactly like what @gambgia describes. In the WaitforData method, the SizeAvail can be 0 at the time of the if (Expired && SizeAvail > 0), yet have the remaining data avaible just after.

sizeAvail

In this image you can see an example of what we are refering to. In this moment there is no connection, it didn´t go inside the if(Expired && (SizeAvail > 0)) in line 99 so the sizeAvail was 0 at the moment, but it did go into the Expired. Once inside if we inspect the TCPSocket.Available, we can see that there is actually data in there (89).

If the connection is recovered before the next read, the socket will never close and this data wil still be in the buffer.

Hope all this makes sense.

josbri avatar Jan 27 '21 14:01 josbri

Hi, Thank you for your post. I think I now see the real problem. Please, help me to check if I understood you well:

  • For some reason (VPN or whatever) the connection get instable
  • the user want to receive some data (let say 42 bytes) from the plc, so the receive function is called (Line 124 of MsgSocket.cs)
  • In this function the method WaitforData is called In this method the software waits till the number of bytes available reach the size (42) or the time is out. In case of timeout every partial message will be trashed.

In your case there are still data after the timeout was recognized that will be trashed on the next reading. So I think a close is not needed in this case and the remaining data in the buffer will not effect further readings.

fbarresi avatar Jan 28 '21 23:01 fbarresi

Yes, that´s more or less what is happening. It appears that the data is not being trashed though.

josbri avatar Feb 02 '21 17:02 josbri

Do you mean, that the data are not trashed in your debug-example? That's right, because they are going to be trashed at the next read operation.

fbarresi avatar Feb 02 '21 17:02 fbarresi

Hi, I am experiencing this issue as well. We are running an application on a wireless device that connects to an S7-1500 PLC over WiFi. In some parts of the plant, the WiFi connection is weak and the connection slow causing data read errors. In contrast to a single error that resolves itself on the next data read operation (due to a lost connection), the data is continuously incorrect due to data arriving after the request has timed out (due to a slow connection).

Our application polls the PLC for data periodically across multiple reads. Every polling cycle looks like this:

  1. Read global data of a machine (read 1)
  2. Read data of component 1 (read 2)
  3. Read data of component 2
  4. Read data of component 3
  5. Read data of component 4
  6. Read data of component 5

When a read fails and the connection is restored afterwards, we observe that the result data of read 1 (Read global data of a machine) is returned upon calling read 2 (Read data of component 1).

For example, say the first data of read 1 and read 2 are both Strings. If read 1 is supposed to return string "Global data" and read 2 is supposed to return "Component 1 data", we instead see read 1 report a data read error, and read 2 return "Global data". The data of the data read operation is thus shifted to the next data read operation.

The data seems to shift from one data read operation to the next. It does not seem to be a shift of data within a single data operation. This could explain why @josbri doesn't observe the problem when reading from only a single DB.

What I assume is happening is the following. Currently, Sharp7 will flush any incomplete data in the buffer upon detection a timeout. It is however possible that more data is still on the way after this initial timeout detection. In case of a very slow connection, no data may have been received yet by the Sharp7 client upon triggering the timeout detection. All data will arrive after the flush of the socket. The next time a data read is executed, the data of the previous call will be available in buffer. This data will have all the correct S7Protocol headers, so it will pass as a valid message. It contains the response to the previous read request however, instead of the current read request. This again explains why the issue is not observed when only reading from a single DB in the polling cycle, as the data will be structurally correct, it is however data from one cycle ago (which may not be an issue in a polling scenario).

Would it be an option to flush the TCPSocket buffer at the start of every call to WaitForData? This would ensure it is empty before waiting for new data to arrive. Better yet may be to flush the buffer upon sending the data read request to the PLC (otherwise data may arrive inbetween the data read request and the call to WaitForData).

Our software engineers on-site have reported the issue is resolved by restarting our application. In case of a buffer issue, I would indeed expect this to solve the issue. We have currently deployed a quick fix that destroys the S7Client instance and creates a new instance upon encountering any data read error, but this is obviously a hacky solution. @josbri 's solution of closing the socket is a bit cleaner, and will also clear the buffer and make sure no more on-the-way data arrives in the buffer. @gambgia 's solution is the cleanest, but on-the-way data may still end up in the buffer as the socket is not closed, so the next data read may still be incorrect.

Could you advise on your view of this issue? A solution integrated into the Sharp7 library may be the cleanest. It would have to ensure the buffer is clean before any data read, and that no more data from a previous data read can enter the buffer.

JellevanCampen avatar Jun 07 '21 14:06 JellevanCampen