
MessageQueue is always "Available" problem

Open aflteam opened this issue 4 years ago • 5 comments

I have a problem with the MessageQueue in CS104 simple server mode.

The CS104_Slave_enqueueASDU() function adds new ASDUs to the message queue and increments self->entryCounter.

After an ASDU is sent and confirmed (MessageQueue_markAsduAsConfirmed), the MessageQueue should release the message and decrement entryCounter, but that step is missing. Because of that, connectionHandlingThread always acts as if isAsduWaiting is true.

Even when the MessageQueue_getNextWaitingASDU() function returns NULL, meaning there is no more data to send, MessageQueue_isAsduAvailable() still returns true. The same behavior shows up in the debug print below: ASDUs in FIFO: 9 (new(size=28/12): 0x39440, first: 0x39360, last: 0x39440 lastInBuf: 0x39440)

The "ASDUs in FIFO" count keeps increasing even though messages are sent, and the first queue pointer stays fixed and is never removed.
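
For context, here is a minimal sketch of the enqueue path in question, modelled on the library's cs104_server example (assuming the lib60870-C v2.x API; the queue sizes, IOA 110 and common address 1 are placeholder values). Every call to CS104_Slave_enqueueASDU() adds an entry to the low-priority message queue, and the entry should only disappear again once the client has confirmed it:

```c
#include <stdbool.h>

#include "cs104_slave.h"
#include "hal_thread.h"

int main(void)
{
    /* first argument = size of the low-priority queue that
     * CS104_Slave_enqueueASDU() fills (the "ASDUs in FIFO" counter) */
    CS104_Slave slave = CS104_Slave_create(100, 100);

    CS104_Slave_setLocalAddress(slave, "0.0.0.0");

    /* application layer parameters are needed to build ASDUs */
    CS101_AppLayerParameters alParams = CS104_Slave_getAppLayerParameters(slave);

    CS104_Slave_start(slave);

    int scaledValue = 0;

    for (int i = 0; i < 10; i++) {
        CS101_ASDU asdu = CS101_ASDU_create(alParams, false, CS101_COT_PERIODIC, 0, 1, false, false);

        InformationObject io = (InformationObject)
            MeasuredValueScaled_create(NULL, 110, scaledValue++, IEC60870_QUALITY_GOOD);

        CS101_ASDU_addInformationObject(asdu, io);
        InformationObject_destroy(io);

        /* copies the ASDU into the message queue; the queue entry should be
         * released again once the connected client has confirmed it */
        CS104_Slave_enqueueASDU(slave, asdu);

        CS101_ASDU_destroy(asdu);

        Thread_sleep(1000);
    }

    CS104_Slave_stop(slave);
    CS104_Slave_destroy(slave);

    return 0;
}
```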

aflteam commented Feb 13 '20 12:02

Hey @mzillgith,

I am testing the latest commit, and the ASDU FIFO is now decremented after confirmation. Thanks for the improvement.

I created a test case that works as a redundancy server (examples/cs104_redundancy_server). Two connections are established, each in a different redundancy group, and the clients' IP addresses are added as allowed IPs.
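
Roughly like this (a sketch along the lines of examples/cs104_redundancy_server; the group names and client IP addresses are placeholders for my actual configuration):

```c
#include "cs104_slave.h"

int main(void)
{
    CS104_Slave slave = CS104_Slave_create(100, 100);

    /* each redundancy group gets its own message queue */
    CS104_Slave_setServerMode(slave, CS104_MODE_MULTIPLE_REDUNDANCY_GROUPS);

    CS104_RedundancyGroup group1 = CS104_RedundancyGroup_create("red-group-1");
    CS104_RedundancyGroup_addAllowedClient(group1, "192.168.1.10");

    CS104_RedundancyGroup group2 = CS104_RedundancyGroup_create("red-group-2");
    CS104_RedundancyGroup_addAllowedClient(group2, "192.168.1.11");

    CS104_Slave_addRedundancyGroup(slave, group1);
    CS104_Slave_addRedundancyGroup(slave, group2);

    CS104_Slave_start(slave);

    /* ... enqueue the same ASDUs as in the simple server case ... */

    return 0;
}
```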

While both connections are up, they receive the same amount of ASDUs. I unplug the Ethernet cable of one client, wait a little while, then plug it back in. During that time the ASDU FIFO of the unplugged connection grows, while the other one is still consumed down to zero. After I re-plug the cable and communication recovers, I expect the accumulated ASDUs to be released and both received-ASDU counters to end up equal, but that is not the case. Most of the ASDUs are recovered, but some of them are lost in the process.

aflteam commented Feb 24 '20 08:02

Hi @aflteam

Thanks for your feedback.

In redundancy group mode there are separate queues for the different redundancy groups. Depending on its storage capacity, a queue can overflow when the connection is interrupted for some time. In this case messages can be lost.
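
The queue capacity is fixed when the slave is created, for example (a sketch; 500 and 100 are arbitrary values):

```c
/* The low-priority queue that buffers enqueued ASDUs while a connection is
 * down is sized at creation time. If more ASDUs are enqueued than fit while
 * the connection is interrupted, the overflow case described above applies
 * and messages are lost. */
CS104_Slave slave = CS104_Slave_create(/* maxLowPrioQueueSize */ 500,
                                       /* maxHighPrioQueueSize */ 100);
```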

Can this be the reason for your observed ASDU loss?

mzillgith commented Mar 04 '20 06:03

Hey @mzillgith,

I checked the test case again. The queue is not filled with many messages. Think of it as two connections (redundancy groups) working properly at the beginning, with both queues empty. Then disrupting the cable causes a connection timeout. After the connection timeout the queue starts to grow; when it reaches only about 50 ASDUs, I plug the cable back in. Comparing the working line and the disrupted line, the total ASDU counts are not the same. I think the timeout mechanism causes this.

If the same test is done this way, I see no problem: one connection works normally, the other connection is closed intentionally from the master, and the ASDU queue grows. After the connection is re-established, the received ASDU counts are the same.

The difference is between a connection timeout (communication error) and the communication being closed (orderly disconnect).
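
For reference, the timeouts involved here are the CS104 APCI parameters, which can be inspected or changed before the slave is started (a sketch reusing the CS104_Slave slave handle from the sketches above; the values shown are only illustrative):

```c
/* How quickly a pulled cable is detected depends on these APCI timeouts,
 * while an orderly close from the master is seen immediately. */
CS104_APCIParameters apci = CS104_Slave_getConnectionParameters(slave);

apci->t1 = 15;  /* seconds to wait for confirmation of a sent APDU */
apci->t2 = 10;  /* seconds before confirming received I frames with an S frame */
apci->t3 = 20;  /* idle seconds before sending a TESTFR act keep-alive */
```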

aflteam commented Mar 13 '20 05:03

Hi all,

I've tested 651691ae8890105911b4fb14c1d26cdb690718f8 as well and it seems to be a good improvement: it reduces CPU usage a bit and prevents message losses on reconnection.

@aflteam, I've performed similar tests and I've seen that with 651691ae8890105911b4fb14c1d26cdb690718f8 the library will likely transmit some messages that appear as duplicates to the master. This is caused by the fact that if the connection is dropped for some reason, the queue will likely contain some messages that have been sent by the slave, and possibly also received by the master but not yet confirmed by it with an S frame (or the S frame was sent by the master but lost due to the connection drop). At reconnection the slave retransmits the messages that have not been confirmed. The number of retransmitted messages should be less than or equal to the value of the k parameter. I did not observe loss of messages so far.
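
For anyone tuning this: k (and w) are part of the APCI parameters, so the worst-case number of duplicates after a reconnection can be bounded by adjusting them before starting the slave (a sketch reusing the CS104_Slave slave handle from the earlier sketches; 12 and 8 are the commonly used values):

```c
/* k bounds how many I frames may be outstanding without confirmation, so it
 * also bounds how many already-delivered ASDUs can be retransmitted (and show
 * up as duplicates) after a reconnection. */
CS104_APCIParameters apci = CS104_Slave_getConnectionParameters(slave);

apci->k = 12;   /* max unconfirmed APDUs in flight */
apci->w = 8;    /* confirm at the latest after receiving w I frames */
```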

For this reason I think that with the current implementation, if you fill the queue with a given sequence of messages and drain it with or without connection drops, the sequence of messages received by the master in the two cases will likely be different. Can this explain the behaviour you are observing?

@mzillgith, can you please confirm this and that the retransmission mechanism is compliant with the spec? Do you know when the next release with that change will be available?

Thanks

nicolatimeus commented May 22 '20 08:05

@nicolatimeus Your explanation seems correct to me. The retransmission mechanism is designed to avoid lost messages. But it cannot avoid duplicate messages. When the connection is lost after the message is sent but before the confirmation is received, there is no way for the slave to know if the message was processed by the client and therefore the message remains in the queue and will be resent later when the connection is established again. This is my understanding of the standard.
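
To illustrate the consequence for applications: since duplicates cannot be ruled out at the protocol level, the master application has to tolerate them. One hypothetical approach (all names such as lastSeenMs are made up for illustration, and it assumes the slave sends M_ME_TE_1 points with CP56Time2a time tags) is to ignore values whose timestamp is not newer than the last one seen for that information object address:

```c
#include <stdbool.h>
#include <stdint.h>

#include "cs104_connection.h"

#define MAX_IOA 65536

/* last timestamp processed per (reduced) information object address */
static uint64_t lastSeenMs[MAX_IOA];

/* register with: CS104_Connection_setASDUReceivedHandler(con, asduReceivedHandler, NULL); */
static bool
asduReceivedHandler(void* parameter, int address, CS101_ASDU asdu)
{
    (void) parameter;
    (void) address;

    if (CS101_ASDU_getTypeID(asdu) == M_ME_TE_1) {
        for (int i = 0; i < CS101_ASDU_getNumberOfElements(asdu); i++) {
            InformationObject io = CS101_ASDU_getElement(asdu, i);
            int ioa = InformationObject_getObjectAddress(io);

            uint64_t ts = CP56Time2a_toMsTimestamp(
                MeasuredValueScaledWithCP56Time2a_getTimestamp(
                    (MeasuredValueScaledWithCP56Time2a) io));

            if (ts > lastSeenMs[ioa % MAX_IOA]) {
                lastSeenMs[ioa % MAX_IOA] = ts;
                /* process the new value here */
            }
            /* else: retransmitted duplicate -> ignore */

            InformationObject_destroy(io);
        }
    }

    return true;
}
```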

mzillgith commented May 27 '20 17:05