Multiple processes running concurrently using libmodbus over Modbus-Rtu

Open hfelek opened this issue 1 year ago • 5 comments

libmodbus version

 3.1.10

OS and/or distribution

 Raspberry Pi 4, Raspberry OS Linux

Environment

  Architecture:                    armv7l
  Byte Order:                      Little Endian
  CPU(s):                          4
  Vendor ID:                       ARM
  Model name:                      Cortex-A72
  CPU max MHz:                     1500.0000
  CPU min MHz:                     600.0000

Description

 Two processes are created to scan two devices over two UART ports of a Raspberry Pi 4, using Modbus RTU over an RS485 connection.

Both processes use the libmodbus library. When either process runs by itself, read and write operations work without any problem. However, when the two processes run concurrently (started 2 seconds apart), the process started later can't execute long register operations (120 registers, 2 bytes each): modbus_read_registers or modbus_write_registers returns a "Connection timed out" error. When I dug into the related functions, I found that the _modbus_rtu_select function in src/modbus-rtu.c is where the wait on the related file descriptor times out. Changing the byte timeout and response timeout only delays the moment the error is returned, and one of the processes still doesn't function properly. Both processes scan their devices at a baud rate of 115200. When I change the baud rate for one of the devices to 2000000, one process still functions properly; the other process can read long registers, but on write operations (regardless of the register count) the error ratio becomes 50% in a continuous operation cycle.

I haven't seen any previous issues about using libmodbus in two concurrently running processes. I have been trying to figure out the cause or a solution for a week, but I couldn't make further progress. I can give more details if there are questions, but the issue seems to be software related rather than hardware related. Note that when TCP/IP is used in either or both of the processes, no issue is observed.

libmodbus output with debug mode enabled

For the process started later on. Reading device registers! [01][03][00][00][00][64][44][21] Waiting for a confirmation... ERROR Connection timed out: select Connection timed out <01><03><C8><01><06><02><00><49><52><2D><4F><33><32><31><30><30><30><30><30><00><00><00><00><00><80><E1><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><00><7C><A4>Devices' identification process has failed!

hfelek avatar Dec 27 '23 07:12 hfelek

You should tell us more about the Modbus server device, the hardware used (RS-232/RS-485...) on the Raspi side, which UART drivers are involved, etc. I cannot imagine that this is a libmodbus issue; most probably it is a timing issue in the whole setup. What timeouts did you use? What timeout is configured on the server device? How long does a usual response take? Can you provide a full trace of a good case and a bad one? I cannot really understand your trace above: you request to read 100 registers starting at register address 0. But the response length does not seem to match (3rd byte is 0x01, but should be 0xc8 for 100x 2 byte = 200 byte) - or if the length is assumed to be correct, then the response of the server is just too long.
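
The length check described above can be worked through with a short sketch. It is plain Python, independent of libmodbus, and the helper name `expected_read_response` is hypothetical. It assumes the standard Modbus RTU layout for function 0x03: the response carries the server address, the function code, a byte count equal to twice the number of registers, the register data, and a 2-byte CRC.

```python
def expected_read_response(request: bytes) -> dict:
    """Derive the expected response layout from a function 0x03 RTU request.

    Hypothetical helper for illustration; not part of libmodbus.
    """
    if len(request) != 8 or request[1] != 0x03:
        raise ValueError("expected an 8-byte read holding registers request")
    quantity = int.from_bytes(request[4:6], "big")  # number of registers
    byte_count = 2 * quantity                        # each register is 2 bytes
    # address + function code + byte count + data + CRC (2 bytes)
    frame_len = 1 + 1 + 1 + byte_count + 2
    return {"quantity": quantity, "byte_count": byte_count, "frame_len": frame_len}


# Request copied from the debug output in the first post.
request = bytes.fromhex("01 03 00 00 00 64 44 21")
info = expected_read_response(request)
print(info)
```

For the request in the trace this gives 100 registers, a byte count of 200 (0xC8), and a response frame of 205 bytes in total.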

mhei avatar Dec 27 '23 19:12 mhei

Yes, you are right about the details of the issue. I have looked for probable sources of the problem in different setups. I also agree that this is probably not a libmodbus issue, but the behavior of the implementation may vary on different OSes and hardware.

I will describe my real setup to give more insight into the issue. The setup in my first comment was only tried to see whether there is a software issue in the processes.

Kernel release version: 5.15.84-v7l+
OS version: Raspbian GNU/Linux 11 (bullseye)

Setup

This is the actual setup I want to use in my final project. Two programs run on different UART ports (both PL011 UARTs) of the RPi 4, with baud rates of 2000000 and 115200. To make the statements clearer, I will call the process with the 2000000 baud rate 'Process 1' and the process with the 115200 baud rate 'Process 2', matching the configurations below.

Process 1

        "Baudrate": 2000000,
        "Byte Timeout": {
            "sec": 0,
            "usec":50
        },
        "Response Timeout": {
            "sec": 0,
            "usec": 20000
        },

Step 1: Available RS485 slaves are detected.
Step 2: Continuous write-read partition in an infinite loop: a write operation to 16 registers with 'modbus_write_registers' and a read operation from 1 register.

Process 2

        "Baudrate": 115200,
        "Byte Timeout": {
            "sec": 0,
            "usec":50
        },
        "Response Timeout": {
            "sec": 0,
            "usec": 20000
        },

Baud rate: 115200
Step 1: Available RS485 slaves are already known. Read-register operations of length 100, 100, and 26 are executed in three steps.
Step 2: Continuous write-read partition in an infinite loop: a write operation to 15 registers with 'modbus_write_registers' and a read operation from 1 register.

Tests

//Single run of Process 1 - Step 2: logs in a continuous loop.
Total number of scan cycles -> 7000
Slot Number: 1, Successful: 7000, Failed: 0 
Duration for 1000 main cycles: 3.121561 seconds
Total number of scan cycles -> 8000
Slot Number: 1, Successful: 8000, Failed: 0 
//Single run of Process 2 - Step 2: logs in a continuous loop.
Duration for 1000 main cycles: 7.027475 seconds
Slot Number: 1, Successful: 2000, Failed: 0 
Duration for 1000 main cycles: 7.039133 seconds
Slot Number: 1, Successful: 3000, Failed: 0 
//Process 1 logs when both process are running 
Total number of scan cycles -> 3000
Slot Number: 1, Successfull: 1268, Failed: 1732 
Duration for 1000 main cycles: 14.515857 seconds
Total number of scan cycles -> 4000
Slot Number: 1, Successful: 1786, Failed: 2214 
Duration for 1000 main cycles: 12.842769 seconds
//Process 2 logs when both process are running 
Duration for 1000 main cycles: 7.081596 seconds
Slot Number: 1, Successful: 32000, Failed: 0 
Duration for 1000 main cycles: 7.081268 seconds
Slot Number: 1, Successful: 33000, Failed: 0 
Duration for 1000 main cycles: 7.033406 seconds
Slot Number: 1, Successful: 34000, Failed: 0 
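
Since the counters in these logs are cumulative, the failure rate of each 1000-cycle window has to be taken from the difference between consecutive lines. A minimal Python sketch of that arithmetic follows; the function name `window_failure_rates` is hypothetical, and the regular expression only assumes the "Successful: N, Failed: M" layout seen above.

```python
import re

# Matches both spellings that appear in the logs ("Successful" / "Successfull").
COUNTS = re.compile(r"Successfull?:\s*(\d+),\s*Failed:\s*(\d+)")


def window_failure_rates(log: str) -> list[float]:
    """Failure rate between each pair of consecutive cumulative counter lines."""
    counts = [(int(ok), int(bad)) for ok, bad in COUNTS.findall(log)]
    rates = []
    for (ok0, bad0), (ok1, bad1) in zip(counts, counts[1:]):
        total = (ok1 - ok0) + (bad1 - bad0)
        rates.append((bad1 - bad0) / total if total else 0.0)
    return rates


# Process 1 counters while both processes were running (copied from the logs above).
log = """
Slot Number: 1, Successfull: 1268, Failed: 1732
Slot Number: 1, Successful: 1786, Failed: 2214
"""
rates = window_failure_rates(log)
print([round(r, 3) for r in rates])
```

For these two lines, 482 of the 1000 cycles between them failed (48.2%), which is in line with the roughly 50% error ratio mentioned in the first post.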

These logs may be unnecessary, but I feel that, together with the logs given for the other cases, they may lead to the source of the problem. To summarize: while each process runs by itself we have no communication problems, but when the two processes run concurrently, communication problems start to occur. I tried reading and writing different register counts, and this is what I've seen for Process 1:

16-register write, 1-register read -> all the errors occur on the write operation; the read operation functions properly.
16-register write, 16-register read -> errors occur on both read and write operations in similar ratios.
1-register write, 16-register read -> fewer errors occur overall, and most of them occur on read operations.

Logs for the last case are given below. An 'r' or 'w' in the log shows on which operation the error occurred.

Duration for 1000 main cycles: 3.587441 seconds
rrwrwrwrwrwrwrrrrrrTotal number of scan cycles -> 43000
Slot Number: 1, Successful: 42816, Failed: 184 
Duration for 1000 main cycles: 3.588082 seconds
rrrrrTotal number of scan cycles -> 44000
Slot Number: 1, Successfull: 43811, Failed: 189 
Duration for 1000 main cycles: 3.537136 seconds
rrrrrrrrrrrTotal number of scan cycles -> 45000
Slot Number: 1, Successful: 44800, Failed: 200 

Error log for Process 1 while both processes are running concurrently

Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
<01><10><00><0A><00><10><E1><C7>
[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
ERROR Connection timed out: select
w[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
Waiting for a confirmation...
ERROR Connection timed out: select
w[01][03][00][1A][00][01][A5][CD]
Waiting for a confirmation...
<01><03><02><00><00><B8><44>
[01][10][00][0A][00][10][20][00][01][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][00][92][A9]
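
To rule out corrupted frames, the CRC of the frames in this log can be checked offline. The sketch below is plain Python and not libmodbus code; the helper names are hypothetical. It assumes the standard CRC-16/MODBUS parameters (initial value 0xFFFF, reflected polynomial 0xA001, CRC sent low byte first), with the bracketed and angle-bracketed hex notation copied from the log above.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def frame_crc_ok(frame: bytes) -> bool:
    """True if the last two bytes hold the CRC of the rest, low byte first."""
    return crc16_modbus(frame[:-2]) == int.from_bytes(frame[-2:], "little")


def parse_debug_frame(text: str) -> bytes:
    """Turn '[01][03]...' or '<01><03>...' debug notation into raw bytes."""
    return bytes.fromhex(text.translate(str.maketrans("[]<>", "    ")))


read_request = parse_debug_frame("[01][03][00][1A][00][01][A5][CD]")
read_response = parse_debug_frame("<01><03><02><00><00><B8><44>")
print(frame_crc_ok(read_request), frame_crc_ok(read_response))
```

Both frames pass the check, so at least for these two the bytes that do arrive are intact; the failures in the log are timeouts rather than CRC errors.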

Using the same timeouts for different baud rates may seem odd, but I wanted to report the values I actually tested. Increasing the timeout for the process reduces communication errors, but the duration of 1000 cycles increases in proportion to the timeout.
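
As a rough check of how these timeouts compare with the time the frames spend on the wire, the transmission times can be estimated from the baud rate. This is a Python sketch under a stated assumption: 10 bits per character (1 start, 8 data, 1 stop, no parity), which the thread does not confirm. The frame sizes are the 41-byte 16-register write request from the error log and the 205-byte response to a 100-register read.

```python
BITS_PER_CHAR = 10  # assumption: 1 start + 8 data + 1 stop bit, no parity

BYTE_TIMEOUT_US = 50          # "Byte Timeout" from the configs above
RESPONSE_TIMEOUT_US = 20_000  # "Response Timeout" from the configs above


def char_time_us(baud: int) -> float:
    """Time to transmit one character, in microseconds."""
    return BITS_PER_CHAR * 1_000_000 / baud


def frame_time_us(n_bytes: int, baud: int) -> float:
    """Time to transmit n_bytes back to back, in microseconds."""
    return n_bytes * char_time_us(baud)


for baud in (115_200, 2_000_000):
    print(
        f"{baud} baud: char {char_time_us(baud):.1f} us "
        f"(byte timeout {BYTE_TIMEOUT_US} us), "
        f"41-byte write request {frame_time_us(41, baud):.1f} us, "
        f"205-byte read response {frame_time_us(205, baud):.1f} us "
        f"(response timeout {RESPONSE_TIMEOUT_US} us)"
    )
```

Under this assumption, one character at 115200 baud takes about 86.8 microseconds, which is longer than the configured 50 microsecond byte timeout, and a 205-byte response occupies about 17.8 ms of the 20 ms response timeout. How these figures interact with the UART driver's buffering and with select() is not something this arithmetic can show.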

Note that I am not that experienced on the Linux side. I think the problem is caused by interrupt and process scheduling latencies within the OS. I haven't tried my setup on a different OS, but the problem doesn't seem to be a hardware issue. Perhaps the timing used for checking the related file descriptors could be changed on the libmodbus side for this case.

hfelek avatar Dec 28 '23 12:12 hfelek

I know this may sound crazy but, as a test, slow everything down. Adjust all your baud rates down to 19.2 kbaud or similar and adjust your timeouts up accordingly.

This reduces or eliminates many hardware, connection, cabling, and timing issues. If everything works satisfactorily at a much slower speed, you then know it is probably not a software issue.

Then you can start tweaking the speeds up again to find the point where the issue reappears.

One of your baud rates is 2 Mbaud! At this speed, many other issues can creep into the troubleshooting process.

watsocd avatar Dec 28 '23 23:12 watsocd

The timeouts are really tight, and with respect to the baud rates, I wonder whether the whole approach with Linux as a non-RTOS system makes sense. As already mentioned, I'd also try to slow down. And since you use RS-485, you can try to connect a third observer system to each RS-485 line and let it sniff the Modbus traffic. That way you can at least see whether the Modbus server's replies are correct and complete, and so on...

mhei avatar Dec 29 '23 09:12 mhei

To let everyone know: the issue is related to the RPi4 UART driver. Changes to the driver for that chip are needed. There are related discussions on the Raspberry Pi forum.

Thank you for the help, @mhei.

hfelek avatar Jan 03 '24 07:01 hfelek