
Correct way to maintain a connection to a PLC.

Open Leery2495 opened this issue 2 years ago • 38 comments

Hi, just a quick question if possible. In the examples provided, when connecting to the processor we use with PLC() as comm:. I am writing an application that basically reads the tags provided and displays them. The application has connections to multiple PLCs. What is the correct way to store the connection to a PLC so that I can recall it at any time? The reason I want to do this is that opening and closing the connection is resulting in the EXCP 0300 error on my processor cards. Any help is appreciated.

Leery2495 avatar Nov 23 '21 20:11 Leery2495

I've seen this post but don't fully understand what it is saying. https://github.com/dmroeder/pylogix/issues/9#issuecomment-333142011

Leery2495 avatar Nov 23 '21 21:11 Leery2495

When you first read or write a tag, the connection is established. If you have not closed the connection (by calling Close directly or indirectly), you have about a minute and a half to make another read or write before the PLC drops the connection. If you keep making reads or writes before the PLC closes the connection, the same connection is reused.

That particular exception is the card not handling the sheer volume of connections. ENBTs didn't handle flushing connections very well, so if you open/close connections too quickly, it will eventually cause an error in the card. This is what was happening in issue #9.

Consider:

import pylogix
for i in range(10000):
    with pylogix.PLC("192.168.1.10") as comm:
        ret = comm.Read('MyTag')

Each iteration of the loop will open a connection, read the tag, and close the connection: 10,000 times. This is not what you want to do; it would be better to put the loop inside the pylogix context so that only one connection is made:

import pylogix
with pylogix.PLC("192.168.1.10") as comm:
    for i in range(10000):
        ret = comm.Read('MyTag')

dmroeder avatar Nov 23 '21 21:11 dmroeder

Thanks @dmroeder. Because I'm going back and forth between different PLCs, what is the best way of ensuring that the connection is still open? Can I store the comm object in memory somehow so that a different thread can pick it up and use it?

Leery2495 avatar Nov 23 '21 22:11 Leery2495

you have about a minute and a half to make another read or write before the PLC will drop the connection.

Oops. Hit comment by accident.

I have had very poor luck waiting that long. In my testing I see relatively consistent drops after about five seconds. I must be setting up the connection differently than you.

I ended up writing code within my library to automatically close the connection, nicely, after five seconds of idle time and reconnect automatically when read or write is called.
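That idle-close/lazy-reconnect strategy might be sketched like this (the AutoReconnect wrapper, the make_conn factory, and the 5-second constant are illustrative, not part of pylogix or any existing library):

```python
import time

IDLE_CLOSE_S = 5.0  # close the connection after this much idle time

class AutoReconnect:
    """Close an idle connection nicely and reopen it lazily on the next read.
    make_conn is any zero-arg factory, e.g. lambda: pylogix.PLC("192.168.1.10")."""
    def __init__(self, make_conn):
        self.make_conn = make_conn
        self.conn = None
        self.last_used = 0.0

    def _ensure_open(self):
        now = time.monotonic()
        if self.conn is not None and now - self.last_used > IDLE_CLOSE_S:
            self.conn.Close()  # explicit close, so the card can free resources
            self.conn = None
        if self.conn is None:
            self.conn = self.make_conn()
        self.last_used = now
        return self.conn

    def read(self, tag):
        return self._ensure_open().Read(tag)
```

The timing here is the caller's read cadence; a stricter version would also close from a background timer so an idle connection is released even when no read ever comes.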

kyle-github avatar Nov 23 '21 22:11 kyle-github

It might be that not all controllers and/or firmware revisions are equal. I tested on a CompactLogix and the connection was flushed after about a minute and a half. I certainly never experienced 5 seconds, as long as ForwardClose and/or UnregisterSession is not called. Maybe a controller is more aggressive about connections the more connections it currently has. It's been a while since I've looked into this, but the connection parameters in the forward open might matter as well.

@Leery2495 I've found that the best result when using threads is for each thread to have its own instance of pylogix, rather than sharing one between threads.

dmroeder avatar Nov 23 '21 23:11 dmroeder

My experience is also that the connections drop at about 5 seconds.

In every application I've built that talks to Rockwell PLCs I always "heartbeat" the PLC if more than 2 seconds have passed with no activity on the connection.

evaldes2015 avatar Nov 23 '21 23:11 evaldes2015

The timeout is set as part of the forward open, so based on the _buildCIPForwardOpen method the timeout should be about 14 seconds.
CIPPriority is 0x0A, which means the tick time is 1024 ms, and CIPTimeoutTicks is 0x0E, or 14 ticks, so the timeout value is 1024 * 14 = 14336 ms, or ~14 seconds. The unconnected send method also uses the same values, so it should be the same for unconnected messages as well.
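That arithmetic can be checked directly; in CIP the low nibble of the priority/time-tick byte is the tick-time exponent (2^n ms):

```python
priority_time_tick = 0x0A  # low nibble (0xA = 10) is the tick-time exponent
timeout_ticks = 0x0E       # 14 ticks

tick_time_ms = 2 ** (priority_time_tick & 0x0F)  # 2**10 = 1024 ms
timeout_ms = tick_time_ms * timeout_ticks        # 1024 * 14 = 14336 ms
print(timeout_ms / 1000)                         # ~14.3 seconds
```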

Any message can work as a heartbeat to maintain the connection, in cases where I've needed it simple things like reading the PLC time or getting the program name is what I've used.

ottowayi avatar Nov 23 '21 23:11 ottowayi

I went with a connection close instead of a keep-alive. I found that my own code either:

  1. Read frequently and thus maintained its own keep-alives.
  2. Read/wrote very infrequently and thus was just hogging PLC connection resources.

I didn't seem to have too many cases in between. A very informal survey of my users at the time showed similar results. In cases where reads/writes are done infrequently, the time overhead of setting up a new connection is generally OK.

But perhaps I should add a keep-alive as an option.

@Leery2495 I've found that the best result when using threads is for each thread to have it's own instance of pylogix, rather than share it with each thread.

Does each instance have its own connection? If so, I'd be a bit careful. At least one of my old PLCs only supports 32 connections. The environments my library grew up in were fairly heavily networked, and PLCs often had several open connections from other PLCs and other systems. Thus, I try pretty hard to minimize the number of connections. I should make that easier to manage, though. Hmm, going to file myself an enhancement ticket.

kyle-github avatar Nov 24 '21 00:11 kyle-github

I'm all for a connection keep-alive option for pylogix as well; I think we've had a fair share of the same question around connection timeouts. I think we can just read a system tag or the clock on a background thread, no?
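A background keep-alive like that might be sketched as follows (the KeepAlive class is illustrative; heartbeat would be something like lambda: comm.GetPLCTime() in real use):

```python
import threading

class KeepAlive:
    """Call heartbeat() every interval_s seconds on a daemon thread
    until stop() is called."""
    def __init__(self, heartbeat, interval_s=5.0):
        self.heartbeat = heartbeat
        self.interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Event.wait doubles as an interruptible sleep; it returns True
        # (and ends the loop) as soon as stop() sets the event.
        while not self._stop.wait(self.interval_s):
            self.heartbeat()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Note that if the heartbeat shares a comm instance with other threads, the reads need to be serialized, per the earlier advice about one pylogix instance per thread.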

TheFern2 avatar Nov 24 '21 00:11 TheFern2

I should mention that I definitely saw a case where allowing the PLC to drop the connections itself was a problem. This was with a very old ENBT in a ControlLogix (perhaps they have fixed this now), and it took so long for the connection resources to get cleaned up in the ENBT that I was able to run out of connection resources simply by waiting 30 seconds and reconnecting without explicitly disconnecting. I did that repeatedly just to see what would happen, and the ENBT locked up. I have a ControlLogix with an L81 CPU and that definitely cleans up faster.

In general I strongly suggest making sure you are careful about cleaning up as the PLC might not be very fast or efficient.

kyle-github avatar Nov 24 '21 00:11 kyle-github

Thanks for all the input everyone. I should clarify: as it stands, I'm forming a connection once every ten seconds; I then read all the tags I need and close the connection. In my view this should make only one or two connections in the timeout window, but for some reason my module is still faulting with the EXCP 0300 error. The reads themselves are Celery-driven as part of a bigger application, so each read may be done by any worker. I am doing the same thing on another system with no issues, but with this one I am getting the fault roughly once every two days. Does anybody know a better way for me to debug this, as it sounds like my current method may not be the issue? Thanks again.

Leery2495 avatar Nov 24 '21 00:11 Leery2495

The quick and dirty way would be to create one connection object per PLC and not call close; if you have a cleanup function, like a ctrl+c SIGINT handler or something along those lines, you can close the connections there. Then create a keep-alive function that reads a dummy tag just to keep the connection alive. That should hopefully prevent the error from happening on this ENBT card.
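A sketch of that "one long-lived connection object per PLC, closed only at shutdown" idea (the ConnectionPool class and make_conn factory are illustrative, not pylogix API):

```python
class ConnectionPool:
    """Keep one long-lived connection per PLC IP; close them all on exit."""
    def __init__(self, make_conn):
        self.make_conn = make_conn  # e.g. lambda ip: pylogix.PLC(ip)
        self.conns = {}

    def get(self, ip):
        # Reuse the existing connection object instead of opening a new one
        if ip not in self.conns:
            self.conns[ip] = self.make_conn(ip)
        return self.conns[ip]

    def close_all(self):
        for conn in self.conns.values():
            conn.Close()
        self.conns.clear()

# Real use might look like (illustrative):
#   import atexit, pylogix
#   pool = ConnectionPool(lambda ip: pylogix.PLC(ip))
#   atexit.register(pool.close_all)  # close nicely at shutdown; hook SIGINT too if needed
```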

What code are you currently using?

TheFern2 avatar Nov 24 '21 01:11 TheFern2

This exception is specific to the ENBT, as far as I'm aware. I'm guessing that the other systems that seem to work fine are not 1756-ENBT, or they are of a different firmware revision. Rockwell recommends flashing the module to 6.006 or higher as it addresses issues regarding this exception. The ENBT's web page gives some good information regarding the number of connections, loading and other information. I can help you analyze some screenshots of the page. You can email them to me if you are more comfortable with that.

Edit: I see, exactly what @kyle-github was talking about.

dmroeder avatar Nov 24 '21 03:11 dmroeder

@Leery2495 I found an ENBT to test against. Opening/closing connections doesn't seem to be the problem; opening new connections without closing them is a bigger problem. Of course, sharing instances between threads can be an issue too.

I'd verify in wireshark that you are not accidentally opening a new connection with each read, if you are, and you don't have an easy way to prevent that, then make sure you close it.

A simple example, you can quickly open too many connections by doing something like this:

import pylogix

for i in range(1000):
    comm = pylogix.PLC("192.168.1.9")  # new instance; its connection opens on the first Read
    ret = comm.Read("ProductPointer")
    print(ret.Status)                  # never closed, so connections pile up in the card

dmroeder avatar Nov 24 '21 23:11 dmroeder

@shared_task
def machine_scan():
    processors = Processors.objects.filter(enabled=True)
    for p in processors:
        machines = Machine.objects.filter(machine_processor=p)
        with PLC() as comm:
            comm.IPAddress = p.processor_ipaddress
            comm.ProcessorSlot = p.processor_slot
            if p.processor_routing:
                comm.Route = literal_eval(p.processor_routing)
            for m in machines:
                try:
                    m.scan(comm)
                except Exception:
                    pass  # error handling elided
        comm.Close()

Sorry it took so long to get back to you; I'm only back in the office today. This is the code that is causing me the issues. The m.scan() function is just a list of all the tags I wish to read from each machine and where to put the results, so I didn't think including it was relevant. From what you have said, I am wondering if, because I'm setting the IP route for each connection, it is creating a new connection for every read. Perhaps I need something that checks if the comm config is the same and, if so, doesn't set it again. To clarify, both the card that has no issues and the one that does are the same card with the same firmware revision, so it's definitely something I'm doing wrong in the code.

Thanks again.

Leery2495 avatar Nov 28 '21 14:11 Leery2495

When you're using the with context, the connection should be closed after going out of scope, so comm.Close() isn't needed. I don't see any other item that stands out as an issue, assuming m.scan works and the machines are all on the same processor IP and route. However, I'm not entirely sure about the shared_task decorator; is that multiprocessing?

I would test without the decorator: using your same function, just try to read a tag here without going out of scope. Then try reading with the decorator. If that works, then somehow the comm object isn't being passed to m.scan properly.

def machine_scan():
    [snip]
    for m in machines:
        try:
            ret = comm.Read("Some_Tag")
            print(ret.Status)
        except Exception:
            pass  # error handling elided

Another thing you can do in the scan function is to check if the comm object is equal to None; if it is, then you know for sure this function isn't receiving the correct comm object.

TheFern2 avatar Nov 29 '21 13:11 TheFern2

To add to @TheFern2's reply, what is unclear to me is what would happen if machine_scan() were called before a previous call completed. Your processor object has all the properties for PLC(); maybe a better approach would be to add an instance of PLC() to your processor object instead. Then you might be able to do something like:

@shared_task
def machine_scan():
    processors = Processors.objects.filter(enabled=True)
    for p in processors:
        machines = Machine.objects.filter(machine_processor=p)
        for m in machines:
            try:
                m.scan(p.comm)
            except Exception:
                pass

As far as your error, and this only happening to the one module: everything I've read about that error says the module is running out of resources. It would take more investigation to work out why specifically that one module. Are there other instances of the same module part number working fine, or is that the only ENBT?

As I mentioned before, the two best troubleshooting tools for this will be Wireshark and the ENBT's web page. Make sure your ENBT is not running too low on resources, and make sure connections aren't being opened and never closed.

dmroeder avatar Nov 30 '21 15:11 dmroeder

Thanks again both. I've been monitoring the card for the last day or two and haven't had the issue yet, with a max of one observed connection at a time. I like the approach of adding a p.comm instance. I'll report back if I have anything further, but at this point I'm starting to suspect that the ENBT card is perhaps faulty, as suggested. It is new out of the box, so it's possible. I'm going to try swapping it with one of the known working cards at an opportune moment.

Leery2495 avatar Nov 30 '21 18:11 Leery2495

If you're going to swap the card, go for an EN2T if you can. The ENBTs were problematic at best. They were very easy to overload.

evaldes2015 avatar Nov 30 '21 18:11 evaldes2015

It's possible that the card is suspect out of the box. Honestly though, I think they were just never very good at managing resources.

dmroeder avatar Nov 30 '21 18:11 dmroeder

There was this too: https://rockwellautomation.custhelp.com/ci/okcsFattach/get/41204_4

dmroeder avatar Nov 30 '21 18:11 dmroeder

Hi, just getting in touch to give feedback on the issue. I did not go through all of the steps in this bulletin to be 100% sure my device was affected, but I did have two DHRIO cards in the same rack, revisions B and C. I upgraded them to revision E two weeks ago and have been reading every second since with no issues. Before doing so, this would last a day or two if I was lucky. So I'll give it another month and report back on whether there were any more issues. Thanks again.

Leery2495 avatar Jan 23 '22 15:01 Leery2495

I also have an EN2T on the way but no stock until May...

Leery2495 avatar Jan 25 '22 00:01 Leery2495

Still having bother, but only about once a month. I am currently trying to refactor my application, but I'm still not sure what the correct approach is. The idea is that I dedicate one Celery worker that continues to run as long as the connection is available, with a heartbeat to ensure the connection stays established. Does anyone have experience with this sort of usage, and how I might then queue tasks to the communication worker and wait on the response? Really not sure where to go from here. @dmroeder

Leery2495 avatar Mar 13 '22 09:03 Leery2495

It sounds like you need a server handling these requests. You also need to handle the connection exception properly and have some sort of while loop until the connection is good again.

Have a look here for advanced error handling

https://youtu.be/ZsvftkbbrR0

Have a look here for an idea: https://github.com/TheFern2/pylogix-api. I don't have a globally maintained connection, but you could have one, and then spin up little servers based on how many PLCs you have.
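The "loop until the connection is good again" idea might look like this (a sketch; read stands in for a zero-arg call such as lambda: comm.Read('MyTag'), and pylogix reports the string 'Success' in Response.Status):

```python
import time

def read_with_retry(read, retries=5, delay_s=1.0):
    """Keep retrying a read until it succeeds or retries are exhausted."""
    ret = None
    for _ in range(retries):
        ret = read()
        if ret.Status == "Success":
            return ret
        time.sleep(delay_s)  # back off before trying again
    return ret  # last failed response, so the caller can inspect Status
```

A server endpoint could call this and translate an exhausted retry into a 4xx/5xx response.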

TheFern2 avatar Mar 13 '22 19:03 TheFern2

Thank you. I'll take a look.

Leery2495 avatar Mar 13 '22 21:03 Leery2495

I am still confused about how to actually go about keeping the connection open. When you say smaller servers, would that mean running another Django application for each processor on a different port and then retrieving values from there? Or would there be a way to maintain the connection within the current server? Either way I think this is beyond me, but I'm just not sure how you go about figuring this stuff out. Thanks again.

Leery2495 avatar Mar 14 '22 05:03 Leery2495

You could probably do it with one server; that server could make sure all your PLC connections are kept open. The problem is that this code is not async, so if one connection hangs, it will block. Whatever you do, you need to have this connection-check service on a separate thread, or threads if there are multiple connections.

PLC1 <> Thread to maintain connection <> Django/Flask server 1, port 5555
PLC2 <> Thread to maintain connection <> Django/Flask server 1, port 5555

If you don't feel like dealing with threads, then yes, you'll need to spin up Django/FastAPI/Flask servers for each PLC connection; when there's no connection, your requests should bounce with a 4xx HTTP response code until the connection is back up.

PLC1 <> Django/Flask server 1, port 5555 (the server maintains the connection without a thread; if the conn is bad, that's OK since it's one PLC conn)
PLC2 <> Django/Flask server 2, port 5556

At least that's how I would do it; others might have better ideas.

TheFern2 avatar Mar 14 '22 17:03 TheFern2

Thanks @TheFern2. I have implemented the threads to manage the servers, which seems to be working fine. Any videos on how I actually use the processor comm thread to return results to the main thread? I think this is the part where my understanding is really lacking. I really appreciate the help you are giving me.

Leery2495 avatar Mar 14 '22 19:03 Leery2495

Python has locking objects. You could use one to have the comm threads add results to a list or a dict for the main thread to read.
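A minimal sketch of that pattern (names are illustrative; read_tag stands in for a real comm.Read call made by each worker):

```python
import threading

results = {}
results_lock = threading.Lock()

def comm_worker(name, read_tag):
    """Comm thread body: do the read, then publish the value under the lock."""
    value = read_tag()
    with results_lock:
        results[name] = value

threads = [
    threading.Thread(target=comm_worker, args=("PLC1", lambda: 42)),
    threading.Thread(target=comm_worker, args=("PLC2", lambda: 7)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

with results_lock:
    snapshot = dict(results)  # main thread takes a consistent copy
```

A queue.Queue works just as well if the main thread should consume results as they arrive rather than polling a shared dict.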

evaldes2015 avatar Mar 14 '22 19:03 evaldes2015