bacpypes icon indicating copy to clipboard operation
bacpypes copied to clipboard

Script stops responding after a very short time

Open sedlacekdavid opened this issue 4 years ago • 14 comments

Hi Guys, have a question regarding my bacpypes script which is working great so far, except one thing.

After a few minutes (5 is usually enough), the script completely stops responding to every incoming Confirmed COVs. The final result is that the device sends a COV and after a few retries it gives up since it doesn't receive acks back. And in the script, I see no reaction from bacpypes - no COV confirmations back to the network nor COV callbacks aren't firing up, although I can see that it's spinning in run() - it has it's own thread.

It behaves just like the UDP socket is dead??

Could you please suggest what to check? I can run it in debug, so I can trace/check anything needed.

Version 0.18.1

obrazek

sedlacekdavid avatar Jun 24 '20 05:06 sedlacekdavid

I'm still investigating it, but no success. It really seems like bacpypes from some reason starts ignoring incoming socket data. Trying to find a place where I can check incoming payload...

Will test also with unconfirmed COVs, although I'd like to use Confirmed COVs.

sedlacekdavid avatar Jun 24 '20 17:06 sedlacekdavid

Same issue with Unconfirmed COV notification. It's not visible from wireshark output below, but I'm not receiving any COV callbacks after ~ 2 minutes...

obrazek

sedlacekdavid avatar Jun 24 '20 17:06 sedlacekdavid

There are few things to try:

  1. Run the server with debugging turned on with something like --debug bacpypes.service.cov:snork.txt that directs debugging traffic to an external file, this shouldn't be a horribly large text file. You should see calls to do_SubscribeCOVRequest() that line up with the source address of the incoming requests you see in Wireshark. If you don't see the simple ack going out that will be a clue.

  2. Run with --debug bacpypes.core:snork.txt and see if there is an exception matching "an error has occurred: %s", err which is a catch-all, but you think that the core run() function is continuing to do its thing, so that might not help.

  3. If your application is using threads, there is quite often a problem with data in one thread that is not available to the others (a.k.a. thread local) and combinations of requests and responses can get lost. BACpypes is not thread-safe, and the iocb module is designed to help bundle a request, response, and an completion event flag together.

JoelBender avatar Jun 25 '20 00:06 JoelBender

Re-reading this leads me to believe that you are have trouble on the client side, not the server side, which is different but suffers from many of the same threading issues. So at some point you have defined do_ConfirmedCOVNotificationRequest() for your application so that it can process incoming COV notifications (and no doubt do_UnconfirmedCOVNotificationRequest() as well) so the same kinds of checks that I described in (2) and (3) are still applicable, but the focus changes from the bacpypes.service.cov module back to your application.

The notification request is being delivered to your application from the thread that is running bacpypes.core.run() so I recommend transferring the relevant APDU contents to your application through a queue and allowing the SimpleAckPDU() to be returned as usual, still in the BACpypes core thread.

JoelBender avatar Jun 25 '20 04:06 JoelBender

Hi @JoelBender !

Thank you for your response. It's exactly as you described. I have do_ConfirmedCOVNotificationRequest & do_ConfirmedCOVNotificationRequest and I can see that they are called after startup - until it happens - it stops reposponding. After that, there is no COV activity at all.

Everything I receive in the callbacks is queued and I'm also using deffered() for passing "commands" to Bacpypes thread - here basically just a send_subscription method which also uses deffered() for request_io().

I've just realised one thing. When COV mechanism stops working, I can still issue reads/writes correctly and I'm getting correct responses!

I've tried --debug bacpypes.core:snork.txt

but it didn't produce any error message when it is stuck and receiving COVs.

Also tried --debug bacpypes.service.cov:snork.txt but it didn't produce a single debug message.....

Maybe moving bacpypes from a dedicated thread to the main thread would help?

sedlacekdavid avatar Jun 25 '20 16:06 sedlacekdavid

Another observation - after I issue a read/write property command, COV starts working again.

Not sure for how long, I'm just going to scrutinise that.

sedlacekdavid avatar Jun 25 '20 16:06 sedlacekdavid

BTW - I didn't mention it - the way it works is that I have BIPSimpleApplication I'm just constructing iocb with SubscribeCOVRequest(), it is sent to the device, device acks the response and that's it.

Then I'm just sitting and waiting for COVs. And I'm refreshing subscription from time to time (for testing I use always at least an hour or so).

I followed your example here: https://gist.github.com/JoelBender/8bc33c0c0d0025e7dd30615846e01008

sedlacekdavid avatar Jun 25 '20 17:06 sedlacekdavid

I've temporarily solved this by issuing periodically a wothless Whois APDU, although it's really not solving the root cause, it works : ))

If you or anybody has an idea what could be wrong.... I'd be glad to solve it and know what's the issue...

sedlacekdavid avatar Jun 25 '20 18:06 sedlacekdavid

WHAT!? That's bonkers! Did you switch foreground/background threads? Can you change the application to non-threaded? I will try to reproduce this, but fair warning, I'm testing a clone/copy/derivative of my COV client with my COV server, and it will be non-threaded.

Debugging bacpypes.service.cov didn't generate anything because that's server side, not client side.

This line that calls request_io() directly should really be deferred(this_application.request_io, iocb) but this isn't a threaded application so it shouldn't make any difference. If you are calling it directly like this sample but not from the same thread as the core is running then that will be a problem and you definitely want to change it to the deferred version. That will put the function call into the list of things the core calls when it is done will the asyncore.loop() call and the requests and responses will be lined up correctly.

JoelBender avatar Jun 25 '20 23:06 JoelBender

@JoelBender not sure what do you mean by foreground/background threads? How can I check that? I have a Thread instance and it spins the run() function.

Switching to non-threaded app would be difficult for me because I don't know fully know bacpypes and I need to keep also other functionalities in the app like MQTT client thread and COV thread - so I'm not sure if I can achieve this with just a single thread dedicated to bacpypes.

The app is a bridge between MQTT and Bacnet UDP/IP world.

I've checked what you've suggested and I'm using request_io() only with deferred()

obrazek

sedlacekdavid avatar Jun 27 '20 05:06 sedlacekdavid

Btw I'm currently sending an APDU (Whois) every minute - with this "hack" it still works 100 %, it's been running for ~2 days so far and without any issues.

sedlacekdavid avatar Jun 27 '20 05:06 sedlacekdavid

Just a thought @sedlacekdavid and @JoelBender , in the gist, lifetime is 60... is this possible that you left to lifetime at 60 seconds... so the COV just die because, well... you told it to ?

ChristianTremblay avatar Sep 15 '20 01:09 ChristianTremblay