
Host resource consumption

Open preeve9534 opened this issue 1 year ago • 12 comments

This is more to document a potential issue and to elicit advice than to definitively flag something as broken.

I frequently make passages of between 4 and 8 hours in modestly trafficked waters. Freeboard performs well, but everything else on my Signal K host slows down over time. I would say race condition (except there is no evident race as such) or memory leak (but I don't know how to check this in node).

The behaviour I note is a slowing down over time of real-time responses from other Signal K plugins executing alongside Freeboard. We are probably talking about at least an order-of-magnitude change in the execution speed of other plugins.

The only things I am confident of are:

  1. If Freeboard is not being used (i.e. has no connected clients) then the slow-down of peer plugins never manifests. With Freeboard running, it always manifests.

  2. The problem is not fixed by killing and restarting the Freeboard client.

  3. Occasionally the issue is not fixed by restarting Signal K. My solution in this case is to stop using Freeboard for a few hours, after which the problem seems to disappear. This makes me wonder if there is some issue that relates to parsing data that is received only occasionally (bad AIS message or some such?).

  4. Nothing helpful in the logs.

Has anyone else experienced this sort of behaviour?

P

preeve9534 avatar Jun 02 '23 14:06 preeve9534

Is Freeboard being used on a separate device (phone or tablet) or via the browser on the Signal K host?

panaaj avatar Jun 03 '23 00:06 panaaj

Via a browser on a separate PC.

preeve9534 avatar Jun 03 '23 10:06 preeve9534

"Slowing down" implies that the server is busy doing something, either busy with handling some inputs or recurring tasks or busy garbage collecting (memory issue).

Both issues should be evident in top: increased CPU usage, possibly combined with increased memory usage.
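Because the slowdown tends to show up while under way, the same check could be left running unattended. The following is a minimal sketch, assuming a Linux host (it reads `/proc/<pid>/stat`) and that you have already found the PID of the node process running Signal K; the function names `parse_stat` and `log_usage` are made up for this illustration.

```python
import os
import time

def parse_stat(stat_text):
    """Return (cpu_ticks, rss_pages) from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    fields = stat_text.rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    rss_pages = int(fields[21])                      # stat field 24
    return utime + stime, rss_pages

def log_usage(pid, interval_s=60.0, samples=60):
    """Print CPU use over each interval and resident memory for one process."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    page_bytes = os.sysconf("SC_PAGE_SIZE")
    prev_ticks = None
    for _ in range(samples):
        with open(f"/proc/{pid}/stat") as f:
            ticks, pages = parse_stat(f.read())
        if prev_ticks is not None:
            # CPU seconds consumed since the previous sample, as a percentage
            # of the interval (can exceed 100 on a multi-core host).
            cpu_pct = 100.0 * (ticks - prev_ticks) / ticks_per_s / interval_s
            rss_mib = pages * page_bytes / 2**20
            print(f"{time.strftime('%H:%M:%S')} cpu={cpu_pct:.1f}% rss={rss_mib:.1f} MiB")
        prev_ticks = ticks
        time.sleep(interval_s)
```

Calling `log_usage(pid)` with output redirected to a file would give a timestamped record to look at after the passage: steadily climbing memory would point towards a leak, while a jump in CPU with flat memory would point towards the server being busy with inputs or recurring tasks.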

One potential issue related to a long running client is subscriptions and/or multiple WebSocket connections. Is the number of websocket connections ok? If there are no excessive ws connections one quick fix to this would be if Adrian could produce a Freeboard that reconnects every x minutes. Of course this is not a real fix, but would help to validate that this is indeed ws & Freeboard induced.

Your issue could possibly be replicated with either of the sample data sets we have and keeping Freeboard open, if the issue manifests within hours rather than days.

tkurki avatar Jun 04 '23 05:06 tkurki

Thank you for the response @tkurki.

Of course, this always manifests when I am busy operating my boat and my response has to be to stop using Freeboard and concentrate on that. I have on a couple of occasions had the opportunity to run top(1), but not observed anything conclusive. I haven't noticed unusually high ws connections when the problem manifests.

Setting up a test regime as you suggest is something I may have time to do when I am in port.

Something I didn't mention in my earlier post is that subjectively I experience significant slowdown as a "triggered event": on many days I don't notice slowdown, then, one day, slowdown will suddenly, it seems, be very evident. My speculation has been that this could be the onset of a system activity like garbage collection or it could be application specific - maybe a consequence of some unanticipated/anomalous input data. In this respect I notice that once the problem is evident restarting Signal K will not fix it - to me this suggests that environment matters like garbage collection and ws processing are likely not the causal factors, but may be consequential.

The stochastic behaviour makes diagnostics a bit more difficult. The only thing I am confident about is that the use of Freeboard is implicated in my system's meltdown even though it may not be the actual causal agent per se.

preeve9534 avatar Jun 04 '23 06:06 preeve9534

A little more grist. The issue is evident across multiple Signal K installations on multiple platforms with varying architectures and resource availability: CerboGX, Intel-based industrial PC, Raspberry Pi.

preeve9534 avatar Jun 04 '23 06:06 preeve9534

FYI Freeboard makes only one ws connection.

To test breaking and re-making the connection without restarting either client or server you can try the following. It's not elegant but will do the job: select History Playback.

This will stop the connection and attempt to make a ws connection to the server playback endpoint. If you do not have the InfluxDB plugin to provide history playback, Freeboard will fall back to the stream ws connection.

panaaj avatar Jun 04 '23 06:06 panaaj

I've played around a little.

I think there may be some reinforcing indications that my problem is websocket / connection related. Following @panaaj's suggestion of breaking and remaking the ws connection through "History Playback", I get the impression that I can improve peer app performance momentarily, but then it degrades again quite quickly as the ws comes live again. This echoes my experience that killing the Freeboard web-app and restarting it doesn't help in any major way with the peer process issue.

A few questions occur to me:

  1. Is Freeboard under some circumstances making multiple subscriptions to Signal K that the server can't resource? or
  2. Is Signal K not managing/cleaning up subscription/ws resources appropriately? or
  3. Is Node not cleaning up regularly enough or efficiently enough?

If anyone can advise on how to investigate this further...

P

preeve9534 avatar Jun 09 '23 09:06 preeve9534

Just my 5 cents... I make my own charts. At the beginning I didn't make the mbtiles databases correctly and neglected the "bounds" information in the metadata table. That gave a very similar experience. Everything ground to a stop over time. And it was also consistent over different installations of SignalK since I used the same charts.
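For anyone wanting to check their own charts for this, here is a minimal sketch. It relies on the MBTiles convention that the `metadata` table holds name/value rows and that `bounds` is a comma-separated "west,south,east,north" string; the function name `chart_bounds` is made up for this illustration.

```python
import sqlite3

def chart_bounds(conn):
    """Return the chart's bounds as (west, south, east, north), or None
    if the metadata row is missing or cannot be parsed."""
    row = conn.execute(
        "SELECT value FROM metadata WHERE name = 'bounds'"
    ).fetchone()
    if row is None or row[0] is None:
        return None
    try:
        parts = tuple(float(p) for p in str(row[0]).split(","))
    except ValueError:
        return None
    return parts if len(parts) == 4 else None
```

Opening each chart with `sqlite3.connect("path/to/chart.mbtiles")` (the path is a placeholder) and passing the connection to `chart_bounds` would show which files come back as `None` and so lack usable bounds.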

AndreasSchutz avatar Jun 29 '23 06:06 AndreasSchutz

That's an interesting five cents. Several years ago I ran my own tile server on the boat - I never noticed Signal K slow down, but then I wasn't using it for anything time sensitive and I can't remember if I'd set up bounds in the meta-data. I'm tempted to get my tile server out of the locker and get it going again so that I can look deeper. Thank you @AndreasSchutz.

preeve9534 avatar Jun 30 '23 09:06 preeve9534

It was the very same @tkurki that helped me debug my problem. It was about a year ago and I have forgotten how we identified the problem. If I really trawl my memory, I think it was visible in charts-plugin, or maybe it was the number of open charts. Anyhow, the effect was that some component (charts-plugin or Freeboard) had to browse all the charts all the time. Since the bounds were missing, that component was tricked into believing that all charts covered the whole globe. Please help me out here @tkurki.
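To illustrate the effect (this is only a sketch of the idea, not how charts-plugin or Freeboard is actually implemented): if a component picks charts by testing their bounds against the current view, and a chart with no bounds is treated as covering the whole world, that chart is selected for every view. The names `WORLD`, `intersects` and `charts_in_view` are invented for this example, and it ignores charts that cross the antimeridian.

```python
# Assumed fallback for a chart with no bounds: the whole globe.
WORLD = (-180.0, -90.0, 180.0, 90.0)

def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def charts_in_view(charts, view):
    """Return names of charts whose bounds overlap the view.

    charts maps chart name -> bounds tuple, or None when bounds are missing.
    """
    return [
        name for name, bounds in charts.items()
        if intersects(bounds if bounds is not None else WORLD, view)
    ]
```

With many charts lacking bounds, every pan or zoom would select all of them, which would be consistent with the "browse all the charts all the time" behaviour described above.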

AndreasSchutz avatar Jun 30 '23 10:06 AndreasSchutz

Can't remember details either.

tkurki avatar Jun 30 '23 11:06 tkurki

I have identified two problem areas, both manifesting with lots of AIS targets, with fixes in the pipeline:

  • https://github.com/SignalK/signalk-server/issues/1717
  • https://github.com/SignalK/signalk-server/issues/1718

@preeve9534 what you can do is add the --inspect argument to node in the signalk-server startup script. This allows using Chrome Dev Tools to generate heap dumps as well as to profile the server.

tkurki avatar Apr 14 '24 09:04 tkurki

@preeve9534 Can this be closed based on the release of patches to address this issue?

panaaj avatar May 18 '24 07:05 panaaj

Closing this. I have the patches installed and will reopen if I see the issue again.

preeve9534 avatar May 18 '24 10:05 preeve9534