companion icon indicating copy to clipboard operation
companion copied to clipboard

[BUG] Companion Web Interface uses infinite bandwidth

Open amcchord opened this issue 3 years ago • 16 comments

Is this a bug in companion itself or a module?

  • [X] I believe this to be a bug in companion

Is there an existing issue for this?

  • [X] I have searched the existing issues

Describe the bug

When loading the companion Web UI on a machine that is not the companion server I can the WebUI uses as much bandwidth as it can possibly capture. This can be well upwards of 10 megabytes per second. I cannot seem to see these requests in the network tab of the inspector in chrome. But beyond that not sure how to track this down. Below is a video so you can see the bandwidth usage in action. I have tested this on the past few betas as well as the latest. The screen recording is not from the latest, but actions are the same on the latest beta.

https://user-images.githubusercontent.com/9466882/165615950-4aacd406-fd60-483b-8061-3040e301a52f.mp4

Steps To Reproduce

No response

Expected Behavior

No response

Environment (please complete the following information)

- OS: Mac OS 12.3 (Arm)
- Browser: Chrome
- Companion Version: 2.2.0 (2.2.0+4118-beta-6acb1059)

Additional context

No response

amcchord avatar Apr 27 '22 19:04 amcchord

This has been seen before, and is generally caused by a module updating its list of actions or feedbacks way too often. There was a suggestion the other day to do some diffing and send less data each time a module does, but we should still figure out which modules are being bad.

I have pushed a commit to the betas, that will enable some additional logging when the page is loaded with a suffix to the url. eg http://localhost:3000/buttons#debug_socket It will then log everything received: image Could you send a video of the console showing these messages? I suspect it will show ids that will need translating back to modules, Im not sure the best way of compiling that data other than looking at the db file, or might be guessable by examining the deep contents of some logged lines

Julusian avatar Apr 27 '22 20:04 Julusian

Console Logs and Screenshot.

Screen Shot 2022-04-27 at 4 48 12 PM

192.168.10.10-1651092502299.log 192.168.10.10-1651092536723.log

amcchord avatar Apr 27 '22 20:04 amcchord

From the logs I can see:

  • Approx every second 'Execute CageXState' fires. This causing a schedule_refresh to be emitted, and also causes the bitfocus-companion module to update its actions and feedbacks (because it lists them in some actions and feedbacks)
  • Every 5 seconds, each of the Cage x Power Strip update their actions, feedbacks, presets and variable definitions. The variables is causing bitfocus-companion to update its feedbacks.

I think there are 4 things we should do here to improve this:

  • [x] Do some more optimised events when a scheduled job executes. It's only wanting to update last_run, but ends up doing a lot more
  • [ ] Fix the digitalloggers-powercontroller module to only update its stuff when something has changed. @josephdadams are you able to do that or should I look into it
  • [x] Backport some changes I was playing with in 2.3, which would minimise/avoid the need for the internal module to update its actions and feedbacks.
  • [x] As Håkon suggested, look into sending a json diff to the ui. This will minimise the bandwidth used by 'bad' modules at the cost of a bit more cpu in companion

Julusian avatar Apr 27 '22 21:04 Julusian

Let me look into the module to see if there’s something I can improve

josephdadams avatar Apr 27 '22 22:04 josephdadams

Ultimately I think json Diff will help the most. My guess is that the added companion CPU load would be relatively small. Also serving a web session with 100 megabits of data per second is likely non trivial CPU on the companion server side as well. The Diff may actually be less CPU in my case.

amcchord avatar Apr 27 '22 22:04 amcchord

For those 2 cases I found, almost all the data being sent is unnecessary and can be concluded to be unnecessary much cheaper than doing a full diff. But I would still like to look into doing the diffing as it will be a good safety net to avoid this cropping up again, even though it will make it much harder to notice that a module is updating more often that it should. It will likely also help reduce the size of the changes sent to clients which will be good as often the changes are small

Julusian avatar Apr 27 '22 22:04 Julusian

Its hard to imagine though just those modules are driving it.. When I think about the fact the page is eating 4 to 10 megabytes of json per second.. that's just extreme. Thats a lot of json.

amcchord avatar Apr 27 '22 23:04 amcchord

@Julusian I did some testing... Disabling the Digital Loggers and the Clock modules had no real effect on the data usage. Turning off the 2 second triggers labeled CageXState did stop the traffic. It looks like whenever a trigger fires it causes all new JSON to be sent. The 9 second trigger causes a few megabytes of traffic every time it fires. That trigger doesn't event effect any variables.

amcchord avatar Apr 28 '22 15:04 amcchord

@amcchord could you update and try again? https://github.com/bitfocus/companion/commit/aa6596daa0c98e787dd52f644a69e3a5e3de3f5a should fix the part of the issue caused by triggers firing Also interesting that is is a few mb of traffic, from what I can tell from you db it should be a couple of kb of traffic, Ill try and dig deeper into what it actually being sent over the wire from the triggers. It is possible it is accidentally sending some objects it shouldnt

Julusian avatar Apr 28 '22 16:04 Julusian

#aa659d appears to solve the issue completely. Thanks!

amcchord avatar Apr 28 '22 19:04 amcchord

That is good to hear, but is interesting as I can only account for it using in the range of 20KB/s. But I can live with not knowing why it was showing weirdly

I still think it is worth looking at the other bits. Not sure if it is worth pushing for them to be done in 2.2.1, or they might be better left for 2.2.2

Julusian avatar Apr 28 '22 19:04 Julusian

Your fix solves it for triggers.. but if I turn on the digital loggers web power switch module again.. the problem re-appears and its in the 5 to 10 megabytes per second range

amcchord avatar Apr 28 '22 20:04 amcchord

I made a new recording. I used safari this time as its better at tracking the usage for some reason. Chrome causes about 2x the data usage (weird). But when I turn on the offending modules it means about 2 megabytes a second of transfer. @josephdadams has agreed to take a look and fix the modules... but if you are looking for more info.. maybe this helps.

Here is a screen capture (Sped up 2x to keep the file small)

https://user-images.githubusercontent.com/9466882/165844377-36a9de4c-1723-4f3e-bfcd-63b87b7bb6ee.mp4

amcchord avatar Apr 28 '22 20:04 amcchord

I pushed some other changes that will help reduce the amount of data a little, but only by I would guess at most half. Once the module is fixed it should go away then.

But it does sound like I should look into the json diffing too

Julusian avatar Apr 28 '22 21:04 Julusian

I'm seeing this issue too. The first release of v2.1 seemed to be ok.It's a fairly recent issue.

G8YTZ avatar Jul 17 '22 16:07 G8YTZ

@amcchord in 2.3 there is proper json diffing being done, so I hope that this has gone away now. Could you confirm if it is better?

Julusian avatar Sep 03 '22 13:09 Julusian