jicofo

Overloading bridges when they report stats too slowly

Open · awlx opened this issue 5 years ago · 2 comments

Description

When a bridge becomes overloaded and reports its stats too slowly or too infrequently, it gets even more conferences dispatched to it.

Current behavior

When stats are reported too slowly but health checks still pass, the videobridge appears to get all new conferences assigned to it.
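The following minimal Java sketch shows the failure mode, based on a stated assumption: a hypothetical selector always picks the bridge with the lowest last-reported stress. The `Bridge` and `NaiveSelector` classes, the stress values, and the +0.1 increment per conference are illustrative inventions. They are not jicofo's actual classes or load metric. The sketch shows that a bridge whose stats stop arriving keeps its old, low stress value, so it keeps winning the selection.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of a bridge as seen by the conference focus.
// Names and fields are illustrative only, not jicofo's real classes.
final class Bridge {
    final String id;
    double lastReportedStress; // value from the most recent stats report
    int conferences;           // conferences actually assigned so far

    Bridge(String id, double lastReportedStress) {
        this.id = id;
        this.lastReportedStress = lastReportedStress;
    }
}

final class NaiveSelector {
    // Picks the bridge with the lowest *reported* stress. If a bridge stops
    // delivering stats, its stress value stays frozen at its last report.
    static Bridge select(List<Bridge> bridges) {
        return bridges.stream()
                .min(Comparator.comparingDouble(b -> b.lastReportedStress))
                .orElseThrow();
    }

    // Dispatches n conferences. Only bridges that still publish stats get
    // their reported stress updated (assumed +0.1 per conference).
    static void dispatch(List<Bridge> bridges, Bridge stalled, int n) {
        for (int i = 0; i < n; i++) {
            Bridge b = select(bridges);
            b.conferences++;
            if (b != stalled) {
                b.lastReportedStress += 0.1;
            }
        }
    }
}
```

For example, take bridge `a` (last reported stress 0.2) and bridge `b` (0.3), where `a` has stopped publishing. Calling `NaiveSelector.dispatch(List.of(a, b), a, 10)` assigns all ten conferences to `a` and none to `b`.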

Expected Behavior

Missing stats should be treated as a sign that something is wrong. Either no more conferences should be dispatched to the bridge until a new report arrives, or the bridge should be marked as failed after a grace period.

Possible Solution

Introduce a grace period for missing stats, during which no new conferences are dispatched to the bridge.
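The grace-period idea can be sketched in Java under explicit assumptions. Each bridge records when its last stats report arrived. Once the grace period has passed, the bridge stops receiving new conferences. After a longer timeout, it is treated as failed. When a fresh report arrives, the bridge becomes eligible again. `TrackedBridge`, `BridgeState`, `StalenessAwareSelector`, and the timeout values used below are hypothetical. They are not jicofo configuration options or APIs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical bridge view: stress from the last report plus the time it arrived.
final class TrackedBridge {
    final String id;
    double stress;
    Instant lastStatsAt;

    TrackedBridge(String id, double stress, Instant lastStatsAt) {
        this.id = id;
        this.stress = stress;
        this.lastStatsAt = lastStatsAt;
    }

    // Called whenever a stats report is received from this bridge.
    void onStats(double newStress, Instant receivedAt) {
        this.stress = newStress;
        this.lastStatsAt = receivedAt;
    }
}

// Hypothetical states derived purely from the age of the last stats report.
enum BridgeState { OPERATIONAL, STATS_STALE, FAILED }

final class StalenessAwareSelector {
    private final Duration gracePeriod;    // stats older than this: no new conferences
    private final Duration failureTimeout; // stats older than this: treat as failed

    StalenessAwareSelector(Duration gracePeriod, Duration failureTimeout) {
        if (failureTimeout.compareTo(gracePeriod) < 0) {
            throw new IllegalArgumentException("failureTimeout must be >= gracePeriod");
        }
        this.gracePeriod = gracePeriod;
        this.failureTimeout = failureTimeout;
    }

    BridgeState stateOf(TrackedBridge b, Instant now) {
        Duration age = Duration.between(b.lastStatsAt, now);
        if (age.compareTo(failureTimeout) > 0) {
            return BridgeState.FAILED;
        }
        if (age.compareTo(gracePeriod) > 0) {
            return BridgeState.STATS_STALE;
        }
        return BridgeState.OPERATIONAL;
    }

    // Least-stressed bridge among those with fresh stats; empty if none qualify.
    Optional<TrackedBridge> select(List<TrackedBridge> bridges, Instant now) {
        return bridges.stream()
                .filter(b -> stateOf(b, now) == BridgeState.OPERATIONAL)
                .min(Comparator.comparingDouble(b -> b.stress));
    }
}
```

Consider an arbitrary 30-second grace period and a 5-minute failure timeout. A bridge last heard from 90 seconds ago is `STATS_STALE` and is skipped, even if its last reported stress is the lowest. If no bridge qualifies, the selector returns `Optional.empty()`, which leaves the fallback policy (queue, reject, or relax the rule) to the caller.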

Steps to reproduce

Load a bridge with just enough users to overload it slightly, and jicofo will start to overload it completely.

awlx, Apr 30 '20

Here you can see the machine becoming overloaded.

[Screenshots: 2020-04-30 at 10:58:09, 10:58:20 and 10:59:45]

awlx, Apr 30 '20

More info from the bridge log:

2020-04-30 04:35:45.471 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:35:46.682 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:35:47.880 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:35:49.127 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:35:50.385 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:36:51.668 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:36:54.002 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:37:57.890 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:38:00.063 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:38:02.075 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:39:03.594 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:39:05.065 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:03.806 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:05.422 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:06.636 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:07.738 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:08.985 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:10.091 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:11.771 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:27.548 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:29.919 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:34.494 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:40.771 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:40:41.434 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:41:02.252 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:41:03.326 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:41:04.552 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:41:05.565 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode
2020-04-30 04:41:06.575 SEVERE: [46] PubSubPublisher$3.run#495: Timed out a publish request: bridgeStatsNode

awlx, Apr 30 '20

We've fixed related issues since. Please re-open if it's still an issue.

bgrozev, Jan 05 '23