OvenMediaEngine
OVT Streamworker can get stuck, eating all system memory
For unknown reasons, an Edge reconnected its OVT connection to the Origin:
[2024-04-01 00:03:11.996] W [SPOvtPub-t9000:957880] Socket.Pool.Worker | socket_pool_worker.cpp:218 | [#16] [0x7fc9020c9b10] Failed to send data for 10000ms - This socket is going to be garbage collected (<ClientSocket: 0x7fc8f7550010, #27, Connected, TCP, Nonblocking, 136.243.153.227:43788>)
[2024-04-01 00:03:11.996] W [SPOvtPub-t9000:957880] Socket | socket.cpp:2109 | [#27] [0x7fc8f7550010] Socket was closed even though there was 4244 commands in the socket
[2024-04-01 00:03:17.009] I [SPOvtPub-t9000:957880] OVT | ovt_publisher.cpp:181 | OvtProvider is connected : <ClientSocket: 0x7fc8f7401c10, #34, Connected, TCP, Nonblocking, 142.132.205.123:55476>
[2024-04-01 00:03:17.024] I [SPOvtPub-t9000:957880] Monitor | stream_metrics.cpp:114 | A new session has started playing #default#origin1/revision-main on the OVT publisher. OVT(16)/Stream total(16)/App total(16)
Because of this, the OVT streamworker queue got stuck and started overflowing:
[2024-04-01 00:03:17.674] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 2281, threshold: 500, peak: 2281
[2024-04-01 00:03:22.681] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 4306, threshold: 500, peak: 4306
[2024-04-01 00:03:27.696] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 6341, threshold: 500, peak: 6341
[2024-04-01 00:03:32.708] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 8399, threshold: 500, peak: 8399
[2024-04-01 00:03:37.719] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 10445, threshold: 500, peak: 10445
[2024-04-01 00:03:42.738] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 12491, threshold: 500, peak: 12491
[2024-04-01 00:03:47.769] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 14544, threshold: 500, peak: 14544
[2024-04-01 00:03:52.781] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 16591, threshold: 500, peak: 16591
[2024-04-01 00:03:57.797] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 18629, threshold: 500, peak: 18629
[2024-04-01 00:04:02.807] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 20673, threshold: 500, peak: 20673
[2024-04-01 00:04:07.828] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 22685, threshold: 500, peak: 22685
[2024-04-01 00:04:12.842] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 24691, threshold: 500, peak: 24691
[2024-04-01 00:04:17.852] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 26671, threshold: 500, peak: 26671
[2024-04-01 00:04:22.863] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 28650, threshold: 500, peak: 28650
...and it kept growing like this for a full hour, until all 16 GB of system memory had been used up and the Linux kernel OOM killer killed the OME process:
[2024-04-01 01:01:03.903] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1422024, threshold: 500, peak: 1422024
[2024-04-01 01:01:08.921] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1424007, threshold: 500, peak: 1424007
[2024-04-01 01:17:46.146] W [AW-OVT0:957902] ManagedQueue | managed_queue.h:444 | [122] mngq:v=#default#origin1:s=revision-main:p=pub:n=streamworker_ovt size has exceeded the threshold: queue: 1425251, threshold: 500, peak: 1425251
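For scale (a rough estimate from the numbers above, not something I measured directly): a final peak of roughly 1,425,000 queued items against about 16 GB of memory works out to on the order of 10 KB per queued item, which is plausible for buffered media packets.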
There may be something buggy with the socket error handling here.
For the OVT streamworker queue, SetPreventExceedThreshold is not set, which I find dangerous. That said, simply starting to drop messages would not be the right way to respond to a bug in other parts of the code. In my opinion, the ManagedQueue class should have error handling for the case where a queue gets stuck; in this case that would mean killing the stuck streamworker thread.
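To illustrate the idea (a hypothetical sketch only, not OvenMediaEngine's actual ManagedQueue or SetPreventExceedThreshold implementation; GuardedQueue, Push, Pop, IsStuck and the numbers are made up for the example): a queue could both enforce its threshold on push and expose a watchdog check that reports when the consumer has stopped draining, so a monitoring thread can tear down the stuck worker instead of letting the queue grow until the OOM killer intervenes.

// Hypothetical sketch of a bounded queue with a "stuck consumer" watchdog.
// None of these names come from OvenMediaEngine's code base.
#include <chrono>
#include <cstdio>
#include <deque>
#include <mutex>
#include <optional>

template <typename T>
class GuardedQueue
{
public:
    GuardedQueue(size_t threshold, std::chrono::seconds stuck_timeout)
        : _threshold(threshold),
          _stuck_timeout(stuck_timeout),
          _last_pop(std::chrono::steady_clock::now())
    {
    }

    // Returns false and drops the item when the threshold is exceeded and
    // dropping is enabled (roughly the behavior one would expect from a
    // SetPreventExceedThreshold-style option).
    bool Push(T item, bool prevent_exceed_threshold)
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (prevent_exceed_threshold && _queue.size() >= _threshold)
        {
            return false;
        }
        _queue.push_back(std::move(item));
        return true;
    }

    std::optional<T> Pop()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_queue.empty())
        {
            return std::nullopt;
        }
        _last_pop = std::chrono::steady_clock::now();
        T item = std::move(_queue.front());
        _queue.pop_front();
        return item;
    }

    // Watchdog check: the queue counts as "stuck" when it is above the threshold
    // and the consumer has not popped anything for stuck_timeout. A monitoring
    // thread could then kill the stuck streamworker or close the session.
    bool IsStuck() const
    {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size() > _threshold &&
               (std::chrono::steady_clock::now() - _last_pop) > _stuck_timeout;
    }

private:
    mutable std::mutex _mutex;
    std::deque<T> _queue;
    size_t _threshold;
    std::chrono::seconds _stuck_timeout;
    std::chrono::steady_clock::time_point _last_pop;
};

int main()
{
    GuardedQueue<int> queue(500, std::chrono::seconds(10));

    size_t accepted = 0;
    for (int i = 0; i < 600; i++)
    {
        if (queue.Push(i, /*prevent_exceed_threshold=*/true))
        {
            accepted++;
        }
    }

    // Only 500 items are accepted; the rest are dropped instead of growing the queue.
    std::printf("accepted: %zu, stuck: %s\n", accepted, queue.IsStuck() ? "yes" : "no");
    return 0;
}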
This appears to be a bug where the OVT Publisher gets stuck in a deadlock. Could you please post the full log containing the problem? If you do not want your logs to be made public, please send them to [email protected]. Thank you.
Sent.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.