Inconsistent use of Tesla T4 GPU causes fluctuating broadcasting
Continuation of https://github.com/ant-media/Ant-Media-Server/issues/5590
Short description
I have 8 inbound RTMP streams (1080p) and 3 transcoding renditions (default-480, default-720, default-1080). None of them can sustain broadcasting; they flip wildly from Broadcasting 0.01x up to Broadcasting 100x, and none of them can support playback for longer than a few seconds.
Environment
- Ubuntu 20.04.6 LTS
- Java version: build 11.0.20.1+1-post-Ubuntu-0ubuntu120.04
- Ant Media Server version: Enterprise Edition 2.7.0 20231031_0626
- Browser name and version: N/A
Steps to reproduce
- Install 2.7.0 on g4dn.12xlarge
- have 8 RTMP 1080p 30fps sources @ about 4-8 Mbps each being transcoded to 3 renditions (default-480, default-720, default-1080); a sketch for simulating such a source follows this list
- check the `nvidia-smi` output
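For anyone reproducing without the cameras, a looped file pushed over RTMP approximates one such source. This is a hedged sketch: the input file and stream key are placeholders, and `LiveApp` is the stock application name.

```bash
# Push one simulated 1080p30 source at ~6 Mbps; repeat for stream1..stream8
ffmpeg -re -stream_loop -1 -i sample-1080p30.mp4 \
       -c:v libx264 -b:v 6M -g 60 -c:a aac \
       -f flv rtmp://<server-ip>/LiveApp/stream1
```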
Expected behavior
Same performance as 2.4.3, which is able to handle the exact same camera sources (and more) with renditions of the same count/type. All 4 GPUs are utilized on 2.4.3, and 2.4.3 can keep all 25 streams at 99 to 101 percent broadcast status.
Actual behavior
Only a fraction of the streams can keep up, and `nvidia-smi` shows the following:

```
Mon Nov 20 17:54:16 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 40C P0 47W / 70W | 6843MiB / 15360MiB | 54% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 29C P0 26W / 70W | 891MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 29C P0 26W / 70W | 577MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P0 26W / 70W | 969MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3978 C ...11-openjdk-amd64/bin/java 6821MiB |
| 1 N/A N/A 3978 C ...11-openjdk-amd64/bin/java 882MiB |
| 2 N/A N/A 3978 C ...11-openjdk-amd64/bin/java 568MiB |
| 3 N/A N/A 3978 C ...11-openjdk-amd64/bin/java 960MiB |
+-----------------------------------------------------------------------------+
```
If you keep issuing `nvidia-smi`, you'll eventually see that other GPUs get activated and then dropped (notice how GPU 2 is at 14% in the snapshot below). This is different from https://github.com/ant-media/Ant-Media-Server/issues/5590:

```
Mon Nov 20 17:40:01 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 36C P0 48W / 70W | 7728MiB / 15360MiB | 64% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 27C P0 25W / 70W | 1185MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 27C P0 26W / 70W | 693MiB / 15360MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 28C P0 26W / 70W | 901MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
```
So this is different from #5590.
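Since the imbalance only shows up across repeated samples, a continuous view is easier than re-running `nvidia-smi` by hand. These are standard `nvidia-smi` options:

```bash
# Sample per-GPU utilization and memory every 5 seconds (CSV, easy to compare over time)
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 5

# Or watch the per-GPU engine counters (sm/mem/enc/dec) scroll by
nvidia-smi dmon -s u
```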
Logs
Will send to support upon request.
Lots of these in the logs:

```
2023-11-20 18:02:06,145 [vertx-blocked-thread-checker] WARN i.v.core.impl.BlockedThreadChecker - Thread Thread[vert.x-worker-thread-34,5,main] has been blocked for 57892 ms, time limit is 10000 ms
io.vertx.core.VertxException: Thread blocked
at org.bytedeco.ffmpeg.global.avfilter.avfilter_graph_free(Native Method)
at io.antmedia.enterprise.adaptive.video.H264Encoder.freeFilterResources(H264Encoder.java:870)
at io.antmedia.enterprise.adaptive.video.H264Encoder.freeEncoderRelatedResources(H264Encoder.java:861)
at io.antmedia.enterprise.adaptive.base.VideoEncoder.writeTrailer(VideoEncoder.java:397)
at io.antmedia.enterprise.adaptive.video.H264Encoder.writeTrailer(H264Encoder.java:695)
at io.antmedia.enterprise.adaptive.StreamAdaptor.writeEncodeTrailers(StreamAdaptor.java:428)
at io.antmedia.enterprise.adaptive.StreamAdaptor.execute(StreamAdaptor.java:276)
at io.antmedia.enterprise.adaptive.StreamAdaptor.lambda$start$0(StreamAdaptor.java:182)
at io.antmedia.enterprise.adaptive.StreamAdaptor$$Lambda$510/0x0000000800802840.handle(Unknown Source)
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
at io.vertx.core.impl.ContextImpl$$Lambda$404/0x00000008005c6440.handle(Unknown Source)
at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
at io.vertx.core.impl.ContextImpl$$Lambda$401/0x00000008005c7440.run(Unknown Source)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@11.0.20.1/java.lang.Thread.run(Thread.java:829)
2023-11-20 18:02:06,146 [vertx-blocked-thread-checker] WARN i.v.core.impl.BlockedThreadChecker - Thread Thread[vert.x-worker-thread-100,5,main] has been blocked for 571085 ms, time limit is 10000 ms
io.vertx.core.VertxException: Thread blocked
at org.bytedeco.ffmpeg.global.avcodec.avcodec_send_frame(Native Method)
at io.antmedia.enterprise.adaptive.video.H264Encoder.avCodecSendFrame(H264Encoder.java:729)
at io.antmedia.enterprise.adaptive.video.H264Encoder.sendPacket2Encoder(H264Encoder.java:713)
at io.antmedia.enterprise.adaptive.video.H264Encoder.writeFrameInternal(H264Encoder.java:208)
at io.antmedia.enterprise.adaptive.base.VideoEncoder.writeFrame(VideoEncoder.java:275)
at io.antmedia.enterprise.adaptive.StreamAdaptor.write2VideoEncoders(StreamAdaptor.java:347)
at io.antmedia.enterprise.adaptive.StreamAdaptor.execute(StreamAdaptor.java:228)
at io.antmedia.enterprise.adaptive.StreamAdaptor.lambda$start$0(StreamAdaptor.java:182)
at io.antmedia.enterprise.adaptive.StreamAdaptor$$Lambda$510/0x0000000800802840.handle(Unknown Source)
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
at io.vertx.core.impl.ContextImpl$$Lambda$404/0x00000008005c6440.handle(Unknown Source)
at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
at io.vertx.core.impl.ContextImpl$$Lambda$401/0x00000008005c7440.run(Unknown Source)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@11.0.20.1/java.lang.Thread.run(Thread.java:829)
2023-11-20 18:02:06,146 [vertx-blocked-thread-checker] WARN i.v.core.impl.BlockedThreadChecker - Thread Thread[vert.x-worker-thread-78,5,main] has been blocked for 591002 ms, time limit is 10000 ms
io.vertx.core.VertxException: Thread blocked
at org.bytedeco.ffmpeg.global.avcodec.avcodec_send_frame(Native Method)
at io.antmedia.enterprise.adaptive.video.H264Encoder.avCodecSendFrame(H264Encoder.java:729)
at io.antmedia.enterprise.adaptive.video.H264Encoder.sendPacket2Encoder(H264Encoder.java:713)
at io.antmedia.enterprise.adaptive.video.H264Encoder.writeFrameInternal(H264Encoder.java:208)
at io.antmedia.enterprise.adaptive.base.VideoEncoder.writeFrame(VideoEncoder.java:275)
at io.antmedia.enterprise.adaptive.StreamAdaptor.write2VideoEncoders(StreamAdaptor.java:347)
at io.antmedia.enterprise.adaptive.StreamAdaptor.execute(StreamAdaptor.java:228)
at io.antmedia.enterprise.adaptive.StreamAdaptor.lambda$start$0(StreamAdaptor.java:182)
at io.antmedia.enterprise.adaptive.StreamAdaptor$$Lambda$510/0x0000000800802840.handle(Unknown Source)
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
at io.vertx.core.impl.ContextImpl$$Lambda$404/0x00000008005c6440.handle(Unknown Source)
at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
at io.vertx.core.impl.ContextImpl$$Lambda$401/0x00000008005c7440.run(Unknown Source)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@11.0.20.1/java.lang.Thread.run(Thread.java:829)
2023-11-20 18:02:06,223 [Thread-347] INFO i.a.e.adaptive.StreamAdaptor - Queue size(2001) is exceeding 2000 so dropping frame for stream: REDACTEDCAMERANAME
```
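To gauge how widespread the blocking and frame dropping are, something like this works. A sketch only: it assumes the default AMS log location `/usr/local/antmedia/log/ant-media-server.log`.

```bash
# Count blocked-thread warnings and queue-overflow frame drops in the current log
grep -c "VertxException: Thread blocked" /usr/local/antmedia/log/ant-media-server.log
grep -c "is exceeding" /usr/local/antmedia/log/ant-media-server.log
```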
As an experiment, I'm deactivating all the streams, then starting them up one at a time with a couple of minutes between each start-up (a scripted equivalent is sketched after this list). Results...
- one stream, stable (6% of 1 GPU of 4 available)
- two streams, stable (13% of 1 GPU of 4 available)
- three streams, stable (20% of 1 GPU of 4 available)
- four streams, stable (27% of 1 GPU of 4 available)
- five streams, stable (34% of 1 GPU of 4 available)
- six streams, stable (40% of 1 GPU of 4 available)
- seven streams, stable (47% of 1 GPU of 4 available)
- eight streams, stable (55% of 1 GPU of 4 available)
- nine streams, stable (56% of 1 GPU of 4 available), no change
- ten streams, stable (56% of 1 GPU of 4 available), no change
- eleven streams, stable (68% of 1 GPU of 4 available)
- twelve streams, stable (68% of 1 GPU of 4 available), no change
- thirteen streams, stable (70% of 1 GPU of 4 available)
- fourteen streams, stable (64-90% of 1 GPU of 4 available), oscillates
- fifteen streams, stable (63-70% of 1 GPU of 4 available), oscillates
- sixteen streams, stable (63-70% of 1 GPU of 4 available), oscillates
- seventeen streams, stable (63-100% of 1 GPU of 4 available), oscillates, and briefly GPU 2 lit up with 10% then dropped back down
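A scripted equivalent of this ramp-up, for reproducibility. A sketch only: it assumes the REST v2 stream-source start endpoint on the default port 5080, and `stream1`..`stream17` are hypothetical broadcast IDs.

```bash
# Start registered stream sources two minutes apart, sampling GPU state after each
for id in stream1 stream2 stream3; do   # ...continue through stream17
  curl -s -X POST "http://localhost:5080/LiveApp/rest/v2/broadcasts/${id}/start"
  sleep 120
  nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader
done
```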
At the moment these are all stable. When I run `nvidia-smi` I see this:

```
Mon Nov 20 18:41:22 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 41C P0 50W / 70W | 11915MiB / 15360MiB | 67% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 28C P0 25W / 70W | 1283MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 29C P0 26W / 70W | 1185MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P0 26W / 70W | 1087MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 11887MiB |
| 1 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 1274MiB |
| 2 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 1176MiB |
| 3 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 1078MiB |
+-----------------------------------------------------------------------------+
```
After that experiment, I tried resetting all the streams (stop/start), and now we are back in the original failure state shown at the opening of this issue. All the channels are flopping around at 0.01x, and the logs are now full of:

```
2023-11-20 18:51:44,521 [vertx-blocked-thread-checker] WARN i.v.core.impl.BlockedThreadChecker - Thread Thread[vert.x-worker-thread-109,5,main] has been blocked for 28292 ms, time limit is 10000 ms
io.vertx.core.VertxException: Thread blocked
at org.bytedeco.ffmpeg.global.avcodec.avcodec_send_frame(Native Method)
at io.antmedia.enterprise.adaptive.video.H264Encoder.avCodecSendFrame(H264Encoder.java:729)
at io.antmedia.enterprise.adaptive.video.H264Encoder.sendPacket2Encoder(H264Encoder.java:713)
at io.antmedia.enterprise.adaptive.video.H264Encoder.writeFrameInternal(H264Encoder.java:208)
at io.antmedia.enterprise.adaptive.base.VideoEncoder.writeFrame(VideoEncoder.java:275)
at io.antmedia.enterprise.adaptive.StreamAdaptor.write2VideoEncoders(StreamAdaptor.java:347)
at io.antmedia.enterprise.adaptive.StreamAdaptor.execute(StreamAdaptor.java:228)
at io.antmedia.enterprise.adaptive.StreamAdaptor.lambda$start$0(StreamAdaptor.java:182)
at io.antmedia.enterprise.adaptive.StreamAdaptor$$Lambda$509/0x0000000800808840.handle(Unknown Source)
at io.vertx.core.impl.ContextImpl.lambda$null$0(ContextImpl.java:159)
at io.vertx.core.impl.ContextImpl$$Lambda$404/0x00000008005c6440.handle(Unknown Source)
at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100)
at io.vertx.core.impl.ContextImpl.lambda$executeBlocking$1(ContextImpl.java:157)
at io.vertx.core.impl.ContextImpl$$Lambda$401/0x00000008005c7440.run(Unknown Source)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base@11.0.20.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base@11.0.20.1/java.lang.Thread.run(Thread.java:829)
```
The `nvidia-smi` command output looks different now (notice the GPU Memory usage):

```
Mon Nov 20 18:55:09 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:1B.0 Off | 0 |
| N/A 41C P0 47W / 70W | 8972MiB / 15360MiB | 68% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 On | 00000000:00:1C.0 Off | 0 |
| N/A 29C P0 25W / 70W | 1087MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 On | 00000000:00:1D.0 Off | 0 |
| N/A 29C P0 26W / 70W | 499MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 On | 00000000:00:1E.0 Off | 0 |
| N/A 29C P0 26W / 70W | 1383MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 8947MiB |
| 1 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 1078MiB |
| 2 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 490MiB |
| 3 N/A N/A 26084 C ...11-openjdk-amd64/bin/java 1372MiB |
+-----------------------------------------------------------------------------+
```
> (quoting the full stream ramp-up experiment and `nvidia-smi` output from the comment above)
I was about to create another ticket about problems with GPU usage, but it looks similar to this one.
We have recently upgraded our dev server from 2.6.3 to 2.7.0 in order to run some tests, hoping that GPU issues were fixed. Our initial configuration:
- OS: Ubuntu 22.04 LTS
- AntMedia: 2.7.0 Enterprise Edition
- CPU: 4 cores (later increased to 8)
- GPU: Quadro RTX 4000
- RAM: 16 GB (later increased to 64 GB)
Test setup:
- ABR with 4 resolutions (2160, 1080, 720, 240)
- test stream source: IP camera 3840x2160, delivered over RTMP
- after the initial test, RAM was increased to 64 GB, then we assigned 4 additional CPU cores
Results:
| Streams | CPU usage | GPU usage | RAM usage | Notes |
|---------|-----------|-----------|-----------|-------|
| 4       | ~200%     | …         | …         | …     |

Notice that neither CPU, GPU, nor RAM usage is high enough to cause problems after the increase of resources.
Previously we had Ant Media 2.4.3, and problems only started to arise with the 8th or 9th stream. We started to see the problem after upgrading to 2.6.3 (forced because of upgrading Ubuntu to 22.04), after which we had to turn off ABR completely.
Hi guys,
I've put it in the backlog with high priority. It's likely that we'll schedule it soon.
FYI
Hello, I've looked into this matter and I've found a solution. Once it's merged, I'll provide an update. If you need it urgently, please let me know.
@lastpeony @burak-58 any update? Do you think this will be fixed in 2.9.x? (I see 2.9.0 was released recently.)
Hi @alfred-stokespace,
I remember that we've fixed some issues related to this one.
@burak-58 , could you please update us?
Regards, Oguz
Hello @alfred-stokespace, I performed some tests. Please find the details here: https://github.com/ant-media/Ant-Media-Server/issues/6389#issuecomment-2186294546. I think you can upgrade to 2.9.0 and try with `hwScalingEnabled=false`.
@lastpeony finally getting into a realistic test. Had some trouble with NVIDIA drivers all of a sudden and had to build new servers off 22.04 with the AMS-suggested NVIDIA drivers. But after that quagmire I'm seeing the following...
Config change:

```
$ grep Scaling /usr/local/antmedia/webapps/LiveApp/WEB-INF/red5-web.properties
settings.encoding.hwScalingEnabled=false
```
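For anyone applying the same change, a minimal sketch. It assumes the stock install path and the `antmedia` systemd unit; if the key already exists in the file, edit it in place instead of appending.

```bash
# Disable hardware scaling for the LiveApp application, then restart AMS
echo "settings.encoding.hwScalingEnabled=false" | \
  sudo tee -a /usr/local/antmedia/webapps/LiveApp/WEB-INF/red5-web.properties
sudo systemctl restart antmedia
```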
Again, this is Ubuntu 22.04. `nvidia-smi` output:

```
Tue Jun 25 20:17:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 |
| N/A 40C P0 28W / 70W | 1923MiB / 15360MiB | 16% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 |
| N/A 40C P0 27W / 70W | 672MiB / 15360MiB | 6% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 |
| N/A 39C P0 26W / 70W | 555MiB / 15360MiB | 5% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P0 27W / 70W | 530MiB / 15360MiB | 4% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 1916MiB |
| 1 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 668MiB |
| 2 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 550MiB |
| 3 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 526MiB |
+-----------------------------------------------------------------------------------------+
```
That's with six streams. So far so good. This looks like what I would expect from the 2.4.3 instance (still running like a champ, btw!).
Now ramping up to 16 streams (3 renditions per stream; same streams as noted earlier in the ticket, same rendition details, same cameras). `nvidia-smi` output:

```
Tue Jun 25 20:23:30 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:1B.0 Off | 0 |
| N/A 44C P0 40W / 70W | 5214MiB / 15360MiB | 41% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla T4 Off | 00000000:00:1C.0 Off | 0 |
| N/A 41C P0 31W / 70W | 1602MiB / 15360MiB | 12% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla T4 Off | 00000000:00:1D.0 Off | 0 |
| N/A 40C P0 30W / 70W | 1602MiB / 15360MiB | 12% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 39C P0 31W / 70W | 1602MiB / 15360MiB | 12% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 5201MiB |
| 1 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 1592MiB |
| 2 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 1592MiB |
| 3 N/A N/A 9803 C .../jvm/java-17-openjdk-amd64/bin/java 1592MiB |
+-----------------------------------------------------------------------------------------+
```
I also see that the stream status "Broadcasting 1.00x" is pretty stable on all the streams (that was another sign of problems before; it would fluctuate wildly). Now it's fluctuating between 0.99x and 1.01x, which was common for 2.4.3 as well.
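For reference, the same status can be sampled over REST rather than from the dashboard. A sketch: it assumes the v2 broadcast endpoint, `jq` installed, and `stream1` as a placeholder stream ID.

```bash
# The Broadcast object exposes a "speed" field; ~1.00 means the stream is keeping up
curl -s "http://localhost:5080/LiveApp/rest/v2/broadcasts/stream1" | jq '.speed'
```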
I dropped in a few streams with WebRTC player and didn't see any problems.
So... this is looking really good at the moment.
Hi @alfred-stokespace,
Thank you for your thorough analysis and detailed bug report. It was instrumental in helping us identify and fix the issue.
I'm glad to hear that everything is working as expected on your end now. We're continuously working to improve GPU performance in future releases, while ensuring nothing else is broken :)
I'm closing this issue for now, but please feel free to reopen it if needed.