moonlight-qt Moonlight L4T Video decode unit queue overflow

Describe the bug: When streaming at resolutions higher than 1440p, video decode latency jumps from ~12 ms to ~150 ms.

Steps to reproduce: Select 4K resolution on Nvidia Jetson TX2 and Xavier NX devices.

Client PC details (please complete the following information)

OS: Ubuntu 18.04
Moonlight Version: v3.1.0
Nvidia Jetson TX2
Nvidia Xavier NX

Devices that work with 4K

Intel laptop with UHD 630, Ubuntu 18.04
Samsung Galaxy Note 8
Phone with Snapdragon 865

Additional context: On the newer version of Jetpack, both devices throw a segmentation fault when trying to launch Moonlight, right after these log lines:

NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261

At resolutions that do work, decode latency never drops below 10 ms, compared to the 2 ms that other Moonlight users report on a Jetson Nano.

TL;DR

Want to know where to search for the problem.

Mar 16 '21 11:03 janis8008

The same problem happens when I try a higher resolution with a low bitrate (4K at 20 Mbps).

For example 1440p 80 Mbps works great with network usage sitting in 60 to 80 Mbps range.

Mar 16 '21 13:03 janis8008

Confirmed on my Jetson Nano. I will need to bisect to find the regression.

Mar 21 '21 19:03 cgutman

This doesn't actually look like a Moonlight regression. It appears something has changed with the L4T kernel or system libraries that is causing the performance reduction, because this worked before with identical Moonlight software.

I tried going back to v3.0.0 or even v2.2.0 and the performance is still bad at 4K. You can try yourself using apt install moonlight-qt=3.0.0-1 or apt install moonlight-qt=2.2.0-1 to revert to older packages.

The segfaults with newer Jetpack versions are probably due to Nvidia making breaking changes to their system libraries. I have no clue why they would do that, but it prevents a single build of Moonlight from supporting pre-4.3 and post-4.3 builds.

Mar 28 '21 20:03 cgutman

I think I may be running into this issue too. I'm using a Jetson Nano 2GB running Jetpack 4.4.1 and the latest Moonlight Qt. Any time I try to push higher than 60 fps at 1080p, I start getting decode unit overflow messages in the terminal. Even at 60 fps my decode time is around 15 ms, which is about what my 2nd gen Firestick manages. I tried Jetpack 4.2, which is the earliest version listed as supporting the Nano, and the performance is exactly the same. Theoretically, shouldn't this board be able to handle 1080p at 120 fps? Is there a known working version that will allow 1080p at 120 fps?
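For a rough sense of why the queue overflows, the numbers above can be put into a back-of-the-envelope model. This is only a sketch: it assumes the decoder handles one frame at a time with no pipelining, and `queue_capacity` is a hypothetical parameter, not Moonlight's actual queue limit.

```python
from collections import deque


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def frames_until_overflow(fps, decode_ms, queue_capacity, max_frames=100_000):
    """Return the index of the first frame that finds the queue full, or None.

    Assumptions (not taken from Moonlight's source): frames arrive at a
    perfectly steady rate, and the decoder works on one frame at a time.
    """
    interval_ms = frame_budget_ms(fps)
    pending_finish_times = deque()  # finish times of frames not yet decoded
    last_finish = 0.0
    for frame in range(max_frames):
        arrival = frame * interval_ms
        # Drop frames whose decoding completed before this one arrived.
        while pending_finish_times and pending_finish_times[0] <= arrival:
            pending_finish_times.popleft()
        if len(pending_finish_times) >= queue_capacity:
            return frame
        # The decoder starts this frame once it is free and the frame is here.
        last_finish = max(arrival, last_finish) + decode_ms
        pending_finish_times.append(last_finish)
    return None
```

Under those assumptions, a 15 ms decode time keeps up with 60 fps (about 16.7 ms per frame) but not with 120 fps (about 8.3 ms per frame), so the backlog can only grow until the queue is full. A real hardware decoder may pipeline frames, in which case latency alone does not determine throughput.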

An interesting note is that forcing software decoding has much better performance until too much stuff on screen starts changing. I tried recompiling Moonlight using Jetson ffmpeg, but I couldn't get it to work due to some differences between the standard ffmpeg and Jetson ffmpeg.

Sep 18 '21 19:09 Koolguy007

My solution was to use my own hardware decoder, which I wrote using the Jetson decode example. With it I could get all the performance that Nvidia advertises.

Sep 19 '21 06:09 janis8008

The jetson-ffmpeg implementation seemed to be fine, so the decode unit handling on the Moonlight side might be the problem.

Sep 19 '21 06:09 janis8008

Mind sharing some details? Specifically, what files did you modify and what example did you base it off of? I'm willing to give it a shot, but it's been years since I've touched anything other than Python.

Edit: I just found the example from nvidia. NvVideoDecoder, correct?

Sep 19 '21 06:09 Koolguy007

https://docs.nvidia.com/jetson/l4t-multimedia/l4t_mm_00_video_decode.html This is the example. As for the code, you need to find the place where decode units are sent to the ffmpeg decoder.

Sep 19 '21 07:09 janis8008

The long term approach is to get rid of the special Tegra-specific Moonlight packages by using a standard interface like https://github.com/cyndis/vaapi-tegra-driver and shipping a single arm64 package for every device.

Performance will be better with VAAPI too, since it can avoid extra copies needed by the nvmpi+SDL2 backend by mapping the decoded frame as a texture and rendering it via GLES. Moonlight already has this code today, and it should "just work" when L4T has a suitable VAAPI driver.

Sep 19 '21 20:09 cgutman

Got a short-term solution? I just spent a day failing to even get Moonlight Qt to build from source (lots of errors from ffmpeg, of which I am using v4.2). I managed to build the embedded version, but my skill level in C and C++ isn't high enough to try to adapt a decoder to either of them.

Edit: figured out my compile issue. Turns out my Qt SDK messed up while installing. Gonna give modifying Qt again.

Sep 20 '21 05:09 Koolguy007

@janis8008 Is there any way you can make your decoder available? I've tried to wrap my head around everything to get this working, but I was hoping the decode example would be more cut, paste, and pipe everything in.

Edit: just a note to anybody that happens upon this thread while searching. The above VAAPI driver, in its current state, doesn't seem to play nice with Moonlight. I tried it for giggles, and the test decode fails. That may be my fault though, I don't know.

Sep 26 '21 00:09 Koolguy007

You may also try building Moonlight with the new official Nvidia ffmpeg package that comes with recent versions of Jetpack rather than the third-party jetson-ffmpeg library.

I'm not sure how Nvidia has implemented the new decoders and Moonlight may need some minor changes to use them. If you post output from ffmpeg -decoders and ffmpeg -hwaccels, I can hopefully sort that out quickly.
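For anyone collecting that output, a small helper can narrow the listings down to the entries of interest. This is a minimal sketch: it assumes `ffmpeg` is on the PATH, the function names are hypothetical, and it only filters whatever text ffmpeg prints without relying on a particular column layout.

```python
import subprocess


def matching_lines(listing, needle):
    """Return the stripped lines of an ffmpeg listing that mention `needle`."""
    return [line.strip() for line in listing.splitlines() if needle in line]


def list_ffmpeg(option):
    """Run `ffmpeg <option>` (for example -decoders or -hwaccels), return stdout.

    universal_newlines is used instead of text= so this also runs on the
    Python 3.6 that ships with Ubuntu 18.04.
    """
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", option],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def print_h264_hevc_decoders():
    """Print only the decoder entries that mention h264 or hevc."""
    decoders = list_ffmpeg("-decoders")
    for line in matching_lines(decoders, "h264") + matching_lines(decoders, "hevc"):
        print(line)
```

Calling `print_h264_hevc_decoders()` and `print(list_ffmpeg("-hwaccels"))` on the Jetson gives a short listing that can be pasted into the thread.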

https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/multimedia.html#wwpID0EQHA

Sep 26 '21 16:09 cgutman

So, when I try to build using the official Nvidia ffmpeg, I get

/home/koolguy007/moonlight-qt/app/streaming/video/ffmpeg.h:44:38: error: ‘AVCodecHWConfig’ does not name a type; did you mean ‘AVCodecContext’?

I don't know why exactly. avcodec.h from the 3rd party ffmpeg and avcodec.h from Nvidia are identical according to the diff command. As for the outputs, I'll attach them. HWAccelsOutput.txt

DecodersOutput.txt

Sep 26 '21 20:09 Koolguy007

Make sure it's using the proper FFmpeg headers. I think that's the error you'd get if it was using the headers that come with the stock libavcodec-dev on Ubuntu 18.04.

It looks like Nvidia is using a new decoder rather than a hwaccel. When you run Moonlight, you'll need to run with some environment variables set: H264_DECODER_HINT=h264_nvv4l2dec and HEVC_DECODER_HINT=hevc_nvv4l2dec
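As a usage sketch, the two hints can be set for a single launch without touching the shell profile. The variable names and values come from the paragraph above; the executable name `moonlight-qt` and the helper names are assumptions, so adjust them to match the local install.

```python
import os
import subprocess

# Decoder hints quoted from the reply above.
DECODER_HINTS = {
    "H264_DECODER_HINT": "h264_nvv4l2dec",
    "HEVC_DECODER_HINT": "hevc_nvv4l2dec",
}


def moonlight_env(base_env=None):
    """Return a copy of the environment with the decoder hints added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(DECODER_HINTS)
    return env


def launch_moonlight(binary="moonlight-qt"):
    """Start Moonlight with the hints set.

    The binary name is an assumption, not something confirmed in this thread.
    """
    return subprocess.run([binary], env=moonlight_env(), check=False)
```

Only the child process sees the hints, so other ffmpeg-based tools on the system are unaffected.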

Sep 26 '21 20:09 cgutman

So, I managed to get past that point in compiling, but I've hit another roadblock. So far I've compiled Nvidia's ffmpeg from source and installed it without installing Ubuntu's libavcodec-dev. Now I'm running into this:

/usr/local/lib/libavcodec.so: undefined reference to 'v4l2_close'
/usr/local/lib/libavcodec.so: undefined reference to 'v4l2_ioctl'
/usr/local/lib/libavcodec.so: undefined reference to 'v4l2_open'
collect2: error: ld returned 1 exit status

Google has pointed me to adding -lv4l1 -lv4l2 to the compiler flags. I tried adding QMAKE_CXXFLAGS += -lv4l1 -lv4l2 in app.pro, but I haven't seen the flags appear in the compiler output, so I guess I'm not quite sure where to add them.

Oct 05 '21 06:10 Koolguy007

Hmm, seems like you might need that on the FFmpeg build itself? libavcodec.so should have been linked to all the libraries it needs when it was compiled.

Oct 06 '21 01:10 cgutman

The ffmpeg build "seemed" to compile fine. I got the -lv4l1 -lv4l2 flags working in Qt Creator, I can see them in the compile output right before the error, but no change to error status. I'm getting a dreadful feeling that something is amiss between libv4l-dev and Nvidia's ffmpeg. I'll try recompiling ffmpeg again and watch the output closely, but I would expect the compile to fail if all libraries were not ok.

Oct 11 '21 05:10 Koolguy007

Hi,

On my side, to get it to work with the official Nvidia ffmpeg, I had to add -lv4l2 at the end of the line starting with LIBS = in Makefile.Debug or Makefile.Release in the app directory. Passing the option through qmake didn't help, even though I can see -lv4l2 in the generated makefiles.

For info, I didn't compile ffmpeg, I just used the one provided by Nvidia for Jetson.

Basically:

With qmake "LIBS+=-lv4l2", I see this in app/Makefile.Debug: LIBS = $(SUBLIBS) -lv4l2 -ldl -L(...) -lEGL --> it doesn't work (undefined reference to 'v4l2_close').

With a manual modification in app/Makefile.Debug: LIBS = $(SUBLIBS) -ldl -L(...) -lEGL -lv4l2 --> it works and Moonlight compiles.

I'm not familiar with Qt and I don't know how Makefile.Debug and Makefile.Release are generated, so I can't really say why the second option works. Any idea?

Jan 11 '22 15:01 KJPatience

My guess is that it's linking to static libraries and their libavcodec.pc doesn't include the appropriate -lv4l2 flag
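The two Makefile variants earlier in the thread differ only in where -lv4l2 sits, which points at link order as a plausible factor. GNU ld works through the command line left to right, and with --as-needed (the default on Ubuntu toolchains, as far as I know) a library that nothing has asked for yet is dropped. The toy model below is an assumption-laden sketch of that behaviour, not a reproduction of the real linker; the unit names and symbol sets are illustrative only.

```python
# Illustrative link units: what each one defines and what it still needs.
UNITS = {
    "app.o": {
        "object": True,
        "provides": set(),
        "needs": {"avcodec_open2"},
    },
    "libavcodec": {
        "provides": {"avcodec_open2"},
        "needs": {"v4l2_open", "v4l2_ioctl", "v4l2_close"},
    },
    "libv4l2": {
        "provides": {"v4l2_open", "v4l2_ioctl", "v4l2_close"},
        "needs": set(),
    },
}


def unresolved_symbols(link_order, units):
    """Single left-to-right pass; returns symbols still undefined at the end.

    Simplification: object files are always kept, while a library is kept
    only if it defines a symbol that is undefined at the moment it is seen.
    """
    undefined, defined = set(), set()
    for name in link_order:
        unit = units[name]
        if not unit.get("object", False) and not (unit["provides"] & undefined):
            continue  # nothing needs this library yet, so it is dropped
        defined |= unit["provides"]
        undefined |= unit["needs"]
        undefined -= defined
    return undefined
```

In this model, `unresolved_symbols(["app.o", "libv4l2", "libavcodec"], UNITS)` leaves the three v4l2 symbols undefined, while putting `"libv4l2"` last resolves everything, which matches the behaviour reported for the hand-edited LIBS line.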

Jan 12 '22 01:01 cgutman

Nvidia recently updated their ffmpeg decoder. I haven't had time to test it yet, but maybe they fixed all the problems.

Jan 12 '22 08:01 janis8008
