Flickering of small lines and certain colors of text when casting with NvFBC

Open Jaymus3 opened this issue 3 years ago • 28 comments

Describe the Bug

When streaming, certain solid color lines on the screen and certain colors of text have a visible flickering effect sort of like scanlines on a CRT. Media attached as demonstration:

https://user-images.githubusercontent.com/1351081/167283488-26bb395b-6213-4e8c-801d-3f7fef9557b5.mp4

https://user-images.githubusercontent.com/1351081/167283492-e374efa8-4bf8-4c74-8fb0-29bdc717abc6.mp4

Expected Behavior

No visible flickering; picture closer to original

Additional Context

Flickering has been present on multiple builds of Sunshine, but only when screencasting with NvFBC. When casting with X11, lines and text look normal. Stream information: (screenshot: CleanShot 2022-05-07 at 03 57 53)

Sunshine Host Operating System and Version

Kubuntu 22.04

Architecture

64bit

Sunshine Version

0.13.0, also nightly build

GPU Type

NVIDIA

GPU Model

RTX 3070Ti

GPU Driver/Mesa Version

510.60.02

Capture Method (Linux Only)

X11, NvFBC

Jaymus3 avatar May 08 '22 05:05 Jaymus3

Just chiming in to add that I see this issue on my end too: Linux Mint 20.3, Nvidia GTX 1080, same GPU driver version. Likewise, Steam Link app streaming doesn't have this issue, I think because it also relies on X11 capture rather than NvFBC.

entropicdrifter avatar May 09 '22 20:05 entropicdrifter

Replicated here on nightly in addition to stable; always assumed this was a result of scaling from 4k down to 1080.

Arch x11 Plasma GTX1070

Hominine avatar May 22 '22 08:05 Hominine

Same issue here on v0.14.0 AppImage when streaming with NvFBC.

Distro: Manjaro Plasma

Kernel: 5.15.46-1-MANJARO

GPU: Nvidia GTX 1070 Driver 515.48.07

Xupack88 avatar Jun 16 '22 10:06 Xupack88

As a follow-up to this, the copy I have built with NvFBC enabled still has this issue when NvFBC fails to start and it falls back to X11. I noticed this when updating my display driver before I patched it again. It seems as though just building a copy of sunshine with CUDA causes it. I can test more if requested.

Jaymus3 avatar Jun 16 '22 10:06 Jaymus3

Also experiencing this and also only started seeing it when I rebuilt with cuda

Hatsune-Cthulhu avatar Jun 22 '22 19:06 Hatsune-Cthulhu

Is there any fix for this? It is working on Ubuntu 22.04 but the flickering kills it for me. (Also I am not too much into building yet so I am not sure how to build the app without CUDA)

squallk avatar Jul 19 '22 13:07 squallk

image

These settings fix the issue for me. However, there is a new issue, as you can see in the video. It affects Big Picture and some fullscreen games. Steam Link doesn't have this issue.

https://user-images.githubusercontent.com/09594907/180096406-b4931f0f-6d55-4ce8-8160-3079b821975b.mp4

Ubuntu 22.04 @ Gnome 42.2

squallk avatar Jul 20 '22 10:07 squallk

image

These settings fix the issue for me. However, there is a new issue, as you can see in the video. It affects Big Picture and some fullscreen games. Steam Link doesn't have this issue. PXL_20220721_004629390.1.mp4

Ubuntu 22.04 @ Gnome 42.2

Those settings you've listed don't actually work, you're falling back to ffmpeg, which is why you're getting the new artifacting issue. I've mentioned both issues to the developers in the discord a while back, but the current settings won't be sticking around for long as nvidia are changing it all anyway

Hatsune-Cthulhu avatar Jul 28 '22 18:07 Hatsune-Cthulhu

I ran into this problem too. While searching for a cause, I wondered why NVFBC needs to pass the image through CUDA at all. My largely uneducated guess was that the RGBA to NV12 conversion is what introduces these artifacts, and I was able to rewrite Sunshine's NVFBC implementation without it and also without any CUDA usage (using the NVFBC_TO_SYS interface type instead of NVFBC_SHARED_CUDA). And... on my setup it works perfectly: with these changes, the flickering is gone! (Screencasting is still done via NVFBC and NVENC is still used as the encoder; that is verified by Sunshine's log messages and Moonlight's stats text.)
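For readers unfamiliar with the two formats named above, here is a minimal Python sketch of how their memory layouts differ. It assumes 8-bit samples and no row padding (real capture buffers usually have a pitch that can be larger than the width), and the function names are purely illustrative; they are not Sunshine or NvFBC APIs.

```python
def nv12_plane_sizes(width, height):
    """Return (y_bytes, uv_bytes) for an 8-bit NV12 frame without row padding."""
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even dimensions")
    # Plane 1: one luma (Y) byte per pixel.
    y_bytes = width * height
    # Plane 2: one interleaved (U, V) byte pair per 2x2 block of pixels.
    uv_bytes = (width // 2) * (height // 2) * 2
    return y_bytes, uv_bytes


def bgra_size(width, height):
    """Return the byte size of a packed BGRA frame (4 bytes per pixel)."""
    return width * height * 4


if __name__ == "__main__":
    y_bytes, uv_bytes = nv12_plane_sizes(3840, 2160)
    print("NV12:", y_bytes + uv_bytes, "bytes; BGRA:", bgra_size(3840, 2160), "bytes")
```

The point of the sketch is that each chroma pair in NV12 has to summarize four source pixels, so how those four pixels are reduced to one pair is where a conversion can go wrong.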

See my dirty patch (works, but not at all ready for commit) here: https://gist.github.com/leenr/8c3cbff68e4489d39aa0769bd352dddb. Beware that in the patch I also disabled the successful-conversion check globally, because it currently always gives false negatives for the NVFBC encoder. Also, while I was writing this message I noticed the NVFBC_TO_HW_ENCODER interface type, which may be worth checking out (https://developer.nvidia.com/sites/default/files/akamai/designworks/docs/NVIDIA-Capture-SDK-FAQ.pdf, page 6, Q4 describes it, as I understand it, as also compressing the frame directly into an H264/H265 stream).

At the moment I don't have the time, the mental capacity, or enough understanding of the topics involved to keep investing effort into polishing it into a pull request, sorry :( But I hope that my findings will help someone get there.

BTW, this project is awesome! Thank you very much for all the contributors! :)

leenr avatar Sep 09 '22 03:09 leenr

Also, while I was writing this message I noticed the NVFBC_TO_HW_ENCODER interface type, which may be worth checking out (https://developer.nvidia.com/sites/default/files/akamai/designworks/docs/NVIDIA-Capture-SDK-FAQ.pdf, page 6, Q4 describes it, as I understand it, as also compressing the frame directly into an H264/H265 stream).

NVFBC_TO_HW_ENCODER (NVFBC_CAPTURE_TO_HW_ENCODER) is "retired", see https://github.com/LizardByte/Sunshine/blob/master/third-party/nvfbc/NvFBC.h#L157 and https://github.com/LizardByte/Sunshine/blob/master/third-party/nvfbc/NvFBC.h#L422

dec05eba avatar Sep 26 '22 01:09 dec05eba

What's the status of this issue? I just compiled the latest version from source and I'm experiencing the same flickering artefacts when building with CUDA support enabled; it basically makes the stream unusable for desktop applications. If I disable CUDA in the build and fall back to software encoding, the image quality is pretty good and there are no flickering artefacts, but performance is not great at 4K resolution.

Judging from the above comments the expectation is that this will be fixed in a later version and I could enable CUDA/NvEnc again, correct?

w0utert avatar Sep 30 '22 17:09 w0utert

Some more information to add to my last comment:

  • Using sunshine compiled from GitHub master or nightly (same behavior)
  • NVidia 515 driver on RTX 3080, CUDA 11.7
  • Host OS Ubuntu 22.04 with KDE DE
  • Moonlight-QT client on MacOS or iOS
  • Capture resolution 4K@60
  • X11 capture or NvFBC capture (enabled using driver patch) makes no difference

I get exactly the same artifacts as shown in the videos posted by the original reporter. The desktop itself, bitmap graphics and any text on a white background look OK, but colored text on a black background shows the artifacts. They are especially distracting with full green-on-black text in a terminal: the text seems to be flickering and wavy, not something you want to look at for a long time.

If I rebuild without CUDA support the problem goes away and everything looks perfect, but performance is much worse even when using 4 encoding threads on a fast CPU.

Except for this issue Sunshine + Moonlight is almost perfect, I’m super impressed by the image quality and responsiveness for HiDPI Remote Desktop use cases with GPU acceleration, I haven’t found anything that even comes close. It would be perfect if the flickering when CUDA is enabled was fixed.

w0utert avatar Oct 01 '22 16:10 w0utert

The patch/workaround by @leenr indeed also fixes the problem for me, at the expense of pegging one CPU core at 100% all the time. Artefacts are completely gone and performance is still very good. I guess a CPU fallback is now used in place of the disabled CUDA code path that introduced the flickering? I don't have any CUDA experience, so unfortunately I'm not really of any help debugging, I'm afraid.

w0utert avatar Oct 02 '22 16:10 w0utert

I noticed the high CPU usage some time after writing my comment above, before I tried switching to 4K after testing at 1440p, sorry.

My suspicion was that ffmpeg's swscale cannot offload the BGR0 to NV12 conversion to hardware. I managed to fix it by asking NVFBC to output NV12 directly, but then for unknown reasons I was unable to get swscale to work with it correctly (it should convert from NV12 to NV12 without losing color). After many hours of research, to make things "just work at last", I decided to use a very nasty workaround: not using swscale at all and just doing a memcpy instead. It works fine for my purposes, but it won't work if the resolution that the client requires is not the same as the host's resolution.

If you too are very much in need of "just making things work", you can use my newer version of the patch: https://gist.github.com/leenr/869ef3b0e92f71ef46844efc105cb067. But please be aware of the aforementioned limitation: it will work only if the host and client resolutions match exactly. Also, I did not test it at all under any configuration other than 4K, NVFBC, HEVC via NVENC, HDR off, and it may break any other configuration while it is applied.
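To illustrate why a plain copy only works at matching resolutions, here is a small Python sketch. It is not the code from the gist; `copy_nv12_frame` is a hypothetical helper, and it assumes an 8-bit NV12 frame with no row padding. A copy moves bytes without resampling them, so the destination must have exactly the same dimensions as the source.

```python
def copy_nv12_frame(src, src_size, dst_size):
    """Stand-in for the memcpy workaround: copy an NV12 frame without scaling.

    src      -- bytes-like object holding the Y plane followed by the UV plane
    src_size -- (width, height) of the captured frame
    dst_size -- (width, height) requested by the client
    """
    if src_size != dst_size:
        # A raw copy cannot resample; this is where swscale (or another
        # scaler) would be needed again.
        raise ValueError(f"cannot copy {src_size} into {dst_size} without scaling")
    width, height = src_size
    expected = width * height * 3 // 2  # Y plane + half-size interleaved UV plane
    if len(src) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(src)}")
    return bytes(src)
```

With equal sizes the function returns an identical copy; any mismatch raises instead of silently producing a corrupted picture.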

leenr avatar Oct 02 '22 16:10 leenr

Thanks!

For now I’m happy to sacrifice 1 CPU core for this, it’s a trade-off I’m willing to make for better image quality. But it’s great you already figured out pretty much exactly what and where the issue is, I would think the best solution would be to find out why the CUDA color conversion is producing artifacts.

w0utert avatar Oct 02 '22 17:10 w0utert

I'll take a look at it. I also use nvfbc in my application and it doesn't have this flickering problem (that I can see), so I'll see if I can solve it.

dec05eba avatar Oct 02 '22 17:10 dec05eba

It’s easy to miss unless you are looking for it; for example, my wife just didn’t see it even when I pointed it out. Once you’ve noticed it, though, it is impossible to unsee and becomes extremely distracting. Things like browser windows with black-on-white text appear to be unaffected, probably because the elements that flicker (the text) are already black.

A Linux console with green or red text on a black background is a good way to reproduce it, especially when the content is being refreshed continuously, e.g. by opening htop in a full-screen terminal.

NvFBC or no NvFBC made no difference for me; as long as NvEnc was enabled, the problem showed.

w0utert avatar Oct 02 '22 18:10 w0utert

I don't think that fixing the CUDA conversion is the best solution in the long term. If I understand correctly, NVFBC does not require CUDA by itself, and will gladly output NV12 frames without external help and with reasonable efficiency (and, seemingly, without the artifacts this issue is about). So, if we can just tell NVFBC to output NV12 (or whatever other format we need) directly, add support for specifying the color format in the image_t structure, and make swdevice_t::convert honor it (as opposed to always assuming BGRA), we may be able to drop the CUDA dependency entirely and remove a fair amount of code without adding much to replace it (and the new code could be reused for other sources if needed), decreasing the maintenance burden. But I could be wrong, so feel free to correct me. I don't see why the NvFBC feature wasn't implemented this way in the first place, so there may be an issue after all.
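A rough Python sketch of the idea, under stated assumptions: `Image`, `PixelFormat` and `convert` are hypothetical stand-ins loosely modeled on image_t and swdevice_t::convert, not Sunshine's actual types, and the BGRA converter is passed in as a callback so the sketch stays self-contained.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PixelFormat(Enum):
    BGRA = "bgra"
    NV12 = "nv12"


@dataclass
class Image:
    width: int
    height: int
    data: bytes
    # Defaulting to BGRA mirrors the implicit assumption described above.
    pixel_format: PixelFormat = PixelFormat.BGRA


def convert(img: Image, bgra_to_nv12: Callable[[Image], bytes]) -> bytes:
    """Return NV12 bytes, converting only when the capture did not already produce them."""
    if img.pixel_format is PixelFormat.NV12:
        return img.data  # capture already delivered NV12: pass through untouched
    if img.pixel_format is PixelFormat.BGRA:
        return bgra_to_nv12(img)  # existing conversion path (swscale, CUDA, ...)
    raise ValueError(f"unsupported pixel format: {img.pixel_format}")
```

Carrying the format on the image lets a capture source that already emits NV12 skip the conversion, while sources that emit BGRA keep going through the converter.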

leenr avatar Oct 02 '22 19:10 leenr

(to @w0utert)

NvFBC or no NvFBC made no difference for me; as long as NvEnc was enabled, the problem showed.

It is because if SUNSHINE_BUILD_CUDA is defined, cuda::make_hwdevice is used instead of std::make_shared<hwdevice_t>() in x11_attr_t::make_hwdevice, making the issue appear when using x11grab as well.

If you don't use NvFBC, you can work around it either by disabling CUDA support during Sunshine compilation, or by commenting out or removing the related lines.

leenr avatar Oct 02 '22 19:10 leenr

But what about x11 capture + NVEnc? This combination also has these artifacts for me, and I guess in that scenario color conversion to NV12 is still required?

w0utert avatar Oct 02 '22 19:10 w0utert

@leenr I see your last comment just crossed mine. I think I don’t fully understand all the details, my assumption was that NV12 is what the client expects/needs but not what the capture provides (at least not when using x11 capture), so some conversion is required. Only thing I know for sure is that I have been using x11 capture + NVEnc, which had low CPU but artifacts, NvFBC capture + NVEnc which also had low CPU and artifacts, and builds without CUDA (software encode fallback) which all had high CPU use but no artefacts, and a patched build with your changes for NvFBC + NVEnc but no CUDA, which is currently the best combination (no artefacts, hardware encoding, but still some CPU overhead presumably for the NV12 conversion).

But maybe I’m mixing up things or making wrong assumptions, and just confusing the discussion ;-)

w0utert avatar Oct 02 '22 19:10 w0utert

Oh, yes, you're probably right - conversion may still be required for x11grab, and without CUDA it is done via swscale. Sorry, for some reason it hasn't crossed my mind :( In that case, yes, it may be better to fix CUDA conversion instead of removing it, I was wrong.

leenr avatar Oct 03 '22 10:10 leenr

Admittedly, x11grab + NVidia is probably a relatively rare use case, considering most people will run this on server GPUs (which all support NvFBC), and otherwise you can patch the driver to enable NvFBC on consumer GPUs. I can imagine some users do not like the idea of patching their drivers though, or don't have root access or something like that.

w0utert avatar Oct 03 '22 11:10 w0utert

Some more info after further testing this:

  • Flickering is also present on another machine with a Quadro GPU and CUDA 11.4 instead of 11.7, and a different client (Moonlight-QT on Windows instead of Moonlight-QT on MacOS)
  • It doesn't matter which codec or encoder settings you choose, on either client or server (i.e. client H264/H265 doesn't make a difference, none of the server NVEnc settings make a difference, and bitrate makes almost no difference: by the time you lower it enough to matter, everything becomes blurry anyway and the flickering is only a minor contributor to overall image quality)
  • The first patch by @leenr fixes the artefacts, but at the expense of pegging 1 core at ~100% all the time, apparently because of some software fallback in swscale. This affects overall performance significantly; it is not nearly as bad as software encoding or x11grab without CUDA, but the effect is very noticeable
  • I did not try @leenr's second patch, which replaces the swscale call with a memcpy, because I want to be able to have different capture and streaming resolutions

I think fixing this issue would be a very nice improvement for anyone using NvFBC + NVenc for building a remote desktop solution. Is anyone (maybe @dec05eba ?) planning to have a look at this? If not, I may have a go at trying to tackle this, it seems like a worthwhile effort. Since I have zero CUDA experience, my solution would probably be to modify @leenr's patches and completely bypass CUDA for the 'happy flow' (NvFBC capture in NV12 format + no resize) with a fallback path otherwise, then after that maybe find a way to have hardware-accelerated rescaling only (no color conversion) though that would probably also involve CUDA so maybe not so easy for me to do. None of this would solve the x11grab + CUDA NV12 use case though.

w0utert avatar Oct 07 '22 10:10 w0utert

@w0utert The solution I wanted to make was to remove all color conversion and just use the native RGB color format from nvfbc and pass that directly to ffmpeg (set frame->data[0] to the cuda device pointer) without any copies. I don't know about the sunshine codebase, but nvfbc has a frameSize option to scale the captured frame directly in nvfbc, so there is no need to do anything extra for that. That keeps everything on the gpu while fixing the flickering issue. But there might be some options that are not available with RGB input (?). Anyway, I won't do that now, so you can work on your fix instead.

dec05eba avatar Oct 08 '22 22:10 dec05eba

I think I've tracked down this problem as UV-aliasing in the CUDA RGBA => YUV420 conversion code. The conversion only takes 2 horizontal samples for calculating the UV (chroma) channels, while chroma is subsampled 2x in both X and Y in the output. This results in aliasing artefacts that are not temporally stable if the content changes regularly like in a htop window, probably because the changing content affects the encoder in a way that amplifies or attenuates the UV aliasing in a different way each frame. A screen of static terminal text does not exhibit the same flickering, for example.
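The effect can be sketched in a few lines of Python. This is a toy model based on one reading of the description above, not the CUDA kernel itself: it assumes full-range BT.601 chroma coefficients (the real matrix may differ) and compares taking chroma from a single row of a 2x2 block against averaging all four pixels. A one-pixel-high green line on black gives very different chroma depending on which row it falls on in the first case, and identical chroma in the second.

```python
def rgb_to_uv(r, g, b):
    """Full-range BT.601 chroma (an assumption; the real matrix may differ)."""
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return u, v


def chroma_from_top_row(block):
    """Chroma computed from the two horizontal samples of the top row only."""
    (r0, g0, b0), (r1, g1, b1) = block[0]
    return rgb_to_uv((r0 + r1) / 2, (g0 + g1) / 2, (b0 + b1) / 2)


def chroma_from_2x2(block):
    """Chroma computed from the average of all four pixels in the block."""
    pixels = list(block[0]) + list(block[1])
    r = sum(p[0] for p in pixels) / 4
    g = sum(p[1] for p in pixels) / 4
    b = sum(p[2] for p in pixels) / 4
    return rgb_to_uv(r, g, b)


GREEN, BLACK = (0, 255, 0), (0, 0, 0)
line_on_top = [[GREEN, GREEN], [BLACK, BLACK]]     # thin line in row 0
line_on_bottom = [[BLACK, BLACK], [GREEN, GREEN]]  # same line, one pixel lower

if __name__ == "__main__":
    print("top row only:", chroma_from_top_row(line_on_top),
          chroma_from_top_row(line_on_bottom))
    print("2x2 average: ", chroma_from_2x2(line_on_top),
          chroma_from_2x2(line_on_bottom))
```

In the single-row variant the line either fully colors the chroma sample or vanishes from it, which is the kind of instability that can show up as flicker once content moves or changes between frames.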

I have had some success modifying the RGBA_to_NV12 CUDA code, using simple 2x2 area average resampling, see https://gist.github.com/w0utert/a865422c5162155b364ebc83ac777e4c for a patch.

This patch greatly reduces the flickering artefacts, but it does not completely eliminate them. Text is generally flicker-free, but sharp (non-antialiased) content with high contrast (for example lines of text with a different background color) still has very slight flickering. Maybe this can be further improved with better filtering, but I'm not sure, I only have a very basic understanding of signal theory, so maybe my analysis is not entirely correct.

My guess is the best way to address this issue is to have a look at how RGBA => YUV420 software implementations in e.g. ffmpeg or swscale handle this (they don't seem to suffer this problem) and rewrite the CUDA conversion routines in a similar way.

w0utert avatar Oct 09 '22 15:10 w0utert

I've updated my patch [1] for the CUDA RGB => NV12 conversion, it now properly subsamples one 2x2 block of RGBA values to 2x2 Y values + 1 UV pair. The previous implementation was only subsampling horizontally, and outputting 2 UV pairs per 2x2 pixel block. Also, averaging is now done in UV space instead of RGB space, and using the MPEG-2 sampling scheme (UV-subpixel aligned with 2 left pixels of the 2x2 block, instead of sitting in the center), which improves image quality (I cannot authoritatively say this is 'more correct' though, it just 'looks better' to me).
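A hedged Python sketch of the per-block scheme described above follows; it is not the patch from the gist. It assumes full-range BT.601 coefficients, and it approximates the left-aligned (MPEG-2 style) siting by averaging the chroma of the two left pixels, which is a simplification rather than a proper resampling filter. `convert_block` and its `siting` parameter are illustrative names.

```python
def rgb_to_yuv(r, g, b):
    """Full-range BT.601 conversion (an assumption; the real matrix may differ)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0
    return y, u, v


def convert_block(block, siting="left"):
    """Convert one 2x2 RGB block into 2x2 luma values plus a single (U, V) pair.

    siting="left"   -- average the chroma of the two left pixels
    siting="center" -- average the chroma of all four pixels
    """
    yuv = [[rgb_to_yuv(*pixel) for pixel in row] for row in block]
    luma = [[px[0] for px in row] for row in yuv]  # luma keeps full resolution
    if siting == "left":
        samples = [yuv[0][0], yuv[1][0]]
    elif siting == "center":
        samples = yuv[0] + yuv[1]
    else:
        raise ValueError(f"unknown siting: {siting}")
    # Averaging happens after conversion, i.e. in UV space.
    u = sum(s[1] for s in samples) / len(samples)
    v = sum(s[2] for s in samples) / len(samples)
    return luma, (u, v)
```

Note that in this floating-point sketch, averaging in UV space and averaging in RGB space give the same result up to rounding, because the conversion is linear; any visible difference would come from rounding or clamping in an integer implementation.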

With this patch, flickering artefacts are completely gone and image quality is (to my eyes) indistinguishable from the non-CUDA path. Maybe someone else who may have better eyes than I do can verify ;-)

Unless there are good reasons this change is not desirable, I will create a pull request later.

[1] https://gist.github.com/w0utert/a865422c5162155b364ebc83ac777e4c

w0utert avatar Oct 09 '22 21:10 w0utert

If you have this issue, please test this build and let me know if it resolves things for you. https://github.com/LizardByte/Sunshine/actions/runs/3245354127

ReenigneArcher avatar Oct 14 '22 01:10 ReenigneArcher

I've applied w0utert's patch to nightly and I can confirm this fixes the rendering artifacts reported in this ticket.

ton avatar Oct 18 '22 14:10 ton

This issue has been fixed and will be available in the next release.

github-actions[bot] avatar Oct 30 '22 16:10 github-actions[bot]