Add Nvidia Frame Buffer Capture (NVFBC and NVIFR) source please for hookless high performance capture in games!

Open v00d00m4n opened this issue 6 years ago • 148 comments

Nvidia uses hardware FBC for its own streaming and recording software, GeForce Experience. Steam also uses it for streaming to Steam Link.

They work independently of whatever API the game or software uses. FBC is effective for fullscreen framebuffer capture and IFR for windowed capture. Here are some details:

NVFBC

Captures the framebuffer (front buffer) without any involvement from OpenGL or Direct3D.

Effectively a direct copy of the framebuffer irrespective of which application(s) drew it.

It generally only works sensibly in fullscreen mode. If you render in windowed mode and use NVFBC, it is going to capture the entire screen including your desktop and other unrelated windows.

NVIFR

Slightly more complicated and less performant than NVFBC, this can capture a single application.

In my experience this used to be how Steam would stream windowed-mode applications. I have not seen this capture path in a very long time and I am glad because performance was awful whenever it was used.

Here is some documentation: https://developer.nvidia.com/sites/default/files/akamai/designworks/docs/NVIDIA%20Capture%20SDK%20Programming%20Guide.pdf

http://on-demand.gputechconf.com/gtc/2016/presentation/s6307-shounak-deshpande-get-to-know-the-nvidia-grid-sdk.pdf

Here is the capture SDK itself: https://developer.nvidia.com/capture-sdk

ATI also has something similar, but I'm not very familiar with it; it may be worth searching for.

Anyway, implementing this should greatly reduce CPU load and the performance impact. Please do this as soon as you can.

v00d00m4n avatar Jan 10 '19 22:01 v00d00m4n

Anyone?

v00d00m4n avatar Jan 23 '19 04:01 v00d00m4n

Try this: get the latest build of Prismatik, set it to whatever framerate you realistically want to use with your LEDs (or something like 30 fps to keep it simple), and run a benchmark such as TimeSpy.

Then quit Prismatik, set up ShadowPlay with the same framerate and a low-quality preset (I don't know if all of this is possible), start capturing, and run the same benchmark.

Also do a run of the benchmark alone.

Report the results.

zomfg avatar Jan 23 '19 10:01 zomfg

Well, it's not so much about the GPU as about unloading the CPU, which doesn't always affect games, since not all of them are sensitive to CPU performance. It's also about compatibility and hardware-accelerated capture. NVFBC can capture any fullscreen app no matter what API it uses, since the capture is taken directly from the video card's framebuffer, so no hooks and no CPU processing are required at all. In a similar fashion, NVIFR can capture the desktop and anything windowed, even some windows that do not work with Desktop Duplication. You can also capture games that use an unsupported API such as Vulkan, OpenGL, or old DirectDraw, or even some UWP apps with a video stream (try the IVI UWP app from the Windows Store: its video stream does not get captured by Prismatik, only the UI affects it). And most important: it's very low latency. Basically it's low-level direct hardware access to the framebuffer, so you could even capture hand-written assembly demos that don't use DX, OGL, or VK and instead talk to the GPU directly.

Here is an example of open source software that uses NVFBC: https://github.com/gnif/LookingGlass/tree/master/host/Capture Just examine this code and replicate it, download the SDK for the headers, and that's it. (As a bonus, this one also has nice DXGI capture code; implementing it in Prismatik as well would be a nice alternative for finding a sweet spot between good performance and compatibility.)

Another, even better example of NVFBC usage in open source software is here: https://github.com/bloodelves88/CloudyNvCapture

Please, take a look.

v00d00m4n avatar Jan 24 '19 04:01 v00d00m4n

I'm not arguing, I just wanted to get a rough idea of the performance difference, which I cannot test myself (nor the other benefits, since I'm on the red team right now). Also, unless we are talking crazy capture framerates (and even then...), Prismatik's CPU load comes mainly from averaging the colors of widgets, especially with full-sized frames (or more accurately, depending on the resolution and widget size). To help with this, the last release includes downscaling for ddupl (width/8 and height/8), which is done by the GPU. So unless color averaging is ported to CUDA or an equivalent, I doubt we'll see a significant difference in performance (if any; I don't have any experience with GPGPU, I'm just speculating). But even then, it would mainly benefit (IF it benefits at all) the old capture methods that only get full-sized frames. I'm saying this just to lower your expectations in the performance department, but I'm still curious to see some numbers, so if you get a chance to bench...
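To make the cost argument above concrete, here is a minimal Python sketch of per-widget color averaging and of how a width/8, height/8 downscale shrinks the work. It is not Prismatik's code: the names `downscale`, `average_color`, and `pixels_visited` are hypothetical, and the downscale here is a simple CPU nearest-neighbour pick, whereas the comment says the real one is done by the GPU.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = List[List[Pixel]]      # list of rows, each row a list of pixels


def downscale(frame: Frame, factor: int) -> Frame:
    """Keep every `factor`-th pixel in both directions (nearest-neighbour).

    Illustration only: the thread says the real downscale happens on the GPU.
    """
    return [row[::factor] for row in frame[::factor]]


def average_color(frame: Frame, x: int, y: int, w: int, h: int) -> Pixel:
    """Average all pixels inside the widget rectangle (x, y, w, h).

    Assumes the rectangle lies fully inside the frame.
    """
    r = g = b = 0
    for row in frame[y:y + h]:
        for pr, pg, pb in row[x:x + w]:
            r += pr
            g += pg
            b += pb
    n = w * h
    return (r // n, g // n, b // n)


def pixels_visited(width: int, height: int, factor: int = 1) -> int:
    """Number of pixels the averaging loop touches for a full-frame widget."""
    return (width // factor) * (height // factor)


full = pixels_visited(1920, 1080)        # full-sized 1080p frame
small = pixels_visited(1920, 1080, 8)    # after width/8 and height/8
print(full, small, full // small)
```

Under these assumptions the averaging loop touches 64 times fewer pixels after the 8x downscale, which is why the capture method itself may matter less than the frame size it hands to the CPU.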

zomfg avatar Jan 24 '19 10:01 zomfg

I also don't expect too much of an improvement. Desktop Duplication is also largely API independent (no injection) and hardware optimized. It no doubt has better compatibility than anything Nvidia specific. Anyone interested is welcome to give it a shot, but since my time is limited these days, don't get your hopes up for me doing it.

psieg avatar Jan 24 '19 18:01 psieg

@psieg now that Desktop Duplication is broken (it disables variable refresh rate), it could be worth a try. Is it a lot of work to add NVFBC and NVIFR support?

sblantipodi avatar Jan 29 '19 13:01 sblantipodi

Is it Nvidia-exclusive, or will it also work on AMD?

Benik3 avatar Jan 29 '19 13:01 Benik3

It's Nvidia-only, but AMD has its own.

sblantipodi avatar Jan 29 '19 13:01 sblantipodi

AMD has ReLive, but I didn't find any API for it. So that would mean adding two new capture methods to Prismatik: one for Nvidia and a second for AMD... But how does e.g. Twitch connect to these streaming tools? Isn't it possible to get the picture the same way, so that it would be universal for AMD and Nvidia? I don't know much about this, but maybe it will help someone find a way :)

Benik3 avatar Jan 29 '19 13:01 Benik3

@psieg now that Desktop Duplication is broken (it disables variable refresh rate), it could be worth a try. Is it a lot of work to add NVFBC and NVIFR support?

Find something that uses one of those and compare to ddupl

zomfg avatar Jan 29 '19 14:01 zomfg

I also don't expect too much of an improvement. Desktop Duplication is also largely API independent (no injection) and hardware optimized. It no doubt has better compatibility than anything Nvidia specific. Anyone interested is welcome to give it a shot, but since my time is limited these days, don't get your hopes up for me doing it.

The improvement is huge. DD is still a software capture that requires translating many API calls into other API calls and driver calls; it adds noticeable latency and uses CPU resources. IT'S HIGH LEVEL. NVFBC and NVIFR, again, are PURE HARDWARE, LOW-LEVEL capture that skips all the middleman APIs and talks directly to the hardware, skipping a lot of unnecessary work.

DD can't be as fast as NVFBC because in a battle of SOFTWARE vs. HARDWARE-ACCELERATED, anything HARDWARE-ACCELERATED always wins. The same goes for HIGH vs. LOW level: low level wins.

The difference in resource usage and performance is the same as the difference between OBS and Nvidia Experience streaming. OBS is a software capture tool: it wastes CPU and slows games down, and you lose something like 2-10 fps depending on the game. Nvidia Experience, on the other hand, does not load the CPU at all and does not cost any FPS during streams. OBS usually adds 10-15% CPU load and NVE adds 0% CPU load. And just to compare: right now with DD, Prismatik eats about the same 15% of CPU, and I guess implementing NV or ATI direct hardware framebuffer capture would reduce the load to less than 3-5%, which is quite a lot for heavy games.

Another reason why it needs to be done is compatibility. Like I said before, DD does not always work. For example, it does not capture video streams from UWP apps (just search for IVI in the Windows Store and test with any video from it; it has plenty of free movies). Maybe that's done for DRM's sake, to prevent anyone from recording video streams from paid services via DD; I'm not sure why it happens. But with NVFBC and NVIFR I can easily record the same UWP application and anything else, even DRM-protected web streams, because NVFBC, yet again, does not work through other APIs: it directly takes whatever is currently in the video card's framebuffer. And since the Windows UI is composited on the video card, no matter how hard they try to apply DRM, they have to put the video stream layer into the overall desktop composition, so it exists there as is.

It also works with old games without any dependency on the API used for rendering: anything that renders via the video card's hardware buffer gets captured no matter what API it uses. Which, again, is not so with DD. You can find many reports on the internet that DD does not capture everything. And since you don't need DX hooking, it's better for both compatibility and performance, and it's also safer to use in anti-cheat-protected games and in games with mods that already do DX hooking.

As far as I know, Steam has all three APIs implemented for streaming: DD for general purpose, NVFBC for GeForce cards, and the ATI equivalent for Radeons. It also uses NVENC for lag-free hardware encoding of the video stream, which improves performance a lot compared to the DD and software encoding Steam falls back to when no NV or ATI option is selected.

So DD is good for general purposes as a fail-safe method, while the NV and ATI APIs are better for gaming and even DRM-protected streamed video.

As for the implementation, please read all the PDFs and source examples I linked in my posts above. It's very easy: all you have to do is copy and paste the general code "shell" you use for DD, replace the DD calls with NVFBC and ATI API calls, and do some minor tweaks. I think it's even possible to use a direct framebuffer transfer to video card memory to reduce its size and sample the zone colors, but I'm not sure about this part.

P.S. Oh, by the way, the recent SDK may be limited to pro cards only, so if it doesn't work, just look for an older version of the SDK that did not have that restriction.

v00d00m4n avatar Jan 29 '19 15:01 v00d00m4n

For the AMD capture API I think you need to look at this: https://gpuopen.com/gaming-product/advanced-media-framework/ but I'm not an AMD user and don't know much about it, so I'm not sure.

v00d00m4n avatar Jan 29 '19 15:01 v00d00m4n

@v00d00m4n if it's simple why don't you try to do it? It seems that you are able to do it. Please do it if you can :)

sblantipodi avatar Jan 29 '19 15:01 sblantipodi

Sorry to say this, but DDupl works perfectly fine for most users. VRR isn't widespread enough yet for this to be a problem. We're volunteer contributors working on this in our free time; we don't have the capacity to build two new capture sources for 20 users. If you really want it, find and pay a developer to do it. Sorry.

Anyone is welcome to try this and send a pull request, I'll happily include it for everyone to benefit.

psieg avatar Jan 29 '19 21:01 psieg

Sure, that's completely legit. Thanks anyway for the great work done so far.

In any case, I think that everyone who uses Prismatik for gaming today uses a VRR monitor, or at least the vast majority do. Very few gamers today don't use some form of variable refresh rate, simply because VRR gives you a better experience than a better GPU would, for less money :)

Hope that someone will join the project and support it since this is a deal breaker for most gamers.

sblantipodi avatar Jan 29 '19 21:01 sblantipodi

@sblantipodi @v00d00m4n I have implemented NvFBC grabbing support in my fork of this repo.

Lightpack

It needs some testing as I do not have an LED strip yet. I could only test it with an RTX 2070, and it may crash if you have an Intel or AMD graphics card.

It also scales the picture by a factor of 8 like the Ddupl grabber.

Nvidia's FreeSync seems to work with an Asus MG279Q without stuttering while NvFBC is running with a 33ms timer.

You need to run Prismatik as an admin once because enabling NvFBC needs elevation. Only the display driver will restart, and NvFBC support stays enabled without a reboot.

NvFBC only works with consumer cards if you pass a magic sequence in a setup parameter. Without this magic it only works with the pro cards. I do not know about the licensing of NvFBC. As you can see in the changed files, I used two of Nvidia's header files, which contain the function and parameter definitions in the NvFBC64.dll.

maxroehrl avatar Apr 03 '19 17:04 maxroehrl

@maxroehrl Oh my god. I love this man! Congratulations man! testing it right now...

sblantipodi avatar Apr 03 '19 17:04 sblantipodi

@maxroehrl OK, I have tested it on my RTX 2080 Ti with an Acer XB271HK on Windows 10 1809. The monitor has a built-in "refresh counter" (not a framerate counter but a refresh counter), so I can easily see with clear numbers whether my GSYNC is working properly.

I generally use a framerate limiter to cap the framerate at 57 FPS so that GSYNC always stays engaged, which reduces input lag. My monitor is 60 Hz; at higher framerates GSYNC disengages and input lag is not as good as with GSYNC on.

With your patch GSYNC works flawlessly and it is a completely smooth experience. Before this patch, and before Windows broke things, I could see some "additional input lag" even with DDUPL; now it is as smooth as if nothing were running underneath.

It works flawlessly with DX11, DX12 and Vulkan.

I tried The Witcher 3, AC Odyssey, Shadow of the Tomb Raider, Metro Exodus, Devil May Cry 5, YouTube, and a test app of my own.

A flawless experience.

Really happy with the patch, congratulations maxroehrl and really thanks for your excellent work.

@psieg this would not have been possible without your excellent work either, so thank you too. The patch is a complete jewel for gamers and non-gamers alike, since it's incredibly faster than ddupl, let alone WinAPI. So do you plan to merge the patch into the "release branch" anytime soon? I used a high-speed camera to film the LED transition from one color to another (from red to blue): with ddupl there are some frames where I see the monitor green and the LEDs blue, while with NvFBC64 the LED transition is much better, without any errors.

thank you guys. really great work!

sblantipodi avatar Apr 03 '19 17:04 sblantipodi

@maxroehrl cool 👍

It also scales the picture by a factor of 8 like the Ddupl grabber.

The 8-factor is just a sweet spot for ddupl, so if there is no negative impact to scaling lower you can try something like 20x (so grabScreen.scale = 0.05)
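As a quick check of the numbers in this suggestion, the sketch below converts a downscale factor into the value that `grabScreen.scale` would take and into the resulting frame size. The helper names are hypothetical, and rounding the size down with integer division is an assumption; the actual grabber may round differently.

```python
from typing import Tuple


def scale_for_factor(factor: int) -> float:
    """Scale value for an N-times downscale, e.g. 20x -> 0.05."""
    return 1.0 / factor


def scaled_size(width: int, height: int, factor: int) -> Tuple[int, int]:
    """Frame size after downscaling (assumed to round down, minimum 1 pixel)."""
    return (max(1, width // factor), max(1, height // factor))


for factor in (8, 16, 20):
    print(factor, scale_for_factor(factor), scaled_size(3840, 2160, factor))
```

For a 3840x2160 desktop, a 20x downscale leaves a 192x108 frame, which is still more pixels along each edge than a typical LED strip has LEDs.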

Also, if you can avoid copying the framebuffer and use it as is (like in the ddupl grabber), it'll save a few % on CPU load.
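The copy-versus-reuse distinction can be illustrated in Python with `bytes` (a copy) and `memoryview` (a borrowed view of the same memory). This is only an analogy for the grabber's C++ buffers: `driver_buffer` is a hypothetical stand-in for the memory the capture API fills, not anything from NvFBC or Prismatik.

```python
def copy_frame(buffer: bytearray) -> bytes:
    """Snapshot the frame: allocates new memory and copies every byte."""
    return bytes(buffer)


def view_frame(buffer: bytearray) -> memoryview:
    """Borrow the frame: no allocation and no copy.

    The view reflects whatever the grabber writes next, so it must be
    consumed before the buffer is overwritten.
    """
    return memoryview(buffer)


# Hypothetical 2x2 BGRA frame (4 pixels * 4 bytes), initially all zeros.
driver_buffer = bytearray(2 * 2 * 4)

snapshot = copy_frame(driver_buffer)
view = view_frame(driver_buffer)

# Simulate the grabber delivering a new frame into the same buffer.
driver_buffer[0] = 255

print(snapshot[0], view[0])  # the copy keeps the old value, the view sees the new one
```

The saving comes from skipping one full-frame copy per capture; the trade-off is that the consumer has to finish reading before the next frame lands in the same memory.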

zomfg avatar Apr 03 '19 18:04 zomfg

@zomfg You are right, it is possible to only use one buffer. I also increased the downscale to 16x.

@sblantipodi Thanks for testing! Can you also test if there is a negative impact on the visuals with 16x downscaling compared to 8x?

maxroehrl avatar Apr 03 '19 19:04 maxroehrl

Maaan you did it! HUGE RESPECT!

Now a little request: can you quickly add a UI and ini setting for a configurable scaling factor instead of having it hardcoded?

v00d00m4n avatar Apr 03 '19 20:04 v00d00m4n

So nVidia is solved, but is there any solution for AMD? (maybe using AMF?)

Benik3 avatar Apr 03 '19 20:04 Benik3

@maxroehrl I found an issue: for some reason it does not catch the UI of some UWP windows. Can you please search for the IVI app in the Windows Store and try it in both DD and NVFBC modes to see the difference and find out what's wrong? DD can't catch the video stream but catches the UI with no problem; your NVFBC implementation catches neither, while it should actually be the opposite. I really expected that I could now finally watch streaming movies and TV shows with ambilight. Also try it with Netflix and a few other streaming services from the Windows Store.

v00d00m4n avatar Apr 03 '19 20:04 v00d00m4n

@v00d00m4n can you be more specific? Which UWP window UI? I have just downloaded the IVI app from the Windows Store and the whole app works flawlessly, even while playing videos.

EDIT: Even Netflix works flawlessly here

sblantipodi avatar Apr 03 '19 21:04 sblantipodi

@sblantipodi @v00d00m4n I have implemented NvFBC grabbing support in my fork of this repo.

Lightpack

It needs some testing as I do not have an LED strip yet. I could only test it with an RTX 2070, and it may crash if you have an Intel or AMD graphics card.

It also scales the picture by a factor of 8 like the Ddupl grabber.

Nvidia's FreeSync seems to work with an Asus MG279Q without stuttering while NvFBC is running with a 33ms timer.

You need to run Prismatik as an admin once because enabling NvFBC needs elevation. Only the display driver will restart, and NvFBC support stays enabled without a reboot.

NvFBC only works with consumer cards if you pass a magic sequence in a setup parameter. Without this magic it only works with the pro cards. I do not know about the licensing of NvFBC. As you can see in the changed files, I used two of Nvidia's header files, which contain the function and parameter definitions in the NvFBC64.dll.

I will open a pull request if somebody tells me whether it's working with a light strip. Changed files Prismatik NvFBC.zip Download (8x Downscaling)

Man, you are just magic. Thanks for this amazing job. I was getting desperate about these lags with my G-Sync screen. I tested it with PUBG, Shadow of the Tomb Raider, and Far Cry New Dawn. Everything is buttery smooth. A dream come true.

But when you move a window slowly to the top, with a black wallpaper, you see some lag as the LEDs go from off to white, in visible steps. It is worse with 16x. With DDup it is really

(Tested with a 112-LED strip on my 27-inch screen.) Sorry for my English.

Arn0111 avatar Apr 03 '19 21:04 Arn0111

@zomfg You are right, it is possible to only use one buffer. I also increased the downscale to 16x.

@sblantipodi Thanks for testing! Can you also test if there is a negative impact on the visuals with 16x downscaling compared to 8x?

Here is the updated version: Prismatik NvFBC.zip Download (16x Downscaling)

You have all my respect; tell me what I need to do and I will do it. I tried it: on my RTX 2080 Ti I see no performance difference with the naked eye. I tested it with a photo burst using this video: https://www.youtube.com/watch?v=sr_vL2anfXA&list=FL9kxMLPqCEA187NGgz1fo9w&index=13&t=0s From second 50 onward the video is quite punishing for LED performance, and the experience is FLAWLESS.

I even tried what @Arn0111 described, but I see absolutely no difference between NvFBC and DDUPL on my 2080 Ti.

Honestly, I'm not able to see any difference between 8x and 16x, either in performance or in quality. I don't know if GPU/CPU performance has some impact here. Arn0111, what GPU are you using?

PS: Just to be precise, in case it is useful: I'm running 95 LEDs on a 27-inch display, so the LEDs are quite precise. I don't know if some defects would be more noticeable with fewer LEDs.

sblantipodi avatar Apr 03 '19 22:04 sblantipodi

It's like an LED is turning on with fewer gradient steps when you move a window really slowly toward any border. It's jerky, like 30 fps compared to 60 fps, almost like WinAPI. With 16x it's like 10 fps. I have an MSI 1080 Gaming X. With DDup it's smooth like a real 60 fps. In games or videos you can't really see this issue.

Arn0111 avatar Apr 03 '19 23:04 Arn0111

@v00d00m4n can you be more specific? Which UWP window UI? I have just downloaded the IVI app from the Windows Store and the whole app works flawlessly, even while playing videos.

EDIT: Even Netflix works flawlessly here

For me, whenever I start the IVI app, the ambilight just stops updating and freezes at the last frame from before the app fully loaded; once I close the app, it starts updating again.

v00d00m4n avatar Apr 04 '19 00:04 v00d00m4n

Oh wait, what Windows build do you use? Maybe it's somehow related to the latest preview releases after the 1809 build, where MS went totally crazy, fully removed exclusive fullscreen, and made some other changes to how windows are rendered.

I use version 1903 (build 18362.1), the latest Nvidia Game Ready driver, and a GTX 1080 Ti with a 4K display connected over HDMI. I noticed that the way Windows operates has changed: judging by what my TV actually shows, the resolution is always 4K, other resolutions just get upscaled to 4K virtually, and the display does not switch to real display modes. I don't know if that is related, but this behavior in recent versions of the driver and Windows 10 drives me nuts and is very lame. It could be the reason, but I don't know for sure.

v00d00m4n avatar Apr 04 '19 00:04 v00d00m4n

It's like an LED is turning on with fewer gradient steps when you move a window really slowly toward any border. It's jerky, like 30 fps compared to 60 fps, almost like WinAPI. With 16x it's like 10 fps. I have an MSI 1080 Gaming X. With DDup it's smooth like a real 60 fps. In games or videos you can't really see this issue.

It doesn't seem to happen here. Is it a performance-related problem? What timer are you using? I'm using 33ms here, even though 50ms could be enough.
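For reference, the timer values mentioned here translate into LED update rates as in the rough sketch below. It assumes the grab timer fires at a fixed interval and that each tick produces exactly one LED update, which may not hold exactly in Prismatik; the function names are hypothetical.

```python
def updates_per_second(interval_ms: float) -> float:
    """LED updates per second for a grab timer firing every `interval_ms`."""
    return 1000.0 / interval_ms


def display_frames_per_update(display_hz: float, interval_ms: float) -> float:
    """How many display refreshes pass between two LED updates."""
    return display_hz * interval_ms / 1000.0


for interval in (33, 50):
    print(interval,
          round(updates_per_second(interval), 1),
          round(display_frames_per_update(60, interval), 2))
```

Under these assumptions a 33ms timer gives about 30 updates per second, roughly one LED update for every two refreshes of a 60 Hz display, and a 50ms timer gives 20 updates per second.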

sblantipodi avatar Apr 04 '19 06:04 sblantipodi