xpra gstreamer shadow capture

Based on #3706, provide a codec that captures the display(s) via a gstreamer pipeline. This should work for:

MS Windows: dx9screencapsrc or gdiscreencapsrc
MacOS: avfvideosrc - not sure about this one: building applemedia from gst-plugins-bad seems to require gst-plugins-gl, which has not had a release in 10 years (and no commits since 2014). It also tries to build against GStreamer 0.10!
X11: ximagesrc
Wayland: org.freedesktop.portal.ScreenCast - as per: xdp-screen-cast.py - uses dbus to request authorization

Main difficulty is going to be the integration of events with the pull interface currently used by the base shadow server class.
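
One possible way to bridge a push-style capture with the pull interface is to keep only the most recent frame and let the shadow server fetch it on demand. This is a minimal sketch, not xpra code: `LatestFrame`, `push()` and `pull()` are hypothetical names, and the frame object is whatever the capture callback (for example an appsink "new-sample" handler) produces.

```python
import threading
from typing import Any, Optional


class LatestFrame:
    """Holds only the most recent frame pushed by a capture thread.

    Hypothetical helper, not part of xpra: the capture side calls push(),
    the consumer calls pull() whenever it wants a refresh.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._dropped = 0

    def push(self, frame: Any) -> None:
        # Called from the capture callback, possibly on another thread.
        with self._lock:
            if self._frame is not None:
                # The previous frame was never consumed: count it as dropped.
                self._dropped += 1
            self._frame = frame

    def pull(self) -> Optional[Any]:
        # Called by the consumer; returns None if nothing new has arrived.
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

    @property
    def dropped(self) -> int:
        with self._lock:
            return self._dropped
```

The capture still runs at its own rate in this sketch, so it hides the push model from the consumer without removing its cost.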

Later on we can consider doing the whole pipeline in gstreamer, which will be a lot quicker than extracting the data (assuming we can even get zerocopy buffers out of GStreamer via appsink) then compressing it in xpra... Downside is that we no longer control the encoder flow rate.

Jan 29 '23 13:01 totaam

> MS Windows: dx9screencapsrc or gdiscreencapsrc

For Windows, you should use d3d11screencapturesrc. Both others are officially legacy. Not sure about Mac.

Jan 30 '23 07:01 ehfd

@ehfd yes, that's the one I meant. For gdi, we might as well use our existing capture code, and dx9 offers nothing that d3d11 does not have. I have kept it in anyway but I don't think it will be used.
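
As an illustration of the per-platform choice discussed so far, here is a rough sketch that returns the capture elements to try, most preferred first. The function name `capture_candidates` and the `wayland` flag are made up for this example, and using `pipewiresrc` for the portal case assumes the ScreenCast portal hands back a pipewire stream.

```python
import sys
from typing import List, Optional


def capture_candidates(platform: Optional[str] = None, wayland: bool = False) -> List[str]:
    """Return capture source element names to try, most preferred first.

    Hypothetical helper: 'platform' defaults to sys.platform and 'wayland'
    would have to be detected by the caller.
    """
    platform = platform or sys.platform
    if platform.startswith("win"):
        # d3d11 first, dx9 only as a legacy fallback.
        return ["d3d11screencapturesrc", "dx9screencapsrc"]
    if platform == "darwin":
        return ["avfvideosrc"]
    if wayland:
        # Assumption: the portal session is consumed through pipewire.
        return ["pipewiresrc"]
    return ["ximagesrc"]
```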

So far, this seems to work pretty well, just needs more work:

pass display options to the capture factory, so users can specify what to capture: which monitor / geometry (and try window handle?)
there's a bug: the pipeline isn't stopped when the last client disconnects, wasting huge amounts of CPU
we set the capture framerate when creating the pipeline, which is not what we want with xpra: we want a pull model, not a push one! (not easy to fix AFAICT)
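
For the "pipeline isn't stopped" bug, something along these lines might be enough: track the connected clients and stop the capture when the set becomes empty. This is only a sketch; `CaptureLifecycle` and its callbacks are hypothetical, and the real fix depends on how the shadow server tracks its clients.

```python
from typing import Callable, Set


class CaptureLifecycle:
    """Starts capture for the first client and stops it after the last one.

    Hypothetical helper: start and stop are callbacks supplied by the caller,
    for example functions that set a pipeline to PLAYING or NULL.
    """

    def __init__(self, start: Callable[[], None], stop: Callable[[], None]) -> None:
        self._start = start
        self._stop = stop
        self._clients: Set[str] = set()

    def add_client(self, client_id: str) -> None:
        was_idle = not self._clients
        self._clients.add(client_id)
        if was_idle:
            self._start()

    def remove_client(self, client_id: str) -> None:
        if client_id not in self._clients:
            return
        self._clients.discard(client_id)
        if not self._clients:
            # Last client gone: stop capturing so the pipeline stops using CPU.
            self._stop()
```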

Perhaps we should use the same gstreamer pipeline for compression (at least as an option), if that gives us some zero-copy benefits. We would then lose all sorts of benefits exclusive to xpra, like video region detection, scroll encoding, etc. but it may be worth it when a hardware video encoder is available.

Feb 03 '23 12:02 totaam

We now have the ability to screencast existing (wayland / X11) desktop sessions that support org.freedesktop.portal.ScreenCast - via gstreamer / pipewire. Running xpra shadow should now capture the wayland desktop session / monitor selected.

To get a full remote wayland desktop (#387), we need a lot more APIs from org.freedesktop.portal.RemoteDesktop or libei

Feb 07 '23 13:02 totaam

The difficulty is generating pipelines that:

work everywhere! it seems that we can afford to try a few different ones with the same pipewire fd.
- hardware encoders are notoriously finicky, ie: #3843 and gnome-remote-desktop: VAAPI hardware video encoding
- pipewire issues: Pipewiresrc freezing in GStreamer pipeline - add always-copy=true? How expensive is this?
don't have any buffering / latency issues - leaky queues?
can be tuned fairly generically based on user configuration (speed vs quality)
bug on some systems:

pipeline warning: Internal GStreamer error: code not implemented.  Please file a bug at https://gitlab.freedesktop.org/gstreamer/gstreamer/issues/new.
                   ../gst-libs/gst/video/gstvideofilter.c(296)
                    gst_video_filter_transform ()
                    /GstPipeline
                   pipeline8/GstVideoConvert
                   videoconvert0
invalid video buffer received

It's not clear which buffer uses what format - GST_DEBUG=4 didn't help.
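
If trying a few pipelines against the same pipewire fd really is affordable, the fallback could look roughly like this. It is a sketch under assumptions: `first_working`, `candidate_pipelines` and the `try_start` callback are hypothetical, the element names are the ones mentioned in this ticket, and whether `fd` and `path` are the right `pipewiresrc` properties for a given version should be checked.

```python
from typing import Callable, Iterable, List, Optional


def first_working(candidates: Iterable[str], try_start: Callable[[str], bool]) -> Optional[str]:
    """Return the first pipeline description that try_start accepts, or None."""
    for description in candidates:
        try:
            if try_start(description):
                return description
        except Exception:
            # Treat any failure as "this pipeline does not work here".
            continue
    return None


def candidate_pipelines(fd: int, node_id: int) -> List[str]:
    # Assumption: pipewiresrc takes the portal fd and the node id as "path".
    src = f"pipewiresrc fd={fd} path={node_id} do-timestamp=true"
    return [
        # Hardware encoder first, software encoders as fallbacks.
        f"{src} ! vaapipostproc ! queue ! vaapih264enc ! appsink name=sink",
        f"{src} ! videoconvert ! queue leaky=2 ! x264enc tune=zerolatency bframes=0 ! appsink name=sink",
        f"{src} ! videoconvert ! queue leaky=2 ! vp8enc deadline=1 ! appsink name=sink",
    ]
```

In practice `try_start` would build the pipeline, wait for the first buffer or an error, and return accordingly. That part is left out here because it depends on the GStreamer bindings in use.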

Other issues:

framerate - perhaps use a videorate element and tune it based on packet acks?
don't use video mode if mmap is available!
add option to enable the video mode with x11 shadow server via ximagesrc, using syntax like xpra shadow :100,stream=true?
propagate settings and client options: bitrate, b-frames?, fps
enable error-resiliency flags when quic is used: #3376
colorspace is wrong - nesting the view shows it is too dark
x264enc doesn't honour the quality we set? (no such problems with our own x264 encoder)
handle client decoding errors by restarting the pipeline and / or changing to a different / safer encoder?
the pipelines take time to start - perhaps send a start-of-stream draw packet to the client?
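
For the framerate item, one simple policy would be additive-increase / multiplicative-decrease driven by how long the client takes to acknowledge packets. This is just a sketch of the idea: `next_framerate`, the 50ms target and the fps limits are all made-up values, and the result would still need to be applied to the pipeline (perhaps through the `max-rate` property of a videorate element).

```python
def next_framerate(current_fps: int, ack_delay_ms: float,
                   target_delay_ms: float = 50.0,
                   min_fps: int = 1, max_fps: int = 60) -> int:
    """Pick the next capture framerate from the latest ack delay.

    Hypothetical policy: halve the rate when acks are late,
    otherwise increase it by one frame per second.
    """
    if ack_delay_ms > target_delay_ms:
        # Client is falling behind: back off quickly.
        fps = current_fps // 2
    else:
        # Client is keeping up: probe upwards slowly.
        fps = current_fps + 1
    # Clamp to the configured range.
    return max(min_fps, min(max_fps, fps))
```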

Here are some useful links to tune this further:

gstpipewire source: https://github.com/PipeWire/pipewire/blob/master/src/gst/gstpipewiresrc.c
if available, selkies uses nvh264enc: https://github.com/selkies-project/selkies-gstreamer/blob/11006e393b9b7af81ab2b01d82a0092167da3e87/src/selkies_gstreamer/gstwebrtc_app.py#LL200C30-L200C39 via a cudaupload and cudaconvert - are there any cases where pipewire can give us a GPU buffer?
otherwise they use x264enc: https://github.com/selkies-project/selkies-gstreamer/blob/11006e393b9b7af81ab2b01d82a0092167da3e87/src/selkies_gstreamer/gstwebrtc_app.py#LL309C32-L309C32 With: threads=4 bframes=0 key-int-max=0 byte-stream=True tune=zerolatency speed-preset=veryfast bitrate={} and high profile.
or even vp8 / vp9 vpxenc: https://github.com/selkies-project/selkies-gstreamer/blob/11006e393b9b7af81ab2b01d82a0092167da3e87/src/selkies_gstreamer/gstwebrtc_app.py#L348 With: threads=4 cpu-used=8 deadline=1 error-resilient=partitions keyframe-max-dist=10 auto-alt-ref=True target-bitrate={}
gnome-shell uses these pipelines: https://gitlab.gnome.org/GNOME/gnome-shell/-/blob/c57f4a1c73413fcf3f84de448b46f66e9cb186a7/js/dbusServices/screencast/screencastService.js#L31

capsfilter caps=video/x-raw(memory:DMABuf),max-framerate=%F/1 ! \
             glupload ! glcolorconvert ! gldownload ! \
             queue ! \
             vp8enc cpu-used=16 max-quantizer=17 deadline=1 keyframe-mode=disabled threads=%T static-threshold=1000 buffer-size=20000 ! \
             queue ! \
             webmmux

capsfilter caps=video/x-raw,max-framerate=%F/1 ! \
             videoconvert chroma-mode=none dither=none matrix-mode=output-only n-threads=%T ! \
             queue ! \
             vp8enc cpu-used=16 max-quantizer=17 deadline=1 keyframe-mode=disabled threads=%T static-threshold=1000 buffer-size=20000 ! \
             queue ! \
             webmmux

We don't need the muxer, or the queue?

desktopcast uses: https://github.com/seijikun/desktopcast/blob/9ae61739cedce078d197011f770f8e94d9a9a8b2/src/stream_server.rs#L162

videoconvert !
queue leaky=2 !
x264enc threads={} tune=zerolatency speed-preset=2 bframes=0 !
video/x-h264,profile=high !
queue !
rtph264pay name=pay0 pt=96

other projects hardcode pipelines similar to: pipewiresrc do-timestamp=true ! vaapipostproc ! queue ! vaapih264enc
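
Rather than hardcoding, the encoder element could be generated from the user's speed setting. A rough sketch, assuming a 0-100 speed value like xpra's and reusing the x264enc options quoted above: `x264enc_element` and the preset mapping are hypothetical, and the linear mapping is arbitrary.

```python
# x264 presets ordered from slowest (best compression) to fastest.
X264_PRESETS = ["veryslow", "slower", "slow", "medium", "fast",
                "faster", "veryfast", "superfast", "ultrafast"]


def x264enc_element(speed: int, bitrate_kbps: int, threads: int = 4) -> str:
    """Build an x264enc element description from a 0-100 speed setting.

    Hypothetical mapping: speed is clamped, then spread linearly over the presets.
    """
    speed = max(0, min(100, speed))
    preset = X264_PRESETS[speed * (len(X264_PRESETS) - 1) // 100]
    return (f"x264enc threads={threads} tune=zerolatency bframes=0 "
            f"speed-preset={preset} bitrate={bitrate_kbps}")
```

For example, `x264enc_element(100, 4000)` selects the `ultrafast` preset, and the same approach could produce the vp8enc or vaapih264enc variants.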

May 11 '23 11:05 totaam

@totaam Our configuration for the encoder isn't really optimized. Any findings on your side?

Aug 04 '23 10:08 ehfd

@ehfd not yet optimized either: https://github.com/Xpra-org/xpra/issues/3706#issuecomment-1665252432 The current thinking is that some kind of manual setup / testing / validation may be needed to get the best out of hardware encoders because of the amount of variation there is between OS / drivers / GPUs / ... We will try to make it easier to do that.

Aug 04 '23 11:08 totaam

@m1k1o Adding Neko to the discussion.

Aug 04 '23 11:08 ehfd