
Using an mpeg4 stream?

Open almarklein opened this issue 4 years ago • 6 comments

One of the options mentioned in #3 is the use of an mpeg4 stream. I don't think it's worth the effort to look into it now, but let's collect some thoughts and findings here.

Front-end

Let's start with the easy part. It could be a <video> element, or use the WebRTC API.

MPEG4 encoding

We'd need to encode the images as an mpeg4 stream. Some options:

  • imageio-ffmpeg - can currently only write to a file on the filesystem. Maybe it can be changed to pipe the result back to Python? Moving the data down pipes will come with a performance penalty, though.
  • ffmpeg-python - pretty sure this one can do in-memory encoding.

We'd need to set things up so that the encoding is suited for a stream (e.g. variable framerate, minimal buffering).
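
As a rough sketch of the kind of invocation this would need (assuming an ffmpeg binary is available; the exact flags would need tuning), raw RGB frames could be piped into ffmpeg on stdin and a fragmented, streamable MP4 read back from stdout. The function names here are hypothetical:

```python
import subprocess

def ffmpeg_stream_cmd(width, height, fps=30):
    """Build an ffmpeg command line that reads raw RGB frames from stdin
    and writes a fragmented (streamable) MP4 to stdout."""
    return [
        "ffmpeg",
        "-f", "rawvideo",        # input: raw frames, no container
        "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}",
        "-r", str(fps),
        "-i", "-",               # read frames from stdin
        "-c:v", "libx264",
        "-preset", "ultrafast",  # favor latency over compression ratio
        "-tune", "zerolatency",  # minimize encoder-side buffering
        "-movflags", "frag_keyframe+empty_moov",  # fragmented MP4, playable before EOF
        "-f", "mp4",
        "-",                     # write the stream to stdout
    ]

def start_encoder(width, height, fps=30):
    """Start the encoder; write frame bytes to .stdin, read MP4 from .stdout."""
    return subprocess.Popen(
        ffmpeg_stream_cmd(width, height, fps),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

The `frag_keyframe+empty_moov` flags matter: a plain MP4 puts its index at the end of the file, so it can't be played until encoding finishes, whereas a fragmented MP4 can.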

Streaming the video

If the server hosted a web server, in the form of a URL endpoint that provides a streaming HTTP connection, we could probably push the frames over it. Disclaimer: I've worked with video encoding and with web servers, but never with the two combined.
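
For illustration, a minimal sketch of such a streaming endpoint using only the stdlib (`frame_chunks` is a hypothetical stand-in for the encoder output; a real implementation would more likely hang off Tornado, which the Jupyter server already uses):

```python
import http.server
import threading

def frame_chunks():
    """Hypothetical stand-in for the encoder output: yields encoded fragments."""
    for i in range(3):
        yield b"<encoded-fragment-%d>" % i

class StreamHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # A long-lived response: keep writing encoded fragments as they arrive.
        self.send_response(200)
        self.send_header("Content-Type", "video/mp4")
        self.end_headers()
        for chunk in frame_chunks():
            self.wfile.write(chunk)
            self.wfile.flush()

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve_once():
    """Serve a single request on an ephemeral port; returns (server, port)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), StreamHandler)
    threading.Thread(target=server.handle_request, daemon=True).start()
    return server, server.server_address[1]
```

A client that points a <video> element at this URL would start decoding as fragments come in, provided the payload is a stream-friendly format (e.g. fragmented MP4, see above).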

WebRTC

Browsers have an API for streaming video called WebRTC, which may (or may not) make the implementation easier. See e.g. https://github.com/maartenbreddels/ipywebrtc

We'd need an implementation of WebRTC in Python, and it should include mpeg4 encoding (see above). This would be a lot of code/work, unless a library for that already exists.

Something to keep in mind is that the current throttling mechanism schedules the server-side drawing based on how fast the client can display the images. This seems to work very well, and if we'd ever implement WebRTC we should make sure to have a similar coupling.

General thoughts

  • This only touches on one aspect of performance, see the top post in #3.
  • Using JPEGs will already be a huge improvement. Using mpeg4 additionally applies temporal encoding, but I don't expect the gain to be huge.
  • I expect that mpeg4 encoding will make the code a lot more complex.
  • It would pull in heavy dependencies.

My current view is that it's not worth the effort. But let's keep the discussion open for the time being.

almarklein avatar Sep 22 '21 20:09 almarklein

I apologize. My understanding that mpeg4 would be a huge savings in transmission performance (bytes sent/received) and would make up for any encoding/decoding time increase. This sounds like too much work, but feel free to close or leave this open if others want to comment on it in the future.

Regarding ffmpeg and writing to files, at least on a Linux system you should be able to define a named fifo (os.mkfifo) to write to and read from. The kernel should not write anything to disk (95% sure).
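
The named-pipe trick looks roughly like this (Linux/macOS only; note the reader and writer must run concurrently, since opening a FIFO blocks until the other end is opened too). Here the writer thread stands in for ffmpeg/imageio-ffmpeg writing "to a file":

```python
import os
import tempfile
import threading

def write_stream(path, chunks):
    """Writer side: where ffmpeg (or imageio-ffmpeg) would write its output."""
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)

def read_via_fifo(chunks):
    """Create a FIFO, write chunks into it from a thread, read them back.
    The data flows through a kernel buffer and never touches the disk."""
    d = tempfile.mkdtemp()
    path = os.path.join(d, "stream.fifo")
    os.mkfifo(path)
    writer = threading.Thread(target=write_stream, args=(path, chunks))
    writer.start()
    with open(path, "rb") as f:  # blocks until the writer has opened the FIFO
        data = f.read()
    writer.join()
    os.remove(path)
    os.rmdir(d)
    return data
```

So imageio-ffmpeg's "write to a file" restriction isn't fatal: point it at the FIFO path and read the encoded stream from the other end.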

djhoese avatar Sep 22 '21 20:09 djhoese

My understanding that mpeg4 would be a huge savings in transmission performance (bytes sent/received) and would make up for any encoding/decoding time increase

I think this sentence is missing a word, making it ambiguous what you meant :)

almarklein avatar Sep 22 '21 20:09 almarklein

"My understanding was that..."

In the other issue you mentioned mp4 video and my mind immediately went to "oh, that will compress well, so it'll be better in every single way except for maybe code complexity". It is clear that it isn't as much of a victory as I had hoped, so I'm not as eager to figure it out.

djhoese avatar Sep 23 '21 01:09 djhoese

I think it would be interesting to reopen this issue in 2024.

PyAV is readily installable on many platforms, and so is imageio-ffmpeg. I think it could really help reduce bandwidth requirements.

hmaarrfk avatar Nov 03 '24 17:11 hmaarrfk

I was planning to experiment/benchmark some ideas for improving the performance. Related: @kushalkolar is interested in encoding to jpeg on the GPU.

almarklein avatar Nov 04 '24 08:11 almarklein

I think there are a few interesting things about "video" encoders (including newer ones such as H.264 and AV1):

  1. They naturally detect "same" scenes and encode them to "0". So in the absence of "noise" they can be very effective at encoding static scenes.
  2. They are naturally lower bandwidth.
  3. They have 'time' as a natural parameter (such as the presentation timestamp (PTS) and decode timestamp (DTS)).
  4. They are well suited for hardware decoding (if we can get the pipeline working correctly).
  5. They "improve" the quality of a static image over a few frames.
    • Consider a complicated static scene with 'noise'. The first frame will look "bad", but over a few frames it will improve, even on a lower-bandwidth connection.

I was personally interested in WebRTC since it promises to be "peer to peer". So if you have a server and a computer on a local connection, it will find the natural shortest path (or so it promises).

aiortc (https://github.com/aiortc/aiortc/) seems to be a project in line with the whole move to "async"; it would be really interesting to see how to make it work.

I have to say, I spent all weekend with ChatGPT trying to get WebRTC to work, and I somewhat failed.

hmaarrfk avatar Nov 04 '24 14:11 hmaarrfk