
Support for streaming encoded video from continuous log/send calls

Wumpf opened this issue 1 year ago

We want to support streaming in video via continuous log/send calls.

We want to be broadly compatible with compressed video as it shows up in MCAP (see https://github.com/foxglove/foxglove-sdk/blob/main/schemas/ros2/CompressedVideo.msg) and otherwise stick closely to the WebCodecs specification: https://www.w3.org/TR/webcodecs

The rough sketch for this feature would be to introduce a VideoStream archetype to which users continuously add VideoSample components (or add them in manual batches via send_columns).

The VideoStream archetype has some metadata that is ideally logged as static (so it is never GC'ed even if we lose samples). The metadata contains at least the WebCodecs description (https://www.w3.org/TR/webcodecs/#dom-videodecoderconfig-description), but we likely need more.
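As a very rough illustration of the intended usage (all archetype and component names here are hypothetical, sketching the proposal above rather than any shipped API):

```python
import rerun as rr

# Hypothetical placeholders standing in for a real encoder/demuxer:
decoder_config = b""   # WebCodecs-style `description`, if the codec needs one
encoded_samples = []   # e.g. a list of (pts_seconds, annex_b_bytes) tuples

rr.init("video_stream_sketch", spawn=True)

# Stream metadata is logged once as static, so it is never GC'ed
# even if old samples are dropped.
rr.log(
    "camera/stream",
    rr.VideoStream(codec="h264", description=decoder_config),  # hypothetical archetype
    static=True,
)

# One VideoSample per encoded video frame, logged as the stream comes in.
for pts_seconds, data in encoded_samples:
    rr.set_time_seconds("pts", pts_seconds)
    rr.log("camera/stream", rr.VideoSample(data=data))  # hypothetical component
```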

Each VideoSample component is equivalent to a sample in our decoder abstraction, consisting of a video chunk and a presentation timestamp. It should be possible to map it to WebCodecs' EncodedVideoChunk(Init) data structure, see https://www.w3.org/TR/webcodecs/#encodedvideochunk. For H.264/H.265 we want to adopt the restrictions MCAP has right now, which closely mirror how our decoders expect data anyway:

- Use Annex B formatted data
- Each CompressedVideo message should contain enough NAL units to decode exactly one video frame
- Each message containing a key frame (IDR) must also include a SPS NAL unit

(Annex B is implied by WebCodecs compatibility; SPS on IDR is needed to make the decoder truly re-startable at an IDR frame. Todo: shouldn't this also include PPS?) WebCodecs requires us to flag IDR frames, see https://www.w3.org/TR/webcodecs/#encodedvideochunk. Let's try to infer this; if necessary we can add it as an optional attribute.
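Since each sample must carry enough NAL units for exactly one frame, the key-frame flag can be inferred by scanning the Annex B data for start codes and reading each NAL unit type (5 = IDR slice, 7 = SPS, 8 = PPS). A minimal sketch of that inference:

```python
def h264_nal_unit_types(annex_b: bytes) -> list[int]:
    """Return the NAL unit types in an Annex B formatted H.264 buffer.

    Scanning for the 3-byte start code also catches 4-byte start codes,
    since `00 00 00 01` ends in `00 00 01`.
    """
    types = []
    i = 0
    while i + 3 < len(annex_b):
        if annex_b[i : i + 3] == b"\x00\x00\x01":
            types.append(annex_b[i + 3] & 0x1F)  # nal_unit_type: low 5 bits of the header
            i += 4
        else:
            i += 1
    return types


def is_idr_sample(annex_b: bytes) -> bool:
    """True if the sample contains an IDR slice (type 5); per the constraints
    above, such a sample should then also carry SPS (7) and PPS (8) NAL units."""
    return 5 in h264_nal_unit_types(annex_b)
```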

To begin with (in order to cover the in-memory use case), all samples on a VideoStream are fed to the decoder at once; we essentially ignore the timeline, since all samples necessarily come with their own presentation timestamp. Garbage collection is free to discard old samples, which is fine as long as there is still an IDR frame prior to the currently requested presentation timestamp. Constraint: samples are expected to be in decode-timestamp order (otherwise we would have to reorder them, which is expensive in both performance and memory for long streams, and would also break under GC).
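To illustrate the decode model (a toy sketch, not the viewer's actual implementation): given samples in decode order, showing a requested presentation timestamp means feeding the decoder everything from the last preceding IDR onward, which is also exactly why GC may drop any prefix up to an IDR:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    pts: float     # presentation timestamp
    is_idr: bool   # key frame flag (see the NAL sketch above)
    data: bytes


def samples_to_decode(samples: list[Sample], requested_pts: float) -> list[Sample]:
    """Given `samples` in decode order, return the span that must be fed to the
    decoder to display `requested_pts`: everything from the last IDR whose
    presentation timestamp is at or before the request, up to and including
    the requested frame. Raises if GC already discarded that IDR."""
    start = end = None
    for i, s in enumerate(samples):
        if s.pts <= requested_pts:
            end = i  # last sample (in decode order) at or before the request
            if s.is_idr:
                start = i
    if start is None:
        raise LookupError("no IDR frame at or before the requested timestamp")
    # Samples between `start` and `end` with a later presentation timestamp are
    # kept: with B-frames they can be references needed to decode the request.
    return samples[start : end + 1]
```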

Ideally, VideoFrameReference should become optional for displaying video in the context of a VideoStream, but at least for the first iteration it remains the only way to visualize things. An inherent problem is that VideoSamples may not show up on the timeline at the points in time and in the order expected for presentation: video samples come in bursts and not in the order of their presentation timestamps. Corollary: having many (all?) video samples logged on a single timestamp on any timeline is just fine, just like it is today with mp4 files.

The first iteration should come with a simple (gstreamer?) streaming example that demonstrates live feeding into the viewer.

Wumpf avatar Sep 24 '24 08:09 Wumpf

Workaround

If you can't wait for us to implement this feature, you can work around it by logging your video stream as many short videos, each logged as an AssetVideo to the same entity path.
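A sketch of this workaround, assuming hypothetical clip files and the timeline API of the Python SDK at the time (rr.set_time_seconds); depending on the SDK version you may additionally need VideoFrameReference components to select which frame is shown:

```python
import rerun as rr

rr.init("video_stream_workaround", spawn=True)

# Hypothetical list of short clips: (start_time_seconds, path).
clips = [(0.0, "clip_000.mp4"), (2.0, "clip_001.mp4"), (4.0, "clip_002.mp4")]

for start_time, path in clips:
    rr.set_time_seconds("time", start_time)
    # Logging to the same entity path means each clip replaces the
    # previous one from its start time onward.
    rr.log("camera", rr.AssetVideo(path=path))
```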

emilk avatar Oct 11 '24 07:10 emilk

This is a really important feature!

patrickelectric avatar May 20 '25 01:05 patrickelectric

@patrickelectric this feature is on its way now. Do you think you could share more about your use case and needs here? What larger problem are you trying to solve? Which specific parts are most important to you? Can you think of a single demo that would show you that this feature solves your needs?

nikolausWest avatar Jun 17 '25 09:06 nikolausWest

Hi @nikolausWest, thanks for the questions.

I work with @bluerobotics developing BlueOS for ROVs and USVs (it can also be used for aerial drones). We are working towards Zenoh integration in our system, using Foxglove messages, and are currently integrating MCAP recording.

With that said... We already have Cockpit for vehicle control and some data visualization, but we want to use Rerun for a more in-depth approach, where we can see sonar, lidars, video, and logs, with everything synchronized, to debug logs and visualize real-time vehicle data. Low latency would be awesome: a glass-to-glass latency of 200 ms or less (ideally 100 ms).

A simple and ideal demo would be a gstreamer pipeline or a raw H.264 video file publishing data to Rerun to visualize the stream.

patrickelectric avatar Jun 17 '25 11:06 patrickelectric

> A simple and ideal demo would be a gstreamer pipeline or a raw H.264 video file publishing data to Rerun to visualize the stream.

I didn't have much success with gstreamer so far, mostly because of setup issues on Mac, but I have examples for live-streaming H.264 video from ffmpeg on the way. So it sounds like this is going to be a great fit :) Depending on the platform and the amount of data, live decoding in the native viewer still struggles a little bit, but in the web viewer things are already quite smooth. We'll likely ship more improvements towards that in 0.25.
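For reference, a live-streaming demo along those lines could look roughly like the sketch below. This is hypothetical, not the shipped example: it assumes a `rr.VideoStream` archetype with a `from_fields(sample=...)` constructor and a `rr.VideoCodec.H264` enum (the final names are in the linked snippet below), and it uses ffmpeg's `h264_metadata=aud=insert` bitstream filter so the raw Annex B stream can be split into one sample per frame:

```python
import subprocess

import rerun as rr

rr.init("ffmpeg_h264_stream", spawn=True)

# Hypothetical pre-release API; see the linked snippet below for final names.
rr.log("video", rr.VideoStream(codec=rr.VideoCodec.H264), static=True)

# Encode a synthetic test pattern to raw Annex B H.264 on stdout.
# The `h264_metadata=aud=insert` bitstream filter puts an Access Unit
# Delimiter (NAL type 9) in front of every frame, so the byte stream can
# be split into one sample per frame as the issue description requires.
ffmpeg = subprocess.Popen(
    [
        "ffmpeg", "-f", "lavfi", "-i", "testsrc=size=640x480:rate=30",
        "-c:v", "libx264", "-bsf:v", "h264_metadata=aud=insert",
        "-f", "h264", "-",
    ],
    stdout=subprocess.PIPE,
)

AUD = b"\x00\x00\x00\x01\x09"  # 4-byte start code + AUD NAL header (x264 default)
buffer = b""
pts, frame_time = 0.0, 1.0 / 30.0
while chunk := ffmpeg.stdout.read(4096):
    buffer += chunk
    # Split off complete access units; the trailing partial one stays buffered.
    while (split := buffer.find(AUD, len(AUD))) != -1:
        frame, buffer = buffer[:split], buffer[split:]
        rr.set_time_seconds("pts", pts)  # newer SDKs use rr.set_time instead
        rr.log("video", rr.VideoStream.from_fields(sample=frame))
        pts += frame_time
```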

I'll post a link to the example here soon; it would be awesome if you could have a look or even try a nightly build with the feature :)

Wumpf avatar Jun 17 '25 14:06 Wumpf

We now have a maintained snippet and an example for the new VideoStream archetype!

  • https://github.com/rerun-io/rerun/blob/f8a3c2408ba4bd5a4af8ac659e8f19f92c264ba7/docs/snippets/all/archetypes/video_stream_synthetic.py
  • https://github.com/rerun-io/rerun/tree/f8a3c2408ba4bd5a4af8ac659e8f19f92c264ba7/examples/python/camera_video_stream

Any feedback would be very welcome; otherwise, we'll most likely ship with this interface.

The only thing missing before closing this ticket is updated video reference docs. There are also some smaller improvements (and regression fixes) for video in general that I want to land for 0.24, but nothing crazy.

Wumpf avatar Jun 27 '25 06:06 Wumpf