
Low-latency through direct streaming between Mist and T

Open cyberj0g opened this issue 3 years ago • 7 comments

About

This is a proof of concept implementation of direct streaming between media server (Mist implied) and Transcoder. It eliminates latency caused by segment download and upload through HTTP. Verification is disabled, as it requires the media server to support segment downloading.

How to test

  1. Create an HLS stream on the media server, preferably with long (10+ sec) segments to make testing easier. I used Wowza.
  2. Configure the media server to accept (and, optionally, save) RTMP streams
  3. Use this lpms branch
  4. Start B + OT (separate O and T configuration is not tested)
./livepeer -transcoder -orchestrator -serviceAddr 127.0.0.1:8935

./livepeer -broadcaster -rtmpAddr :1936 -cliAddr 127.0.0.1:7937 -nvidia all -httpAddr 127.0.0.1:8936 -orchAddr 127.0.0.1:8935 -orchSecret 11111111111 -v 6
  5. Send a 'segment notification' to the Broadcaster - set the correct duration
curl -X POST 'http://127.0.0.1:8936/live/random-stream-id/0.ts' \
  --header 'Livepeer-Transcode-Configuration:
    {"inputUrl": "http://localhost:1935/vod/bbb_264.mp4/media_w1801455189_0.ts",
     "outputUrl": "rtmp://localhost:1935/live/out",
     "pixelFormat": 0,
     "videoCodec": "h264",
     "duration": 12000}'
  6. The Broadcaster will immediately respond with the URLs that T will point its output streams to, and will complete the request once transcoding is done
  7. Either record the rendition stream on the media server, or quickly open it in the player
ffplay rtmp://localhost:1935/live/out/P240p30fps16x9/0.ts
  8. Play the rendition 0.1 sec after starting the stream:
curl -X POST 'http://127.0.0.1:8936/live/random-stream-id/0.ts' \
  --header 'Livepeer-Transcode-Configuration:
    {"inputUrl": "http://localhost:1935/vod/bbb_264.mp4/media_w1801455189_0.ts",
     "outputUrl": "rtmp://localhost:1935/live/out",
     "pixelFormat": 0,
     "videoCodec": "h264",
     "duration": 12000}' & sleep 0.1 && ffplay rtmp://localhost:1935/live/out/P240p30fps16x9/0.ts

Pros and Cons

Pros:

  • this relatively light refactoring allows us to simplify the architecture and achieve low latency

Cons:

  • we rely on FFmpeg's built-in streaming protocols
  • no transport-level encryption for RTMP/HTTP media streams (though maybe FFmpeg supports that?)
  • needs additional features on the Mist side:
    a. create and manage an HLS 'input' endpoint for the Transcoder
    b. create and manage RTSP 'output' endpoints for the Transcoder
    c. allow B to download renditions for verification (or can we move verification fully onto T?)
    d. pass segment metadata to B, as in the cURL request above

Further steps

To explore this approach further, we should decide which specific protocol features need to be supported. I'm not sure exactly how Clients <> Mist-B-O-T currently interact in production, but it makes sense to replicate what's used in production first, and perhaps drop the historical features that no one uses (e.g. do we need to support HLS playlists at all if Mist handles streaming?). Updates to Mist will probably need to come first.

Conclusion

In my opinion - and, hopefully, I'm not missing any critical pieces - this approach would not only achieve 'low' latency, but would also simplify the Go code, making it easier to maintain and improve.

PS: this is PoC and is not supposed to be merged without revisiting all changes.

cyberj0g avatar May 18 '22 14:05 cyberj0g

Maybe Jaron could comment on the most suitable protocols for Mist <> T (FFmpeg) interactions and what it would take to implement them? His remark inspired me to experiment with direct communication, and I'm not too knowledgeable about video streaming. @Thulinma

cyberj0g avatar May 19 '22 07:05 cyberj0g

We chose gRPC because it is already used in B<>O<>T. In the direct Mist<>T scenario we can consider using WebSocket instead.

AlexKordic avatar May 19 '22 08:05 AlexKordic

@AlexKordic I think the main question is how to stream segments to and from Mist. To keep changes to a minimum, we need a secure protocol which is a) already supported by Mist and FFmpeg and b) supports all the video formats we'd like to use. My test uses HTTP and RTMP; neither has TLS, and RTMP doesn't support HEVC. FFmpeg has support for HTTPS and SRT, but that's not tested, and Mist seems not to support HTTPS (or the docs are outdated). Another question is how to transfer side data requested by the user, such as machine learning model outputs. This is currently included in frame metadata and transferred in the video stream, which may not be a bad option. How would we transfer 'global' stream-level metadata, if we have it?

cyberj0g avatar May 19 '22 08:05 cyberj0g

Really cool PoC, thanks @cyberj0g!

Initial thoughts -

  • Agree that we can definitely simplify our current code now that Mist is a permanent part of the stack, by removing manifest generation, RTMP input etc. and that's on the roadmap once we have a stable v1 release of Catalyst
  • I don't think we can guarantee that either Mist or the Transcoder is publicly routable at the moment, which means we can't have them stream to each other. From talking to the O operators as part of the opt-in metrics work, there's a strong push from them not to expose their T nodes and routing details to the internet.
  • I'm still not clear on how payments work in this model - are we relying on the Transcoder self-reporting to the Orchestrator and Broadcaster?
  • Similarly, even if we could have the Broadcaster request / the Transcoder push every X segments for verification it seems like we'd still be introducing an element of trust

thomshutt avatar May 19 '22 11:05 thomshutt

@thomshutt

  • Ts are not actually exposed in the direct model; they just need internet access to reach B (Mist) and pull the stream / push results. I think it's not an issue if stream URLs are secure and unique (using a stream key, like YouTube or Twitch) and the protocol has transport-level security. There's no requirement for T to have an externally accessible internet address, and Bs' addresses are not secret. We aim to have B and Mist collocated. In the current version, Mist can already run livepeer and push the stream there; we just need to extend that logic
  • Verification and payments are controlled by B. It already needs to download the segment from O for verification; with the direct model, it will do exactly the same, but using the Mist output URLs where T just streamed the transcoded results. Mist can't currently store pushed segments (I think), so this part is not tested.

cyberj0g avatar May 19 '22 12:05 cyberj0g

Thanks for putting this together!

IIUC the workflow is as follows:

  • M sends a message to B as soon as the first bits of the data are available, with the source URL in the request
  • B uses metadata (could be from M) to run O discovery and selection
  • B sends a message to O that contains M's source URL
  • O uses metadata (could be from B) to run T discovery and selection [1]
  • O sends a message to T that contains M's source URL
  • T uses the source URL to immediately start pulling the data to transcode and pushes the rendition data directly back to M. This portion is a fully streaming workflow b/w M & T
  • T can simultaneously pass a message back to O with the rendition output URLs
  • O can use the rendition output URLs to download the data from T
  • B can use the rendition output URLs to download the data from M

Seems that there would need to be some sort of feedback mechanism b/w M & B so B can learn about T failures from M, as well as b/w O & T so O can also learn about T failures.

[1] This step is unnecessary if O is also a T.

A few initial questions:

  • Ideally, rendition data can be signed using an O's ETH private key. This signature authenticates O based on its ETH address and can also be presented in other contexts (which wouldn't be very straightforward to do by just relying on TLS) as evidence that a specific O returned specific rendition data to M/B. While we don't have stake slashing (i.e. economically penalize the O if there is evidence of malicious behavior) enabled in the on-chain protocol today, there are plans to consider enabling slashing in the future and M/B receiving signatures from O would be required. If M is receiving data directly from T how could O's signatures be made accessible to M/B?
  • Ideally, source data can be signed using a B's ETH private key. This is related to the previous point - the signature authenticates B and we would also like to have cryptographic evidence that O produced certain rendition and that the rendition data was associated with specific source data sent by B which we also have cryptographic evidence for. In the current workflow, since B receives the full source data it can hash it and sign it and then send the signature to O. If T is pulling data directly from M how could B's signatures be made accessible to O?
  • Ideally, if T is performing poorly or stops responding, its O can re-route the stream to a different T before M/B decides to switch away from O. Otherwise, a single poorly performing or non-responsive T would cause O to be dropped by the M/B. How would we support this with a direct M <> T workflow?

yondonfu avatar May 20 '22 01:05 yondonfu

Thanks for comments @yondonfu.

Seems that there would need to some sort of feedback mechanism b/w M & B so B can learn about T failures from M as well as b/w O & T so O can also learn about T failures.

The transcoding request from M to B is not async, so if T encounters any errors or timeouts, they will propagate to O and B in the normal way through gRPC, and can also propagate to M through B's response - e.g. it can delete the segment if verification fails or undesirable content is detected. Mid-segment failover should also be supported by M - it should allow overwriting a segment which is already being received, and maybe drop the previous T's connection in this case. The proposed workflow is optimistic in the sense that validation happens after the segment is already transcoded and available to the user. To move away from that while still achieving low latency, we would need to make all verification checks streaming as well.

If M is receiving data directly from T how could O's signatures be made accessible to M/B?

Is it necessary to sign the rendition data for that? If M's rendition output URLs were unique for each segment, could we have O sign these URLs with its private key, and then use this signature as evidence that O either pushed the rendition to that URL or disclosed it to a third party? TLS should make the URLs immune to MitM attacks.

If T is pulling data directly from M how could B's signatures be made accessible to O?

We could extend the above to a per-segment 'transcoding ticket' which contains unique and secret input and rendition URLs. It can be signed by B on the way to T, and by O and T on the way back.

a single poorly performing or non-responsive T would cause O to be dropped by the M/B. How would we support this with a direct M <> T workflow?

This logic is intact: O can switch to a different T within B's timeout, and it will start pushing data to M. The above signature verification logic (if it works at the T level too) needs to account for the case when T is switched mid-segment, because M's secret URLs won't be secret anymore.

cyberj0g avatar May 20 '22 06:05 cyberj0g