
Broadcaster Initiated Connections

Open · j0sh opened this issue 6 years ago · 3 comments

Rationale

Handling failure cases in the current implementation is difficult (eg, transcoder address changes, broadcaster unavailability), and poses challenges for designing large-scale systems.

Broadcasters are overly exposed to the network (their node ID is public), while transcoders can afford to be more exposed given their role as providers of infrastructure. Reverse this dynamic.

This proposal solves these issues simultaneously. The specific challenges in the current network protocol are elaborated below, in the context of the proposal's benefits.

Proposal

  • Publish transcoder network information on-chain (or in a similarly verifiable location). This could be the node ID, the address, etc.
  • The broadcaster initiates the messaging to the transcoder when it's ready to stream a job. Eg, turn the flow into this (a rough message sketch follows this list):
    • Broadcaster sends TranscodeReq(JobInfo) request
    • Transcoder sends a TranscodeAck(ConnectionInfo) response
    • Broadcaster establishes a direct connection using ConnectionInfo
    • Broadcaster streams segments to transcoder
  • Remove the broadcaster network information from the chain (StreamID).
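
To make the proposed flow concrete, here is a rough sketch of what the two messages could carry. The type and field names (TranscodeReq, TranscodeAck, JobID, the contents of ConnectionInfo) are illustrative assumptions, not the actual basicnet wire format:

```go
// Illustrative message shapes only; field names and types are assumptions.
package basicnet

// TranscodeReq is sent by the broadcaster once it is ready to stream a job.
type TranscodeReq struct {
	JobID    int64  // on-chain job identifier
	StreamID string // stream the broadcaster intends to push
	Sig      []byte // broadcaster's Eth signature over the request
}

// TranscodeAck is the transcoder's response, telling the broadcaster where
// to establish the direct connection, e.g. the address of whichever node
// the operator has assigned to this job.
type TranscodeAck struct {
	JobID          int64
	ConnectionInfo string
}
```

The broadcaster then dials ConnectionInfo directly and starts pushing segments; nothing about the broadcaster needs to be published on-chain for this to work.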

For reference to the current transcoder behavior, see the proposal here https://github.com/livepeer/go-livepeer-basicnet/issues/21#issuecomment-369310870 . In summary:

  • Old flow: transcoder-initiated connection, where the transcoder sends TranscodeSub to the broadcaster.
  • Proposed flow: broadcaster-initiated connection, where the broadcaster sends TranscodeReq to the transcoder.

Benefits, as related to the role of the transcoder

Given the limited pool size, transcoder operators are likely to run multiple physical nodes to accommodate higher demand. A given transcoder Eth address could correspond to any number of nodes.

  • Load balancing. Upon starting a broadcast, the transcoder can direct the broadcaster to connect to a particular node when the broadcaster is ready, rather than deciding that node in advance when the job is created. This would also make it much easier to update the assigned node mid-job (read below for details).
  • Failover. If the connection dies, the broadcaster can re-request a new address and re-establish the connection. This also greatly simplifies the systems architecture for a transcoder operator: the operator doesn't have to keep up-to-the-moment track of each node's state. If a node fails, the transcoder doesn't have to do anything to re-create the node's connection state; it can simply wait for broadcasters to send a request again, and direct them around the failed node. (A minimal failover sketch follows this list.)
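
A minimal broadcaster-side failover sketch, assuming the request/ack flow above. requestConnectionInfo and streamSegments are hypothetical stand-ins for the real networking calls, not existing go-livepeer functions:

```go
package basicnet

import "errors"

// Hypothetical stand-ins: the first would send a TranscodeReq and wait for a
// TranscodeAck; the second would push segments over the direct connection.
func requestConnectionInfo(jobID int64) (string, error) { return "", errors.New("not implemented") }
func streamSegments(connInfo string) error              { return errors.New("not implemented") }

// streamWithFailover re-requests connection info on every attempt, so the
// transcoder operator can steer the broadcaster to a different node if the
// previous one failed.
func streamWithFailover(jobID int64) error {
	const maxAttempts = 3
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		connInfo, err := requestConnectionInfo(jobID)
		if err != nil {
			lastErr = err
			continue
		}
		if err := streamSegments(connInfo); err != nil {
			lastErr = err
			continue // connection died mid-job; ask again and route around the failed node
		}
		return nil
	}
	return lastErr
}
```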

Benefits, as related to the role of the broadcaster

The broadcaster knows exactly when it's going to need a transcoder. Let the broadcaster drive that; take the onus of initiating the job off the transcoder.

  • Works better with our expectations of broadcaster uptime.
    • Transcoders are expected to be continually available. Not so for broadcasters. Broadcasters aren't providing infrastructure, while transcoder operators are.
  • The broadcaster does not need to be online or maintain an active connection to the transcoder.
    • Solves the question of what to do if a direct connection breaks (eg, broadcaster takes a break) or the broadcaster goes offline: the transcoder does not need to do anything. It also avoids network spam when the transcoder needs to publish updated information while the broadcaster is offline (read below).
  • Following the expectation of broadcaster uptime: TranscodeSub, as specified, will lead to periodic spamming on the network if a broadcaster isn't ready to stream a job (eg, isn't online immediately after a job is created), or if the transcoder's connection information has changed.
    • In particular, extremely long job durations impose an expensive externality on the network: we could be flooded with unacknowledged requests for thousands of dead-end transcoding jobs. Since these broadcasters could come online or acknowledge the transcoder (via direct connection) at any moment, we can't simply stop sending these TranscodeSub messages.
  • With that being said on broadcaster availability, this would also be a better fit for delayed broadcasting https://github.com/livepeer/go-livepeer/issues/316
  • The broadcaster node ID no longer needs to be public, and the broadcaster doesn't need to be exposed online waiting for messages from the transcoder.
    • This is a good thing! Gives broadcasters additional flexibility and reduces their operational / security burden. All they need to carry around are their Eth signing keys, rather than some abstract notion of node ID. This makes broadcaster 'portability' much easier. We also expect transcoders to be much more engaged with the mechanics of running the network, since they are providing the infrastructure; the broadcaster shouldn't have to bear that burden. Related: https://github.com/livepeer/go-livepeer-basicnet/issues/31

Additional Future Potential

  • Segues better into an off-chain transcoder selection mechanism. For example, we could submit a set of encoding specifications (in terms of our 'gas accounting' units), and transcoders could respond with a price. The broadcaster can then choose which transcoder to initiate a direct connection to. With the current TranscodeSub mechanism, this would require another set of round-trips to ack the job. (A rough selection sketch follows this list.)
  • Beginnings of a transcoder availability mechanism. The broadcaster could try another transcoder if the first one NACKs or is otherwise unresponsive. This addresses the inherent problem of the broadcaster losing the gas it spent to submit the job if the assigned transcoder is unavailable.
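
As a rough illustration of the selection idea, and strictly an assumption about how it might look (EncodingSpec, PriceQuote, and pickCheapest do not exist in go-livepeer today): the broadcaster sends its specs to several transcoders, collects quotes, and dials the winner directly.

```go
package basicnet

// EncodingSpec describes the work the broadcaster wants done, expressed in
// the protocol's 'gas accounting' units.
type EncodingSpec struct {
	Profiles []string
	Units    int64
}

// PriceQuote is a transcoder's response to a spec.
type PriceQuote struct {
	Transcoder string // Eth address of the quoting transcoder
	PriceWei   int64  // quoted price for the whole spec
}

// pickCheapest selects the lowest quote; a real broadcaster might also weigh
// latency, reputation, or stake before initiating the direct connection.
func pickCheapest(quotes []PriceQuote) (PriceQuote, bool) {
	if len(quotes) == 0 {
		return PriceQuote{}, false
	}
	best := quotes[0]
	for _, q := range quotes[1:] {
		if q.PriceWei < best.PriceWei {
			best = q
		}
	}
	return best, true
}
```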

j0sh commented on Mar 19, 2018

Nice proposal Josh. Thanks for the writeup.

I think one of the initial thoughts around hiding transcoder addresses was that, because transcoders need to persist and are constantly running jobs for multiple parties, it's better to hide them so they aren't susceptible to spam and DDoS. But I don't know if that actually holds: if the address is exposed to the peers who are broadcasting anyway, then it is revealed and can be published and DDoS'ed regardless.

I like the benefit that you mention of a transcoder being able to load balance across many nodes by providing different connection information.

With regards to redundancy and failure states, @f1l1b0x proposed the property that the network should expect failures and have resiliency built in. His suggestion was that you actually have n (5?) transcoders encoding each segment, and use HLS "backup segments" in the playlist so that players know how to request a segment from a backup source if the original source isn't serving the segment in time. (Separate issue from what you're proposing, but wanted to bring it up because it's a different way of thinking about things.)

I still like the eventual goal of the network topology and decentralized routing to be that any node on the network can request content from "the network", and the routing scheme will route the request towards its source like chord or kademlia. This proposal seems to make sense for now though with our direct connections between transcoder and broadcaster, which is likely the more efficient (if not resilient) setup. If we did have multiple transcoders, but the source didn't have enough bandwidth to serve to all of them on direct connections, we'd have to introduce the relay or p2p based delivery scheme in here somewhere.

dob commented on Mar 20, 2018

the network should expect failures and have resiliency built in. His suggestion was that you actually have n (5?) transcoders encoding each segment

Curious how the incentives for that would work. Sounds like a good topic for another discussion.

If we did have multiple transcoders, but the source didn't have enough bandwidth to serve to all of them on direct connections

Yeah, this brings up a lot of questions, which aren't necessarily related to this specific issue.

  • In theory multiple transcoders would help us to be 'more decentralized' by giving the little guy a chance to encode a profile or two, but this raises the question of coordination and overhead. Whereas a single transcoder operator can internally decide how to allocate that work, and the coordination and bookkeeping is less onerous. This is kind of punting the problem to the transcoders, though. I also suspect this is something we'll have to coordinate off-chain to avoid gas costs, otherwise there's no incentive to use multiple transcoders.

  • The broadcaster -> transcoder and relay -> subscriber flows are distinct; the former is push, while the latter is pull. I'm not sure yet if we really want to continue mixing the two, but it should be straightforward to designate a 'broadcast proxy' for the purpose. Another option, as @f1l1b0x has mentioned, is an object store like IPFS or Swarm.

As-is, this proposal is really meant to accommodate the current blockchain protocol with a minimal number of changes (none, I think), while potentially giving us some escape hatches for the future.

I still like the eventual goal of the network topology and decentralized routing to be that any node on the network can request content from "the network", and the routing scheme will route the request towards its source like chord or kademlia.

DHT-style lookups can be useful for finding a relay node. I do wonder if we can also factor in other metrics, such as ping time/distance, relay load, etc. Not sure if we actually want to be relaying content using the same route though; this is the difference between providing the "discovery service" and actually performing the "relay service". Once a relay is found, the subscriber should be pulling directly from it. Otherwise, we add another hop to the media route and burden upstream peers with a potentially unbounded number of subscriptions from downstream peers.

j0sh commented on Mar 20, 2018

Yes, definitely a lot of problems to be solved around coordination and orchestration if we spread out the transcoding.

In general it seems like the eventual strategy for p2p content delivery should be that DHT-style routing is reserved for finding the tracker, which performs the coordination through the DHT. But then when the tracker is passed to a subscriber, it forms another p2p overlay network for the torrent-style content delivery. Again... a little further off though; let's stay focused on the short-term broadcaster-initiated connections.

dob commented on Mar 21, 2018