Wish List: Infrastructure management tools

Open dob opened this issue 5 years ago • 10 comments
The Problem: Orchestrators/Transcoders currently get the base-layer functionality they need to stand up an Orchestrator/Transcoder deployment from the Livepeer node, but this falls far short of an "easy to manage" toolset that gives them visibility into, and management capability over, their cluster.

Potential Solutions: Open source tools that let infrastructure operators easily view the status of their various Orchestrator and Transcoder processes, let them add and remove transcoders from their cluster, let them adjust pricing on the fly, let them get alerts when things go down or there are required protocol interactions pending, etc., would be hugely helpful to those running infrastructure on the Livepeer network.

This probably combines writing some wrapper code on top of the raw functionality of the Livepeer node software with creating an interface for visibility into and management of the infrastructure.
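
As a rough illustration of the "wrapper code" idea, here is a minimal Python sketch that probes a set of nodes and reports which ones did not answer. It assumes each Orchestrator or Transcoder exposes some HTTP endpoint that responds when the process is healthy. The node names, the URLs, and the helper names (`http_probe`, `check_cluster`, `down_nodes`) are hypothetical and are not part of the Livepeer node software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.request import urlopen


@dataclass
class NodeStatus:
    name: str
    up: bool
    detail: str


def http_probe(url: str, timeout: float = 3.0) -> str:
    """Fetch a URL and return the response body; raises on any failure."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_cluster(
    nodes: Dict[str, str],
    probe: Callable[[str], str] = http_probe,
) -> List[NodeStatus]:
    """Probe every node once and record whether it answered."""
    results = []
    for name, url in nodes.items():
        try:
            body = probe(url)
            results.append(NodeStatus(name, True, body[:80]))
        except Exception as exc:  # any failure counts as "down" in this sketch
            results.append(NodeStatus(name, False, str(exc)))
    return results


def down_nodes(results: List[NodeStatus]) -> List[str]:
    """Names of the nodes that failed their probe, ready to feed an alert."""
    return [r.name for r in results if not r.up]
```

The probe is passed in as a parameter so the same loop can be pointed at whatever health signal an operator actually has (an HTTP endpoint, a metrics scrape, a container runtime query) and can be exercised without a live cluster.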

Challenges: All the APIs needed to do this already exist, either in the Livepeer world through the node APIs and smart contract interactions, or through publicly available infrastructure management tools like Docker and Kubernetes. But the devops work of piecing them together, scripting them, and creating interfaces on top can be a challenge.

Summary: If people propose building open source tools that are valuable to the infrastructure operator community, this is an area that we would love to support with grants.

dob avatar Feb 13 '20 16:02 dob

Hi @dob ! Thank you for articulating the issue so well.

I am interested in this project and I think I can develop something in this area. Before moving further, I want to get a feel for the whole scenario and see what I can do.

I have been in DevOps for years, and health-check infrastructure is my major interest. I have worked on a few grants before as well.

But I am new to LPT. I just ran a local orchestrator and a transcoder and did some basic things like checking their status. I want to reproduce the complete scenario: clusters of orchestrators and transcoders running, where I can check the status of various processes, add/remove transcoders, (MAYBE) adjust prices on the fly, set up health alerts for the cluster, etc.

Is there any testing cluster that I can use?

kebab-mai-haddi avatar Apr 04 '21 02:04 kebab-mai-haddi

I'm guessing there's not a test cluster set up for this, but I bet there's some info in the way of blueprints you could follow if you were setting up your own cluster. Paging @iameli who may be able to point you in the right direction.

dob avatar Apr 08 '21 21:04 dob

Hi @kebab-mai-haddi! Awesome that you're interested. A lot has changed infrastructurally for Livepeer in the year+ since this proposal was written. Here are two big pieces of infrastructure that could potentially be interesting to you:

  1. We're looking to release an up-to-date version of our Kubernetes Helm chart soon - I can probably do that now, actually, though it's not terribly well-documented. This is the main thing that backs most of the Livepeer.com infrastructure right now, and it's capable of running on-chain broadcasters, orchestrators, and even the API node backing the REST API for defining streams and whatnot. It's a logical starting point for any kind of "scaled Livepeer deployment tools" project. But most of its cool features are tailored toward those running scaled broadcasters rather than orchestrators.

  2. There's also the monitoring supercontainer, which is a combined Prometheus/Grafana Docker container capable of monitoring and delivering statistics. Very useful for running Os and checking on ticket redemptions and that sort of thing.

I think both of these are pieces of the puzzle but they don't necessarily answer all of @dob's original post:

Open source tools that let infrastructure operators easily view the status of their various Orchestrator and Transcoder processes, let them add and remove transcoders from their cluster, let them adjust pricing on the fly, let them get alerts when things go down or there are required protocol interactions pending, etc

So I think there's still definitely interest for that sort of thing, especially in the "setting prices on lots of Os at once" area. I'd be curious to get your take on that.
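
To make the "setting prices on lots of Os at once" idea concrete, here is a minimal Python sketch of a fan-out that applies one price to many orchestrators and collects a per-node result. It deliberately does not assume any particular Livepeer API: `set_price` is a hypothetical callable that the operator would supply (for example, a wrapper around whatever price-setting mechanism their nodes expose), and the parameter names `price_per_unit` and `pixels_per_unit` are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def set_price_everywhere(
    orchestrators: List[str],
    price_per_unit: int,
    pixels_per_unit: int,
    set_price: Callable[[str, int, int], None],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Apply the same price to every orchestrator and report per-node outcome."""

    def apply(node: str):
        try:
            set_price(node, price_per_unit, pixels_per_unit)
            return node, "ok"
        except Exception as exc:  # one bad node must not abort the whole rollout
            return node, f"failed: {exc}"

    # Threads suit this job because each call is expected to be network-bound.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(apply, orchestrators))
```

Returning a result per node, instead of stopping at the first error, lets an operator see at a glance which Os picked up the new price and which need a retry.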

We don't presently have a test cluster set up for this, but depending on what'd be necessary for your particular grant proposal we can discuss how we could help in that area. (A proper Livepeer test cluster is an interesting idea, though — I'm a big fan of the CNCF's Community Infrastructure Lab.)

iameli avatar Apr 09 '21 03:04 iameli

Thank you both for your responses! @iameli, I am interested in all three project ideas that you mentioned. I will go through them and get back to you ASAP!

kebab-mai-haddi avatar Apr 13 '21 02:04 kebab-mai-haddi

This issue has been marked as stale with no activity. It will close in 7 days.

github-actions[bot] avatar Jul 28 '23 01:07 github-actions[bot]

Hi @dob, is this path too radical for Livepeer to support via grant for infra. mgmt.?

Create video as a separate work type inside backend.ai, and Livepeer, within it, as a web configurable work load - https://backend.ai/ https://github.com/lablup/backend.ai/

If you think it may be worthwhile to explore, I will reach out to their developers?

Strykar avatar Oct 25 '23 03:10 Strykar

Hi @dob, is this path too radical for Livepeer to support via grant for infra. mgmt.?

Create video as a separate work type inside backend.ai, and Livepeer, within it, as a web configurable work load - https://backend.ai/ https://github.com/lablup/backend.ai/

If you think it may be worthwhile to explore, I will reach out to their developers?

I was chatting with Strykar about backend.ai and I'd love to hear feedback from more O's and the core team on whether or not it's worth pursuing.

AuthorityNull avatar Oct 25 '23 03:10 AuthorityNull

@Strykar Could you elaborate a little bit more on the workflow that you envision for a backend.ai user? What type of task would they be looking to perform, what would be their interface to performing it, and how would the Livepeer network plug in? Thanks!

dob avatar Oct 25 '23 16:10 dob

Sure @dob, here's a possible list of features an Orchestrator web interface for Livepeer may be expected to have over time:

  • Knobs and dials to tweak Orchestrator settings on the fly
  • HA, failover and self-remediation for the OS, environment and software updates
  • User account management + different dashboards for logged-in delegators, multi-tenancy
  • BI dashboards / Grafana / Dune / Orch sourced charts

If we use an existing OSS AI/ML GPU hyperscaler orchestration project like Backend.ai, we would not have to reinvent the wheel or maintain any of the above features ourselves.

This will require two prior Public Goods grants -

  1. Enable an API for all Orchestrator functions exposed via livepeer / livepeer-gpu and their configs
  2. Enable livepeer to live-reload its own configuration via -HUP without dropping streams
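
To illustrate the general pattern behind item 2, here is a minimal Python sketch of reloading configuration on SIGHUP while the process keeps running. It is not how livepeer works today and says nothing about its internals. The JSON file format and the names `ReloadableConfig` and `install_sighup_reload` are assumptions made for the example, and SIGHUP is only available on Unix-like systems.

```python
import json
import signal


class ReloadableConfig:
    """Holds settings read from a JSON file and can re-read them on demand."""

    def __init__(self, path: str):
        self._path = path
        self._values = {}
        self.reload()

    def reload(self) -> None:
        # Parse the whole file first, then swap the reference in one step,
        # so readers see either the old settings or the new ones, never a mix.
        with open(self._path, encoding="utf-8") as fh:
            new_values = json.load(fh)
        self._values = new_values

    def get(self, key, default=None):
        return self._values.get(key, default)


def install_sighup_reload(config: ReloadableConfig) -> None:
    """Re-read the config file whenever the process receives SIGHUP (Unix only)."""

    def handler(signum, frame):
        try:
            config.reload()
        except (OSError, ValueError):
            # Missing or malformed file: keep serving with the previous settings.
            pass

    signal.signal(signal.SIGHUP, handler)
```

With something like this in place, `kill -HUP <pid>` would pick up an edited file without a restart. Work already in flight keeps running because the process never exits, although a real node would still need to decide which settings are safe to change mid-stream.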

This grant then could:

  1. Add livepeer / livepeer-gpu as new container images that can be spun up in a pre-configured environment and system administered via Backend.ai's existing framework.
  2. Create a livepeer Orchestrator specific Grafana dashboard with multiple inputs from Dune or other O's

This relatively small effort would enable all of the features listed above with zero maintenance burden on the grant.

All the features above would then be available to any Orchestrator irrespective of size, in something they can install on Docker Desktop on their home PC or on bare metal in data centers.

Possible future benefits:

  • Backend.ai has extensive Business Intelligence dashboards; O’s can build on these to help them quantify things like Price Per Pixel.
  • Over time, future grants could flesh out video-specific payloads and their work orchestration, enabling O’s to provide Video / AI / ML services over Web2 if they wish to, and giving Orchestrators an incentive even during low-work periods on Livepeer.
  • Account integration could provide the base to develop proper public pool operator software.

These short videos give a sense of the features this grant would enable - https://www.backend.ai/product/webui https://www.backend.ai/product/control-panel https://www.backend.ai/product/dashboard

Or do you feel an Orchestrator management project would be better off developed around Livepeer as a custom one-off?

Strykar avatar Oct 27 '23 08:10 Strykar

It's a little abstract to me, due to not being in the weeds of node operation. I think the meaningful signal here would be if O's actually wanted and saw the benefit in this. Any O's care to chime in with some specific examples of how this would concretely help your day-to-day?

dob avatar Nov 09 '23 21:11 dob