[Extension Request] Extension for Edge Machine Learning Accelerators

Open Zach-Pate opened this issue 9 months ago • 12 comments

Hello!

I'll start off by thanking you guys for the amazing software you all have created. It has completely changed the game for me!

Put simply, I am requesting a new extension to support Coral TPU modules. I am currently working on a distributed computing research project and would like to integrate specialized edge ML accelerators into my cluster. My project involves running ML training and inference workloads across multiple nodes, and using these accelerators would significantly improve performance while keeping the cost relatively low.

What would be needed to make something like this possible?

Thank you!

Zach-Pate avatar Mar 20 '25 04:03 Zach-Pate

We can't say that we have experience with Coral TPUs ourselves, but this usually involves two pieces:

  • kernel driver support
  • some form of container runtime support if you want to share a TPU on a single machine across multiple workloads

Kernel support might come in two flavors:

  • Linux upstream already supports it, so we enable the drivers in the kernel configuration to be built as modules and repackage them in this repo as extensions (e.g. AMD GPU)
  • an out-of-tree kernel module has to be built (e.g. NVIDIA drivers), and it is then repackaged in this repo the same way.

Either way, the actual driver build should go in the pkgs repo, and the container runtime support probably belongs here.

smira avatar Mar 20 '25 08:03 smira

Ok! That seems like something I'm interested in looking deeper into. How doable does that seem?

Zach-Pate avatar Mar 20 '25 14:03 Zach-Pate

How doable does that seem?

I don't have an answer to this question.

smira avatar Mar 20 '25 14:03 smira

Ok, no worries! Who should I talk to to get something like this built?

Zach-Pate avatar Mar 20 '25 14:03 Zach-Pate

Who should I talk to to get something like this built?

You have three options:

  • build it yourself
  • wait for someone from the community to build it
  • reach out to Sidero Labs and contract us to have this implemented

smira avatar Mar 20 '25 14:03 smira

I was speaking with a representative about the third option earlier this week. I asked about an estimated price for sponsoring a project, but they felt like they didn't have enough technical knowledge to provide one. Do you have an idea of what range it could possibly be in?

Zach-Pate avatar Mar 20 '25 14:03 Zach-Pate

I think you need to reach out to that representative.

smira avatar Mar 20 '25 16:03 smira

Ok, thank you for your help!

Zach-Pate avatar Mar 20 '25 16:03 Zach-Pate

Wouldn't the gasket drivers work for that?

especially-relative avatar Mar 24 '25 03:03 especially-relative

That's a good question actually! I'm not sure. What would be your recommendations for how to implement them?

Zach-Pate avatar Mar 24 '25 13:03 Zach-Pate

https://github.com/siderolabs/extensions/tree/main?tab=readme-ov-file#drivers

Name: gasket | Image: ghcr.io/siderolabs/gasket-driver | Description: Driver for Google Coral PCIe devices | Version format: gasket driver upstream short commit-talos version

Those might be what you need, or could be a good starting point if the USB version needs a similar driver.
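
If you try the extension, one rough way to check whether the driver actually loaded on a node is to look at that node's /proc/modules. Below is a minimal Python sketch of that check. It assumes the kernel modules are named `gasket` and `apex` (names from the upstream gasket-driver project, not confirmed anywhere in this thread), and the helper names are made up for illustration.

```python
# Sketch only: the module names below are an assumption based on the
# upstream gasket-driver project; adjust them if your build differs.
REQUIRED_MODULES = ("gasket", "apex")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line of /proc/modules starts with the module name,
    followed by whitespace-separated fields (size, refcount, ...).
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required module names that are not loaded, in order."""
    present = loaded_modules(proc_modules_text)
    return [name for name in required if name not in present]
```

You would feed `missing_modules` the contents of /proc/modules from the node (for example, text fetched with `talosctl read /proc/modules`); an empty list means both modules show up as loaded. It says nothing about whether the device itself was detected.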

especially-relative avatar Mar 24 '25 14:03 especially-relative

Perfect, thank you!

Zach-Pate avatar Mar 27 '25 18:03 Zach-Pate

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Sep 24 '25 02:09 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Sep 29 '25 02:09 github-actions[bot]