dra-example-driver

demonstrating advanced features

Open pohly opened this issue 9 months ago • 9 comments

We want to demonstrate advanced features like partitionable devices with the example driver. This raises the question: should we use artificial, fake devices, or mirror the behavior of a real vendor driver when it advertises real hardware?

On the one hand, we prefer to stay vendor-neutral. On the other hand, a real example is easier to understand and would make the example driver more useful for realistic benchmarking.

After some discussions at KubeCon and in https://kubernetes.slack.com/archives/C0409NGC1TK/p1743667181489199, here's a proposal. In the top-level README.md, we add a new section:

Configuration

Vendors are encouraged to work with the Kubernetes maintainers to enhance DRA for their use cases. When this leads to new features, it is desirable to extend the example driver so that it demonstrates those features by emulating a vendor driver for certain hardware. Later, novel usages of existing features may also be worth adding to the example driver.

At the moment, the driver supports the following profiles:

  • gpu (default): 8 generic GPUs per node. Works on Kubernetes >= 1.32.
  • nvidia-mig: two NVIDIA A100 GPUs per node, with attributes that are the same as for real hardware. Works on Kubernetes >= 1.33.
  • google-tpu: models multi-host devices. Works on Kubernetes >= 1.33.

Each deployment of the example driver uses exactly one profile and <profile name>.dra.example.com as driver name. To configure the profile, ... [TBD]. These profiles do not actually emulate any real hardware. Instead, they merely inject environment variables which mirror the devices that were allocated.
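
To illustrate the profile concept described above, here is a minimal sketch in Go. It is not the driver's actual code: the `Profile` type, the `Resolve`, `DriverName`, and `AllocatedEnv` functions, and the environment variable naming scheme are all hypothetical, and how a profile is selected at deployment time is still TBD. Only the profile names, their descriptions, the minimum Kubernetes versions, and the `<profile name>.dra.example.com` naming rule come from the proposal.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Profile is a hypothetical description of one example-driver profile.
type Profile struct {
	Name          string
	Description   string
	MinKubernetes string
}

// profiles mirrors the list in the proposal. The struct layout is an assumption.
var profiles = map[string]Profile{
	"gpu":        {"gpu", "8 generic GPUs per node", "1.32"},
	"nvidia-mig": {"nvidia-mig", "two NVIDIA A100 GPUs per node", "1.33"},
	"google-tpu": {"google-tpu", "models multi-host devices", "1.33"},
}

// defaultProfile is used when no profile name is given.
const defaultProfile = "gpu"

// Resolve looks up a profile by name, falling back to the default for "".
func Resolve(name string) (Profile, error) {
	if name == "" {
		name = defaultProfile
	}
	p, ok := profiles[name]
	if !ok {
		known := make([]string, 0, len(profiles))
		for k := range profiles {
			known = append(known, k)
		}
		sort.Strings(known)
		return Profile{}, fmt.Errorf("unknown profile %q (known: %s)", name, strings.Join(known, ", "))
	}
	return p, nil
}

// DriverName applies the "<profile name>.dra.example.com" rule from the proposal.
func DriverName(p Profile) string {
	return p.Name + ".dra.example.com"
}

// AllocatedEnv builds environment variables that mirror the allocated devices.
// The variable naming scheme (PREFIX_DEVICE_<index>) is an assumption.
func AllocatedEnv(p Profile, devices []string) []string {
	prefix := strings.ToUpper(strings.ReplaceAll(p.Name, "-", "_"))
	env := make([]string, 0, len(devices))
	for i, d := range devices {
		env = append(env, fmt.Sprintf("%s_DEVICE_%d=%s", prefix, i, d))
	}
	return env
}

func main() {
	p, err := Resolve("") // empty name selects the default profile
	if err != nil {
		panic(err)
	}
	fmt.Println(DriverName(p))
	fmt.Println(AllocatedEnv(p, []string{"gpu-0", "gpu-1"}))
}
```

Keeping the profiles in a single lookup table means each deployment resolves exactly one profile, and the driver name follows from the profile name rather than being configured separately.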

/assign @bg-chun

pohly avatar Apr 07 '25 13:04 pohly

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 06 '25 14:07 k8s-triage-robot

@bg-chun: are you still interested in working on this?

pohly avatar Jul 06 '25 15:07 pohly

/remove lifecycle-state

pohly avatar Jul 06 '25 15:07 pohly

/remove-lifecycle stale

pohly avatar Jul 06 '25 15:07 pohly

@bg-chun If you're not already too deep into this, I'd like to get this started, at least as far as refactoring to introduce the concept of a "profile" with only the "gpu" one implemented.

nojnhuh avatar Aug 20 '25 21:08 nojnhuh

@nojnhuh Yes, go ahead. I’m currently focused on a few internal projects that will be released and open-sourced soon, so it’s a little tough to make extra time right now. However, I’ll try to take a look and keep track of your PRs.

bg-chun avatar Aug 20 '25 21:08 bg-chun

Sounds good, thanks!

/assign

nojnhuh avatar Aug 20 '25 21:08 nojnhuh

/lifecycle stale

k8s-triage-robot avatar Nov 18 '25 21:11 k8s-triage-robot

Actively working on this!

/remove-lifecycle stale

nojnhuh avatar Nov 18 '25 21:11 nojnhuh