
spike: NVIDIA GPU operator Zarf package

Status: Open. justinthelaw opened this issue 1 year ago (2 comments).

LFAI delivery requires a production-ready NVIDIA GPU operator Zarf package. The package should bootstrap containerized versions of the NVIDIA CUDA drivers, the NVIDIA container toolkit, GPU feature discovery, and the device plugin, so that generative AI and ML applications running in a Kubernetes cluster can use NVIDIA GPUs.
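As a rough illustration of how those components could be toggled at deploy time, here is a minimal sketch that builds a Helm values override for the upstream `gpu-operator` chart. It assumes the chart exposes `driver.enabled`, `toolkit.enabled`, `gfd.enabled` and `devicePlugin.enabled`; those keys should be confirmed against whichever chart version gets pinned in the Zarf package. The helper name `gpu_operator_values` is hypothetical.

```python
import json


def gpu_operator_values(driver=True, toolkit=True, gfd=True, device_plugin=True):
    """Hypothetical helper: Helm values toggling the GPU operator components.

    Key names are assumed from the upstream chart's values.yaml; verify them
    against the chart version that is actually pinned.
    """
    return {
        "driver": {"enabled": driver},               # containerized NVIDIA driver
        "toolkit": {"enabled": toolkit},             # NVIDIA container toolkit
        "gfd": {"enabled": gfd},                     # GPU feature discovery labels
        "devicePlugin": {"enabled": device_plugin},  # advertises nvidia.com/gpu
    }


if __name__ == "__main__":
    # Example: hosts that already have a driver installed skip the containerized one.
    # JSON is valid YAML, so this output can be used as a Helm values file.
    print(json.dumps(gpu_operator_values(driver=False), indent=2))
```

Because the output is plain JSON (and therefore valid YAML), it could be dropped into a file referenced by a Zarf chart component's `valuesFiles`, keeping the component choices out of the chart itself.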

  • [ ] How do I prepare an air-gappable Zarf package that contains the NVIDIA GPU operator?
  • [ ] How do I set up the NVIDIA GPU operator to be configurable at deploy time?
    • [ ] Multi-instance GPU (logical separation of GPU resources)?
    • [ ] Time slicing (shared GPU loading and usage)?
    • [ ] Distributed node resource load balancing configuration?
  • [ ] How and where do I consistently test this on K3D to make sure it works?
  • [ ] How and where do I consistently test this on RKE2 to make sure it works?
  • [ ] How do I integrate this back into the LFAI infrastructure UDS bundle in issue #317?
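For the deploy-time configuration items, one possible shape is sketched below: a generator for the device-plugin time-slicing ConfigMap described in NVIDIA's GPU operator docs, plus a values override that points the device plugin at it and sets the MIG strategy. It assumes the chart keys `devicePlugin.config.name`, `devicePlugin.config.default` and `mig.strategy`, and that the replica count would come from something like a Zarf package variable. The ConfigMap name, namespace and helper names are placeholders, not settled choices.

```python
import json

# MIG strategies accepted by the upstream chart, as we understand it.
VALID_MIG_STRATEGIES = ("single", "mixed")


def time_slicing_configmap(replicas, name="time-slicing-config", namespace="gpu-operator"):
    """Hypothetical helper: render a device-plugin time-slicing ConfigMap."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Embedded device-plugin config: each physical GPU is advertised as
    # `replicas` schedulable nvidia.com/gpu resources (no memory isolation).
    sharing_config = (
        "version: v1\n"
        "sharing:\n"
        "  timeSlicing:\n"
        "    resources:\n"
        "    - name: nvidia.com/gpu\n"
        f"      replicas: {replicas}\n"
    )
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        # "any" is the config key selected by default in sharing_values().
        "data": {"any": sharing_config},
    }


def sharing_values(configmap_name, mig_strategy="single"):
    """Hypothetical helper: Helm values wiring the ConfigMap and MIG strategy."""
    if mig_strategy not in VALID_MIG_STRATEGIES:
        raise ValueError(f"mig_strategy must be one of {VALID_MIG_STRATEGIES}")
    return {
        "devicePlugin": {"config": {"name": configmap_name, "default": "any"}},
        "mig": {"strategy": mig_strategy},
    }


if __name__ == "__main__":
    cm = time_slicing_configmap(replicas=4)
    print(json.dumps(cm, indent=2))
    print(json.dumps(sharing_values(cm["metadata"]["name"]), indent=2))
```

Keeping the replica count and MIG strategy as function arguments mirrors the goal of deciding them at deploy time rather than baking them into the package; how time slicing and MIG interact on a given GPU model still needs to be checked against NVIDIA's documentation.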

See additional NVIDIA GPU operator context here: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html
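For the K3D and RKE2 testing items, a simple consistent check is a pod that requests one `nvidia.com/gpu` and runs `nvidia-smi`. The sketch below generates such a manifest. The image reference is a placeholder (in an air-gapped cluster it would have to be a CUDA image shipped in the Zarf package), and the optional `runtime_class` argument assumes the operator has created a RuntimeClass such as `nvidia`; the helper name is hypothetical.

```python
import json


def gpu_smoke_test_pod(image, runtime_class=None, name="gpu-smoke-test", namespace="default"):
    """Hypothetical helper: a Pod that requests one GPU and runs nvidia-smi."""
    spec = {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": "cuda",
                # Must already be in the Zarf package for air-gapped clusters.
                "image": image,
                "command": ["nvidia-smi"],
                # Only schedulable if the device plugin advertises nvidia.com/gpu.
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ],
    }
    if runtime_class:
        # Some K3D/RKE2 setups need the NVIDIA RuntimeClass set explicitly.
        spec["runtimeClassName"] = runtime_class
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute the CUDA image bundled in the package.
    pod = gpu_smoke_test_pod("registry.example/cuda-base:tag", runtime_class="nvidia")
    print(json.dumps(pod, indent=2))
```

Saved to a file, the output can be applied with `kubectl apply -f`. A pod that reaches `Succeeded` and logs the `nvidia-smi` table suggests the driver, toolkit and device plugin are wired up; a pod stuck in `Pending` usually means no node advertises `nvidia.com/gpu`. Running the same manifest on both K3D and RKE2 would give one repeatable check for both distributions.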

justinthelaw, Mar 26 '24 14:03

Some Defense Unicorns related resources:

  • https://github.com/justinthelaw/k3d-gpu-support
  • https://github.com/defenseunicorns/uds-prod-infrastructure
  • https://github.com/defenseunicorns/zarf-package-k3d-airgap

justinthelaw, Mar 26 '24 14:03

Commenting for personal tracking. Part of this spike should involve evaluating whether to create our own version of this repo/container and publish it from our org for our use.

YrrepNoj, Apr 04 '24 18:04

This will be tracked via the following PR: https://github.com/justinthelaw/uds-rke2/pull/39

justinthelaw, Jun 18 '24 16:06

The PR in the previous comment is the tracking PR, and it is tied to a Delivery issue.

justinthelaw, Jul 11 '24 18:07