
Allow specifying GPU count only, without GPU type

Open samos123 opened this issue 2 years ago • 5 comments

In local clusters like kind, or in clusters where every node has the same GPU type, it doesn't make sense to require a GPU type. The GPU type should be an optional parameter; leaving it out would mean the pod can be scheduled on any GPU node.

samos123 avatar Aug 01 '23 04:08 samos123

I think this would be better implemented as configurable gpu-types. Currently GPU types are hardcoded here:

https://github.com/substratusai/substratus/blob/main/internal/resources/gpu_info.go

This mapping could be pulled from a ConfigMap in the substratus namespace instead.

apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-types
  namespace: substratus
data:
  nvidia-l4: "{ ... info here ...}"
  # ...
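
As a rough sketch of how the controller might consume such a ConfigMap, the snippet below decodes the `data` section (GPU type name to JSON string) into a lookup table. The `GPUInfo` fields (`memoryGB`, `nodeSelector`) and the `ParseGPUTypes` helper are hypothetical names chosen for illustration; the actual fields are whatever gpu_info.go defines today, and the sample values are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUInfo is a HYPOTHETICAL shape for one gpu-types entry.
// The real fields would mirror what gpu_info.go hardcodes.
type GPUInfo struct {
	MemoryGB     int               `json:"memoryGB"`
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
}

// ParseGPUTypes decodes ConfigMap-style data (type name -> JSON string)
// into a map of GPUInfo. It returns an error naming the first bad entry.
func ParseGPUTypes(data map[string]string) (map[string]GPUInfo, error) {
	out := make(map[string]GPUInfo, len(data))
	for name, raw := range data {
		var info GPUInfo
		if err := json.Unmarshal([]byte(raw), &info); err != nil {
			return nil, fmt.Errorf("gpu type %q: %w", name, err)
		}
		out[name] = info
	}
	return out, nil
}

func main() {
	// In a controller this map would come from the ConfigMap's Data field.
	data := map[string]string{
		"nvidia-l4": `{"memoryGB": 24}`, // illustrative value
	}
	types, err := ParseGPUTypes(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(types["nvidia-l4"].MemoryGB)
}
```

Taking a plain `map[string]string` keeps the parsing logic independent of the Kubernetes client, so it can be unit tested without a cluster.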

nstogner avatar Aug 01 '23 12:08 nstogner

I think there is a valid use case for not having to specify any GPU type. Let's say you are fine with running on either H100 or A100: setting only the GPU count would then let you schedule on any machine with available GPUs. This isn't just relevant for local clusters.

samos123 avatar Aug 05 '23 08:08 samos123

If users specify only the GPU count and not the type, they will be underspecifying the memory required (which is implied by gpu-type today). Perhaps we allow users to EITHER specify GPU count & type OR GPU memory.
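
A minimal sketch of that either/or rule, assuming a hypothetical `GPURequest` struct with `Count`, `Type`, and `MemoryGB` fields (none of these names come from the Substratus API). Under this rule a count-only request is rejected because it leaves memory unspecified.

```go
package main

import (
	"errors"
	"fmt"
)

// GPURequest is a HYPOTHETICAL request shape used only for illustration.
type GPURequest struct {
	Count    int    // number of GPUs
	Type     string // e.g. "nvidia-l4"
	MemoryGB int    // total GPU memory, as an alternative to Count+Type
}

// Validate accepts EITHER (Count and Type) OR MemoryGB, never both or neither.
func (r GPURequest) Validate() error {
	byType := r.Type != "" || r.Count != 0
	byMemory := r.MemoryGB != 0
	switch {
	case byType && byMemory:
		return errors.New("specify either count and type, or memory, not both")
	case byMemory:
		if r.MemoryGB < 0 {
			return errors.New("memory must be positive")
		}
		return nil
	case r.Type != "" && r.Count > 0:
		return nil
	default:
		// Covers the empty request, count without type, and type without count.
		return errors.New("specify count and type together, or memory")
	}
}

func main() {
	fmt.Println(GPURequest{Count: 1, Type: "nvidia-l4"}.Validate() == nil)
	fmt.Println(GPURequest{MemoryGB: 40}.Validate() == nil)
	fmt.Println(GPURequest{Count: 1}.Validate() == nil)
}
```

Supporting the count-only case requested in this issue would mean relaxing the default branch, at the cost of the scheduler no longer knowing how much GPU memory the workload needs.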

nstogner avatar Aug 08 '23 13:08 nstogner

I implemented this for kind and we should probably allow this for GCP as well. Your thoughts? @nstogner

samos123 avatar Aug 25 '23 06:08 samos123

Let me know what your thoughts are on my previous comment.

nstogner avatar Aug 29 '23 11:08 nstogner