bottlerocket
Add support for fabric manager
What I'd like: Support for NVIDIA Fabric Manager, as described in: https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf
Any alternatives you've considered: N/A
This is needed to use the GPUs on p4d.24xlarge instance types. Is there a Bottlerocket variant that includes this package?
Are there any updates?
Since Fabric Manager is not included, it appears that the GPUs cannot be used on p4* and p5* instances running Bottlerocket.
Could you explain how to use the GPUs when launching p4* and p5* instances with Bottlerocket OS?
Unfortunately, without Fabric Manager support, Bottlerocket can't use the GPUs on the p4/p5 instance types.
We just ran into this too. We recently moved to Bottlerocket, and today we spun up a p4d instance only to find that we can't use the GPUs because Fabric Manager is missing.
We just ran into this issue with a p4d.24xlarge. ~It's unclear to me, though, whether running the non-NVIDIA Bottlerocket AMI and then running the NVIDIA GPU Operator on those nodes would work.~ Ah, I don't think the operator supports Bottlerocket OS.
@elatt, correct. The GPU Operator doesn't support Bottlerocket. The team is currently working on adding support for Fabric Manager, but it is still unclear when this will land. We will post updates to this issue as we make progress :+1:.
I believe this is resolved with #3873 - cc @monirul
@bryantbiggs Yes, #3873 resolves this issue and will be part of a future release.
Changes are merged. Marking this issue as completed.