
Add support for fabric manager

Open arnaldo2792 opened this issue 1 year ago • 3 comments

What I'd like: Support for fabric manager, as described in: https://docs.nvidia.com/datacenter/tesla/pdf/fabric-manager-user-guide.pdf

Any alternatives you've considered: N / A

arnaldo2792 avatar Jul 20 '23 18:07 arnaldo2792

This is needed to work on p4d.24xlarge instance types. Is there a Bottlerocket variant with this package?

allamand avatar Nov 08 '23 10:11 allamand

Are there any updates?

Since the fabric manager is not included, it appears that the GPUs cannot be used when creating p4* and p5* instances based on Bottlerocket.

Could you explain how to use the GPUs when creating p4* and p5* instances with Bottlerocket?

elanv avatar Dec 07 '23 10:12 elanv

Unfortunately, without fabric manager support, Bottlerocket can't utilize the GPUs of the p4/p5 instance types.

jpculp avatar Dec 08 '23 21:12 jpculp

Just ran into this one too. We recently moved to Bottlerocket and today spun up a p4d instance type, only to find that we are unable to use the GPUs due to the missing fabric manager.

stefansedich avatar Mar 05 '24 08:03 stefansedich

We just ran into this issue with a p4d.24xlarge. ~It's unclear to me, though, whether it would work if we ran the non-NVIDIA Bottlerocket AMI and then ran the NVIDIA GPU Operator on those nodes.~ Ah, I don't think the operator supports Bottlerocket.

elatt avatar Mar 09 '24 20:03 elatt

@elatt, correct. The GPU Operator doesn't support Bottlerocket. The team is currently working on adding support for fabric manager, but it is still unclear when this will land. We will provide updates to this issue as we make progress :+1:.

arnaldo2792 avatar Mar 11 '24 16:03 arnaldo2792

I believe this is resolved with #3873 - cc @monirul

bryantbiggs avatar Apr 13 '24 22:04 bryantbiggs

@bryantbiggs Yes, #3873 resolves this issue and will be part of a future release.

vyaghras avatar Apr 15 '24 17:04 vyaghras

Changes are merged. Marking this issue as completed.

monirul avatar Apr 15 '24 19:04 monirul