Matt McClean
Any plans to support [Inferentia](https://aws.amazon.com/machine-learning/inferentia/)- and [Trainium](https://aws.amazon.com/machine-learning/trainium/)-based instances? They expose the accelerators to the OS via PCI, but I see PCI support is not planned for Firecracker. See...
Thanks. Does that mean that PCI passthrough should work for alternative devices (e.g. Trainium and Inferentia) that expose themselves in `/dev`?
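For context, the Neuron driver exposes these accelerators as character devices such as `/dev/neuron0` on the host. A minimal sketch of how one could check, from inside a guest, whether those device nodes were passed through (the `/dev/neuron*` glob is an assumption based on the host-side naming):

```python
import glob

# List the character devices the Neuron driver creates
# (e.g. /dev/neuron0, /dev/neuron1, ...). If Firecracker could pass
# these through, the same check would work inside the guest.
neuron_devices = sorted(glob.glob("/dev/neuron*"))
if neuron_devices:
    print(f"Found {len(neuron_devices)} Neuron device(s): {neuron_devices}")
else:
    print("No Neuron devices visible under /dev")
```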
https://pytorch.org/blog/accelerated-stable-diffusion-2/
Any updates on this, @gante?
Yes, ideally it should be a parameter that can be added to the YAML config file.
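To illustrate the idea, a rough sketch of how such a flag could be read from a YAML config via PyYAML; the parameter name `use_accelerated_attention` is purely hypothetical here, not an existing option:

```python
import yaml  # PyYAML

# Hypothetical config with the proposed parameter; the flag name
# `use_accelerated_attention` is a placeholder, not the project's
# actual schema.
config_text = """
model: stable-diffusion-2
use_accelerated_attention: true
"""

config = yaml.safe_load(config_text)
if config.get("use_accelerated_attention", False):
    print("Accelerated attention requested via config")
```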
Here is an example of how a `trn1.32xlarge` needs to be set up for multi-node training with EFA - https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-neuronx/setup-trn1-multi-node-execution.html
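Not from those docs verbatim, but a rough sketch of what the per-process side of that setup looks like, assuming torch-neuronx's torch-xla `xla` process-group backend and the libfabric/EFA environment variables the linked guide describes; a launcher such as `torchrun` would supply the rank variables:

```python
import os

import torch.distributed as dist
# torch-neuronx training runs on top of torch-xla; importing this
# module registers the "xla" process-group backend.
import torch_xla.distributed.xla_backend  # noqa: F401

# EFA / libfabric environment, normally set by the launcher or AMI
# per the linked Neuron guide; shown here only for illustration.
os.environ.setdefault("FI_PROVIDER", "efa")
os.environ.setdefault("FI_EFA_USE_DEVICE_RDMA", "1")
os.environ.setdefault("FI_EFA_FORK_SAFE", "1")

# torchrun (or the Neuron launcher) provides RANK, WORLD_SIZE and
# MASTER_ADDR/MASTER_PORT across the trn1.32xlarge instances.
dist.init_process_group(backend="xla")
print(f"rank {dist.get_rank()} / {dist.get_world_size()} initialized")
```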