StableHLO
Have you thought about targeting a curated set of StableHLO operators? https://github.com/openxla/stablehlo
Historically, one of the main issues with deep learning frameworks has been the combinatorial explosion of ops over time, which are painful to optimize and lack a compositional approach.
This also happened in PyTorch, which addressed it with the PrimTorch initiative as part of its compiler stack.
PrimTorch canonicalizes ~2000+ PyTorch operators down to a closed set of ~250 primitive operators that developers can target to build a complete PyTorch backend. This substantially lowers the barrier of writing a PyTorch feature or backend.
We have the Backend trait that defines which basic operations should be implemented by every backend. The trait provides a default implementation when operations can be combined to create a new one, but also lets backends override the default implementation to potentially increase performance.
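To make the pattern concrete, here is a minimal sketch in plain Rust. This is not Burn's actual Backend trait; the `mul`, `sum`, and `dot` names and the `Reference` backend are illustrative.

```rust
// Sketch of the pattern: primitive ops are required, while composite ops
// get a default implementation that backends may override.
trait Backend {
    type Tensor;

    // Primitives every backend must implement.
    fn mul(lhs: &Self::Tensor, rhs: &Self::Tensor) -> Self::Tensor;
    fn sum(t: &Self::Tensor) -> f32;

    // Composite op with a default implementation built from the
    // primitives; a backend can override it to increase performance.
    fn dot(lhs: &Self::Tensor, rhs: &Self::Tensor) -> f32 {
        Self::sum(&Self::mul(lhs, rhs))
    }
}

struct Reference;

impl Backend for Reference {
    type Tensor = Vec<f32>;

    fn mul(lhs: &Vec<f32>, rhs: &Vec<f32>) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a * b).collect()
    }

    fn sum(t: &Vec<f32>) -> f32 {
        t.iter().sum()
    }

    // `dot` is inherited from the default implementation; a GPU backend
    // could instead override it with a single fused kernel.
}
```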
XLA could become a backend for Burn, though it doesn't seem to support an eager API, and we currently don't have a graph-based Backend API.
If you're asking about Burn's compatibility with the StableHLO serialization format, Burn is well equipped for the task. Burn currently supports importing ONNX models, including both the model structure and weights. You can find further details in Burn's ONNX import documentation.
When dealing with ONNX, we engineered burn-import to utilize two distinct intermediate representations: the ONNX model and the Burn Graph Model. The conversion process from the Burn Graph to Rust code remains decoupled from the originating representation, whether that's ONNX or StableHLO. For a comprehensive understanding of our design philosophy, you might want to explore the high-level design notes available on our repository.
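Conceptually, the pipeline looks something like the sketch below. The types and function names here are hypothetical, not burn-import's real API; the point is only that code generation depends on the intermediate graph, not on the front end.

```rust
// Illustrative sketch of the two-IR design described above.
struct OnnxModel; // front-end IR parsed from the ONNX protobuf
struct BurnGraph; // framework-agnostic intermediate representation

fn parse_onnx(_bytes: &[u8]) -> OnnxModel {
    OnnxModel
}

fn onnx_to_burn_graph(_model: OnnxModel) -> BurnGraph {
    BurnGraph
}

// The code generation stage only knows about `BurnGraph`, so a StableHLO
// front end would only need its own `stablehlo_to_burn_graph` step.
fn burn_graph_to_rust(_graph: &BurnGraph) -> String {
    String::from("// generated Burn model code")
}
```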
It could be interesting to take a look at the approach in: https://github.com/openxla/openxla-pjrt-plugin
I came across this issue trying to find out what would be involved in creating an XLA backend for burn in order to be able to run inference and training on Google Cloud TPUs. This "high-level design notes" link above seems like it might have been informative, but it just 404s now. Where should I look for an up-to-date equivalent of that doc?
That document was moved to the contributor book, but I don't think it would be relevant for that context 😅
I see. Yeah, importing ONNX models isn't really my interest here. Re this...
XLA could become a backend for Burn, though it doesn't seem to support an eager API, and we currently don't have a graph-based Backend API.
I'm sure I'm showing my ignorance here, but is there a reason an XLA backend couldn't simply "eagerly" create graph nodes in the background for most operations, and only actually evaluate that graph when Tensor::to_data or similar functions are called? I can vaguely see that there could be issues with computations not occurring when you expect with this approach, but are there bigger show-stoppers I'm not seeing? What is the biggest road block for using a graph-based API in the implementation of an eager API?
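To make the idea concrete, here's a toy sketch of what I mean; the names are made up and this is obviously nothing like Burn's real tensor types. Each op call looks eager but only records a graph node, and nothing actually executes until the data is read.

```rust
use std::rc::Rc;

// Toy lazy-evaluation sketch: a tensor is just a handle to a graph node.
enum Node {
    Constant(Vec<f32>),
    Add(Rc<Node>, Rc<Node>),
    Mul(Rc<Node>, Rc<Node>),
}

struct LazyTensor(Rc<Node>);

impl LazyTensor {
    fn constant(data: Vec<f32>) -> Self {
        LazyTensor(Rc::new(Node::Constant(data)))
    }

    // "Eager" API calls that only build the graph.
    fn add(&self, other: &Self) -> Self {
        LazyTensor(Rc::new(Node::Add(self.0.clone(), other.0.clone())))
    }

    fn mul(&self, other: &Self) -> Self {
        LazyTensor(Rc::new(Node::Mul(self.0.clone(), other.0.clone())))
    }

    // Evaluation is deferred until the data is requested; an XLA backend
    // would compile and run the recorded graph here instead.
    fn to_data(&self) -> Vec<f32> {
        fn eval(node: &Node) -> Vec<f32> {
            match node {
                Node::Constant(d) => d.clone(),
                Node::Add(a, b) => {
                    eval(a).iter().zip(eval(b)).map(|(x, y)| x + y).collect()
                }
                Node::Mul(a, b) => {
                    eval(a).iter().zip(eval(b)).map(|(x, y)| x * y).collect()
                }
            }
        }
        eval(&self.0)
    }
}

fn main() {
    let a = LazyTensor::constant(vec![1.0, 2.0]);
    // These calls look eager but only record graph nodes.
    let b = a.add(&a).mul(&a);
    // The graph is evaluated only here.
    assert_eq!(b.to_data(), vec![2.0, 8.0]);
}
```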
@aiguy110 to target TPUs we would probably need to use LLVM-MLIR in CubeCL; we already have an implementation that targets CPUs.
Thanks @nathanielsimard. That approach sounds like it would side-step the whole "eager" vs "graph" issue. Does that sound accurate to you?
I'm honestly probably not in a position to contribute anything of this magnitude, but I appreciate the guidance! I'll take a look at the CubeCL CPU implementation and LLVM-MLIR and see if there is anything I can do. Thanks again!