ONNX.jl Addressing ONNX.jl to Flux.jl graphs

Hello!

As you know, I'm trying to help with the first major step of development, which is adding the missing operators to this package. However, I know this package has a second goal that also needs addressing.

As I work on this, I want to know if there is anything I should be doing in parallel to help progress our ability to create Flux models from the Umlaut tapes that are read with this package. Can I start contributing conversion code for the operators we already have, or should we focus on completing the full set of operators first?

Best, Duncan

Feb 18 '25 18:02 dstarkenburg

Hi there! I don't think we need to implement all operators first. In fact, I believe ~20-30% of operators will be enough to onboard ~90% of modern ML models, so I'd be pragmatic here and do the things that push your current goals the most. If you want to create Flux models from ONNX/Umlaut tapes, then it's a great idea to invest in it.

However, don't expect it to be an easy task! Flux is a high-level framework that operates on high-level objects like layers. The mapping from Flux models to primitive graphs (ONNX, Umlaut tapes, etc.) is always unique, but the opposite mapping is not. Consider the following graph fragment, for example:

%5 = %2 * %3      # matrix-matrix multiplication
%6 = %5 .+ %4     # elementwise addition

This looks like a Dense() layer with weight matrix %2 and bias %4, but may also be a Dense layer and a separately added vector (e.g. residual layer) or even something totally different like part of dot-product attention.
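To make the ambiguity concrete, here is a minimal sketch in plain Julia (no Flux or Umlaut involved). The arrays `W`, `x`, `b` are made-up stand-ins for `%2`, `%3`, `%4`, and the helper names `dense`, `linear`, and `add_vector` are hypothetical, chosen only for illustration:

```julia
# Toy stand-ins for the tape values %2, %3, %4 (sizes are arbitrary assumptions).
W = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # %2: 3×2 matrix
x = [1.0 0.0; 0.0 1.0]            # %3: 2×2 batch, one sample per column
b = [0.1, 0.2, 0.3]               # %4: length-3 vector

# Reading 1: a single dense layer with weight W and bias b.
dense(W, b, x) = W * x .+ b

# Reading 2: a bias-free linear map followed by a separately added vector.
linear(W, x) = W * x
add_vector(y, v) = y .+ v

y1 = dense(W, b, x)
y2 = add_vector(linear(W, x), b)

# Both readings execute the same two primitive ops, so the results are
# identical and the tape alone cannot tell us which structure was intended.
same = y1 == y2
```

Since both readings lower to exactly the same pair of operations, any reconstruction has to rely on context or conventions rather than on the two ops themselves.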

I'd start by writing down a few ONNX/Umlaut graphs and the corresponding Flux models, and inspecting them piece by piece. Is there a clear pattern of mapping? Are there frequent sequences of operators in graphs that we can detect? What if we already have an ML model and only need to map data?

Depending on these observations, we can decide whether we want to create a pattern matching mechanism that builds Flux models, or generate the code of Flux models a priori (e.g. using LLMs) and then map only the data, or whether we even need to rethink the Flux layer approach to reflect graph structure better.
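As a rough illustration of the pattern matching option, here is a toy sketch in plain Julia. The `Op` struct and `find_dense_candidates` function are hypothetical and do not correspond to the actual Umlaut or ONNX.jl data structures; the matcher only looks at adjacent ops and reports candidates, not confirmed layers:

```julia
# Hypothetical minimal tape entry: an op name, input variable ids, and an output id.
struct Op
    name::Symbol
    inputs::Vector{Int}
    output::Int
end

# Scan the tape for a matmul immediately followed by an add that consumes its
# result, and report each hit as a candidate dense layer.
function find_dense_candidates(tape::Vector{Op})
    hits = NamedTuple{(:weight, :input, :bias), NTuple{3, Int}}[]
    for i in 1:length(tape)-1
        mul, add = tape[i], tape[i+1]
        if mul.name == :matmul && add.name == :add &&
           length(mul.inputs) == 2 && length(add.inputs) == 2 &&
           add.inputs[1] == mul.output
            push!(hits, (weight = mul.inputs[1], input = mul.inputs[2], bias = add.inputs[2]))
        end
    end
    return hits
end

tape = [
    Op(:matmul, [2, 3], 5),   # %5 = %2 * %3
    Op(:add, [5, 4], 6),      # %6 = %5 .+ %4
    Op(:relu, [6], 7),        # %7 = relu.(%6)
]

candidates = find_dense_candidates(tape)
```

On this toy tape the matcher flags `%2` as a possible weight and `%4` as a possible bias. Whether that pair really is a dense layer, rather than a linear map plus a residual term, is exactly the ambiguity described above, so a real mechanism would need more context than this adjacency check.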

Feb 18 '25 22:02 dfdx

In case you are interested in a peek at what those pattern matching mechanisms might look like, https://github.com/DrChainsaw/ONNXNaiveNASflux.jl translates ONNX ops to Flux layers. It uses a non-Flux way to connect the nodes in the computation graph though (so it does not return a Chain).

Feb 26 '25 13:02 DrChainsaw