wonnx Support Stable Diffusion model

Is your feature request related to a problem? Please describe.

I would like to be able to run Stable Diffusion using wonnx.

Describe the solution you'd like

At a minimum, these operators are missing and should be implemented before even trying to run Stable Diffusion on wonnx: Einsum, Erf, Expand, InstanceNormalization, Shape, Slice.

This minimum list is based on the following guide, which simplifies the ONNX model (see the simplification table): https://www.photoroom.com/tech/stable-diffusion-25-percent-faster-and-save-seconds/

Many more things will probably be needed, but I'm creating this issue because running SD in Rust directly on the GPU could be a really interesting use case.

I don't have much experience with wonnx or even ML, but I decided to create this issue because it surprised me how few operators are missing to run this model. I would need to get more experience with stable diffusion, diffusers library and onnx in python before attempting to port it here, but maybe there are more experienced users interested too.

Sep 20 '22 09:09 siriux

Hello Sirius, thanks for taking interest in wonnx!

The erf function is not yet a native operation in WGSL, see: https://www.w3.org/TR/WGSL/

Running Stable Diffusion on wonnx will therefore require an approximation of the erf function. At this point I am not sure how to implement it.

Sep 20 '22 11:09 haixuanTao

Thanks for your answer. Again, let me reiterate my ignorance in this field, but this is what I've found.

The implementation used in tract seems very simple: https://github.com/sonos/tract/blob/21928fb3652d028db5be1348e6017494318d4b86/onnx-opl/src/erf.rs

Looking at other WGSL shaders for other operations, it seems translatable.

signum in WGSL is just sign, abs is the same, and recip is just 1/x. For powi we can use pow, or even unroll it, since the exponent is 16 (the unrolled version is short and efficient).

copysign is trickier, but for the erf function it should just be a multiplication by the sign of the original input (as erf(0) == 0).
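To make the mapping above concrete, here is a minimal sketch in Rust that uses only operations with direct WGSL counterparts (abs, sign, multiply, add, divide). It assumes the polynomial approximation 7.1.28 from Abramowitz and Stegun; the constants are taken from that formula, not copied from the linked tract file, and the name erf_approx is made up for this example.

```rust
/// Approximates erf(x) with Abramowitz and Stegun formula 7.1.28:
///   erf(x) ~= 1 - 1 / (1 + a1*x + a2*x^2 + ... + a6*x^6)^16   for x >= 0
/// Negative inputs are handled through the odd symmetry erf(-x) = -erf(x).
/// Assumption: these coefficients come from the handbook formula.
pub fn erf_approx(x: f32) -> f32 {
    const A1: f32 = 0.0705230784;
    const A2: f32 = 0.0422820123;
    const A3: f32 = 0.0092705272;
    const A4: f32 = 0.0001520143;
    const A5: f32 = 0.0002765672;
    const A6: f32 = 0.0000430638;

    // `signum` plays the role of WGSL `sign`; multiplying by it at the end
    // replaces `copysign`. The magnitude is 0 at x = 0, so the differing
    // sign conventions at zero do not matter.
    let sign = x.signum();
    let ax = x.abs();

    // Horner evaluation of 1 + a1*ax + ... + a6*ax^6.
    let poly = 1.0 + ax * (A1 + ax * (A2 + ax * (A3 + ax * (A4 + ax * (A5 + ax * A6)))));

    // powi(16) unrolled as four squarings.
    let p2 = poly * poly;
    let p4 = p2 * p2;
    let p8 = p4 * p4;
    let p16 = p8 * p8;

    // recip is just 1 / x.
    sign * (1.0 - 1.0 / p16)
}
```

Calling erf_approx(1.0) should give a value close to 0.8427. Because the base of the power is always at least 1, the unrolled squaring also avoids any question about how pow behaves for negative bases.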

I've looked a little at the other missing ops, and they don't seem as straightforward.

Sep 20 '22 12:09 siriux

I looked into this a few weeks ago - it is a significant chunk of work for 2 reasons:

1. The ops to implement are complicated (e.g. Einsum).
2. WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

Oct 10 '22 21:10 FL33TW00D

Thanks for looking at it. I hope one day we will be able to run something like SD in pure Rust.

Oct 11 '22 06:10 siriux

As a matter of interest, tch-rs recently implemented Stable Diffusion: https://github.com/LaurentMazare/diffusers-rs

It's not directly applicable to this, but it could inform future development efforts.

Nov 13 '22 01:11 philpax

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

I am not too familiar with SD, but at least for BERT and other text encoders, parameterized dimensions can be replaced with fixed dimensions just fine (the model will then work with token sequences up to the statically set length).
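As a rough illustration of what a fixed dimension means on the input side, the sketch below pads or truncates a token sequence to a statically chosen length and builds the matching attention mask. The function name pad_to_fixed_len, the pad id, and the i64 token type are assumptions for this example and are not part of the wonnx API.

```rust
/// Hypothetical helper (not part of wonnx): fits `tokens` into exactly
/// `fixed_len` slots so the result matches a model exported with a
/// static sequence dimension.
/// Returns (token ids, attention mask), where the mask is 1 for real
/// tokens and 0 for padding.
pub fn pad_to_fixed_len(tokens: &[i64], fixed_len: usize, pad_id: i64) -> (Vec<i64>, Vec<i64>) {
    // Keep at most `fixed_len` tokens; anything beyond that is dropped.
    let kept = tokens.len().min(fixed_len);

    let mut ids = Vec::with_capacity(fixed_len);
    ids.extend_from_slice(&tokens[..kept]);
    // Fill the remaining slots with the padding token.
    ids.resize(fixed_len, pad_id);

    let mut mask = vec![1i64; kept];
    mask.resize(fixed_len, 0);

    (ids, mask)
}
```

For example, pad_to_fixed_len(&[101, 7592, 102], 5, 0) yields the ids [101, 7592, 102, 0, 0] with the mask [1, 1, 1, 0, 0]. Plain truncation is a simplification: a real tokenizer would usually keep its end-of-sequence token when shortening the input.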

Feb 07 '23 22:02 pixelspark

> WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

The shape inference engine in WONNX now supports this (it allows you to set parameterized dimensions and then infer the shapes of other outputs).

Mar 07 '23 14:03 pixelspark

> I looked into this a few weeks ago - it is a significant chunk of work for 2 reasons:
>
> 1. The ops to implement are complicated (e.g. Einsum).
> 2. WONNX does not currently support parameterized dimensions, which would be required to implement the text encoder.

As for Einsum: this may be feasible; a first attempt is in #154.

Mar 26 '23 21:03 pixelspark
