[Performance Issue]: The new OneFlow update is not well optimized for RTX 3090 and 4090
Brief Description
Hello. We had created a Python environment that gave good inference speed for Stable Diffusion using OneFlow diffusers (30 it/s) on my 3090. The installed OneFlow version was oneflow-0.9.1.dev20230212+cu117, and I was using oneflow-fork as the diffusers branch. With the new OneFlow update, the speed has dropped considerably on the 3090 and 4090, but it is unchanged on the A100 GPU. I cannot find the oneflow-0.9.1.dev20230212+cu117 version anywhere, so I cannot reproduce the earlier speed on my local 3090 machine. My question is: why do the new OneFlow version and onediff not give a speedup on 4090 and 3090 machines? Thank you very much.
Device and Context
The benchmarking was done on my local 3090 machine and in the cloud as well.
Benchmark
Previously it ran at around 30 iterations/s at 512x512 resolution with the Stable Diffusion 1.5 model on the 3090 machine. Now it runs at 9-10 iterations/s, the same as plain diffusers.
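For anyone comparing numbers, here is a minimal sketch of how an iterations-per-second figure like the ones above could be measured. The helper `iterations_per_second` and the `dummy_step` workload are hypothetical stand-ins, not part of OneFlow or diffusers; in a real run, `step_fn` would wrap one denoising step of the pipeline.

```python
import time


def iterations_per_second(step_fn, n_steps=50, warmup=5):
    """Return the average number of step_fn calls completed per second."""
    # Warm-up calls are excluded from timing so one-off setup or
    # compilation cost does not distort the average.
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / elapsed


def dummy_step():
    # Hypothetical stand-in for one denoising step (pure CPU work).
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(f"{iterations_per_second(dummy_step):.1f} it/s")
```

Note that GPU work is launched asynchronously, so a real measurement would need to synchronize the device before reading the clock; otherwise the reported rate can be too high.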
Alternatives
No response
- Did you make changes to the original implementation?
- Could you run the program with the flag ONEFLOW_MLIR_PRINT_STATS=1 and post the listed ops?
- If it is not too inconvenient, could you run both versions with nsys profile and post the respective output files?
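The two requested runs can be sketched roughly as follows. This is only an illustration: the script name `benchmark_sd.py` and the report name are hypothetical, and combining the flag and the profiler in a single run is an assumption made for brevity. The sketch only builds the command and environment; the actual launch is left commented out.

```python
import os
import shlex


def build_profile_run(script, report_name, base_env=None):
    """Build an nsys command and an environment with the stats flag set."""
    env = dict(os.environ if base_env is None else base_env)
    # Flag requested in the reply above.
    env["ONEFLOW_MLIR_PRINT_STATS"] = "1"
    # `nsys profile -o <name> <program>` writes the report under <name>.
    cmd = ["nsys", "profile", "-o", report_name, "python", script]
    return cmd, env


if __name__ == "__main__":
    # Hypothetical script and report names; use one report per OneFlow version.
    cmd, env = build_profile_run("benchmark_sd.py", "oneflow_new")
    print("ONEFLOW_MLIR_PRINT_STATS=1", shlex.join(cmd))
    # To actually launch it:
    # import subprocess
    # subprocess.run(cmd, env=env, check=True)
```

Running it once per installed OneFlow version, with a different report name each time, would give the two output files to attach.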
Many thanks!