
[Performance Issue]: The new OneFlow update does not optimise enough for RTX 3090 and 4090

Open · abi98213 opened this issue 1 year ago • 1 comment

Brief Description

Hello. We had set up a Python environment that gave good inference speed for Stable Diffusion using the OneFlow diffusers (30 it/s) on my 3090. The installed OneFlow version was oneflow-0.9.1.dev20230212+cu117, and I was using the oneflow-fork branch of diffusers. With the new OneFlow update, the speed has dropped considerably on the 3090 and 4090, but it is unchanged on the A100 GPU. I cannot find oneflow-0.9.1.dev20230212+cu117 anywhere, so I cannot reproduce the earlier speed on my local 3090 machine. So my question is: why do the new OneFlow version and onediff not give a speed-up on 4090 and 3090 machines? Thank you very much.
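For comparing the two environments, it may help to record the exact package versions in each one. The following is only a minimal sketch: the distribution names `oneflow` and `diffusers` are assumptions about how the packages are registered with pip, and `installed_versions` is a hypothetical helper, not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Assumed distribution names; adjust them to match the actual environment.
for name, ver in installed_versions(["oneflow", "diffusers"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running this in both the old and the new environment gives a side-by-side record to attach to the report.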

Device and Context

The benchmarking was done on my local 3090 machine and in the cloud as well.

Benchmark

Previously it ran at around 30 iterations/s at 512x512 resolution with the Stable Diffusion 1.5 model on the 3090 machine. Now it runs at 9-10 iterations/s, the same as plain diffusers.
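To make the iterations-per-second figures comparable between versions, the measurement could be taken roughly as below. This is a sketch under assumptions, not the script used for the numbers above: `iterations_per_second`, `step`, and `sync` are hypothetical names, and the dummy `step` stands in for one denoising iteration. On a GPU, `sync` should be the framework's device-synchronisation call so that asynchronous kernels are included in the timing.

```python
import time


def iterations_per_second(step, n_iters=50, n_warmup=5, sync=None):
    """Time n_iters calls of step() after a warm-up and return iterations/s."""
    for _ in range(n_warmup):  # warm-up absorbs compilation and caching costs
        step()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


# Dummy CPU workload standing in for one diffusion step (an assumption).
rate = iterations_per_second(lambda: sum(range(10_000)))
print(f"{rate:.1f} it/s")
```

Using the same warm-up count and iteration count in both environments keeps the comparison fair.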

Alternatives

No response

abi98213 · Apr 16 '23 18:04

  • Did you make changes to the original implementation?
  • Could you run the program with flag ONEFLOW_MLIR_PRINT_STATS=1 and post the listed ops?
  • If it is not too inconvenient, could you run both versions with nsys profile and post the respective output files?
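The two runs requested above could be set up along these lines. This is a sketch only: `benchmark_sd.py` and the output name `oneflow_new` are placeholders, and `build_profile_command` is a hypothetical helper. It sets the `ONEFLOW_MLIR_PRINT_STATS=1` flag mentioned above and wraps the script in `nsys profile` with an output name given via `-o`.

```python
import os
import shlex


def build_profile_command(script, output_name, use_nsys=True):
    """Return (env, argv) for one run with op stats enabled, optionally under nsys."""
    env = dict(os.environ)
    env["ONEFLOW_MLIR_PRINT_STATS"] = "1"  # flag named in the request above
    argv = ["python", script]
    if use_nsys:
        # nsys writes its report to a file based on the name passed to -o
        argv = ["nsys", "profile", "-o", output_name] + argv
    return env, argv


# Placeholder script and output names; substitute the real benchmark script.
env, argv = build_profile_command("benchmark_sd.py", "oneflow_new")
print("ONEFLOW_MLIR_PRINT_STATS=1", shlex.join(argv))
# To execute for real: subprocess.run(argv, env=env, check=True)
```

Repeating this in the old environment with a different output name yields one report file per version to attach.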

Many thanks!

jackalcooper · Apr 17 '23 01:04