Tianlei Wu
/azp run common-linux-conda-CI, common-win32-conda-CI
@wenbingl, @snnn please help trigger CI.
@shiqingzhangCSU, reducing I/O requires designing special CUDA kernels (which also need to integrate with the BeamSearch operator) to handle past state. In ONNX Runtime, @wangyems is working on optimizations...
@averad, you can add "RandomNormalLike" to op_block_list to avoid the error. The latest script is here: https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md @kamalkraj, you can run like the following to reproduce ~10s on T4. ```...
@kamalkraj, I will try SD 2/2.1 and get back to you later.
@saikrishna2893, to change inputs and outputs, you can add a parameter like the following: https://github.com/microsoft/onnxruntime/blob/b1abb8c656c597bf221bd85682ae3d9e350d9aba/onnxruntime/python/tools/transformers/models/stable_diffusion/optimize_pipeline.py#L154 to convert the inputs and outputs to FP16. @kamalkraj, you can try out our optimizations...
@wchao1115, the main conversion script can export a checkpoint to an FP16 model. The model is composed of official ONNX operators, so it can be supported by different execution providers for inference...
Try profiling: https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html. Report back if you have any findings (such as operators or parts of the model that take longer in 1.11 vs 1.17).
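To compare the two versions, one option is to sum the time per operator type in each profile and sort by the difference. Below is a minimal sketch, not an official tool. It assumes each profile is a JSON list of trace events where node events have `"cat": "Node"`, a `"dur"` field in microseconds, and the operator type under `args.op_name`. Check your own profile file, since the layout may differ between versions. The helper names are hypothetical.

```python
import json
from collections import defaultdict

# Profiling itself is enabled in the application, for example by setting
# SessionOptions.enable_profiling = True and calling
# InferenceSession.end_profiling() to get the output file name.


def load_profile(path):
    """Load a profile file, assumed to be a JSON list of trace events."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def op_time_totals(events):
    """Sum 'dur' per operator type over events whose category is 'Node'."""
    totals = defaultdict(int)
    for event in events:
        if event.get("cat") != "Node":
            continue  # skip session-level and other non-node events
        op_name = event.get("args", {}).get("op_name")
        if op_name is None:
            continue  # assumed layout: node events carry args.op_name
        totals[op_name] += event.get("dur", 0)
    return dict(totals)


def compare_profiles(old_events, new_events):
    """Return (op, old_total, new_total) rows, largest increase first."""
    old = op_time_totals(old_events)
    new = op_time_totals(new_events)
    rows = [(op, old.get(op, 0), new.get(op, 0)) for op in set(old) | set(new)]
    # Sort by increase (descending), then by name so ties are deterministic.
    rows.sort(key=lambda r: (-(r[2] - r[1]), r[0]))
    return rows
```

For example, `compare_profiles(load_profile("profile_a.json"), load_profile("profile_b.json"))` (file names are placeholders) lists the operator types whose total time grew the most at the top.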
Yes, it is on 1.1.2 release. MSVC version is 19.37.32825.0. An example failed build: https://github.com/microsoft/onnxruntime/actions/runs/8118954587/job/22194111719?pr=19470 --- fused_conv.cc D:\b\Debug\_deps\cudnn_frontend-src\include\cudnn_frontend/graph_interface.h(444,19): error C2248: 'cudnn_frontend::graph::Layernorm_attributes::forward_phase': cannot access private member declared in class 'cudnn_frontend::graph::Layernorm_attributes' [D:\b\Debug\onnxruntime_providers_cuda.vcxproj]...
You can build onnxruntime from source by adding `--cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1`. Then you can set environment variables `ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA` = 1 and `ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA` = 1. When you run your application, it will...
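If you launch the application from a script, the two variables can be set for that one process only. A minimal sketch, assuming a build made with the define above. The helper names and `my_inference.py` are hypothetical, and only the two variable names come from the comment above.

```python
import os
import subprocess
import sys


def node_dump_env(dump_inputs=True, dump_outputs=True, base_env=None):
    """Return a copy of the environment with the node I/O dump flags set."""
    env = dict(os.environ if base_env is None else base_env)
    if dump_inputs:
        env["ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA"] = "1"
    if dump_outputs:
        env["ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA"] = "1"
    return env


def run_with_node_dump(command):
    """Run a command with both dump flags enabled, for this process only."""
    return subprocess.run(command, env=node_dump_env(), check=False)


if __name__ == "__main__":
    # Hypothetical script name; replace with your own application.
    run_with_node_dump([sys.executable, "my_inference.py"])
```

Setting the variables per process keeps the dump from affecting other runs in the same shell.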