Tianlei Wu issues

Results 14 issues of Tianlei Wu

[WIP] Improve LongformerAttention performance

**Description**: This work is in progress. (1) Reduce memory of sequence index from B x S to S. (2) Merge add bias and transpose into one kernel. (3) Reduce computation...

Use OnnxRuntime IO Binding to improve GPU inference performance

In the current benchmark results, ONNX is slower than PyTorch above 500 words. I think the cause is the OnnxRuntime API used in inference: https://github.com/abelriboulot/onnxt5/blob/284474952bcb10521a0b0132c677f61981ab2a1c/onnxt5/models.py#L121 For GPU inference, that API need...
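
As a rough illustration of what the title proposes, the sketch below runs a session through ONNX Runtime's IO binding interface so that outputs are allocated on the chosen device and copied to the host only once, on request. The helper name `run_with_io_binding` and its parameters are hypothetical and are not taken from onnxt5; `session` is assumed to be an `onnxruntime.InferenceSession` created with a GPU execution provider, and `inputs` a mapping from input names to NumPy arrays.

```python
def run_with_io_binding(session, inputs, device_type="cuda", device_id=0):
    """Run `session` once through IO binding (hypothetical helper).

    session: assumed to be an onnxruntime.InferenceSession.
    inputs:  dict mapping input names to NumPy arrays held in host memory.
    """
    binding = session.io_binding()

    # Bind each input from host memory; ONNX Runtime copies it to the
    # device where the consuming node runs.
    for name, array in inputs.items():
        binding.bind_cpu_input(name, array)

    # Ask ONNX Runtime to allocate every output on the target device
    # rather than returning it through host memory.
    for output in session.get_outputs():
        binding.bind_output(output.name, device_type=device_type, device_id=device_id)

    session.run_with_iobinding(binding)

    # A single explicit device-to-host copy at the end.
    return binding.copy_outputs_to_cpu()
```

The benefit is likely to be largest when outputs stay on the device and feed the next decoding step; in that case the final `copy_outputs_to_cpu()` call would be deferred until the result is actually needed on the host.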

Fix prefast warnings

**Description**: Fix prefast warnings:

* [C26451](https://learn.microsoft.com/en-us/cpp/code-quality/C26451?view=msvc-170) in:
  * onnxruntime/contrib_ops/cpu/transformers/generation_device_helper.cc
  * onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_gpt.h
  * onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h
  * onnxruntime/contrib_ops/cpu/bert/attention_helper.h
  * onnxruntime/contrib_ops/cpu/transformers/subgraph_t5_decoder.cc
  * onnxruntime/core/providers/cuda/generator/range.cc
  * onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc
  * onnxruntime/contrib_ops/cuda/bert/longformer_attention.cc
  * onnxruntime/contrib_ops/cuda/bert/attention.cc
* [C26436](https://learn.microsoft.com/en-us/cpp/code-quality/c26436?view=msvc-170) in:
  * onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_base.h
  * onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_base.h

...

[Documentation] ROCm provider lacks provider option document

### Describe the documentation issue The documentation of the ROCm EP lacks information such as which provider options are available. https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html ### Page / URL https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html

documentation

Extra Cast nodes causes overflow in onnxruntime 1.17

Some users reported extra Cast nodes after running auto mixed precision conversion. See the related issue here: https://github.com/microsoft/onnxruntime/issues/19437 ORT 1.17 has changed the behavior of Cast node removal, and no...
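
To make the overflow risk concrete, here is a minimal sketch of what a cast to half precision does to an out-of-range value. It assumes that the overflow in question comes from float16 Cast nodes, which the excerpt does not state explicitly. The helper `cast_to_fp16` is hypothetical and uses only the Python standard library; it emulates the number format, not ONNX Runtime's Cast kernel.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def cast_to_fp16(value: float) -> float:
    """Emulate a float -> float16 cast, returning +/-inf on overflow."""
    try:
        # "<e" is the struct format code for IEEE 754 binary16.
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # CPython raises for out-of-range values; map that to the
        # infinity an IEEE-style cast would produce.
        return math.copysign(math.inf, value)


# Values up to FP16_MAX survive the cast; larger magnitudes do not.
for x in (1.5, FP16_MAX, 70000.0):
    print(x, "->", cast_to_fp16(x))
```

Under this assumption, any intermediate tensor whose values exceed `FP16_MAX` becomes infinite once an extra Cast to float16 is left in the graph.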

Windows build error

include\cudnn_frontend\graph_interface.h(444,19): Error C2248: 'cudnn_frontend::graph::Layernorm_attributes::forward_phase': cannot access private member declared in class 'cudnn_frontend::graph::Layernorm_attributes'

Add sanity check option to run CI pipelines for external PRs

### Description Sometimes a PR needs to go through many iterations due to test or build failures. The conversation history becomes very long after a few iterations. It is not necessary...

[CPU] SparseAttention op

### Description Add SparseAttention CPU implementation. It depends on CPU Flash Attention in #20805. This work is still in progress:

- [x] Refactoring GQAAttentionBase
- [x] Add SparseAttention implementation
-...

large model inference result not correct?

I tried it on Windows:

```
python -m samexporter.inference ^
  --encoder_model output_models\sam2_hiera_large.encoder.onnx ^
  --decoder_model output_models\sam2_hiera_large.decoder.onnx ^
  --image images\truck.jpg ^
  --prompt images\truck_prompt.json ^
  --output output_images\sam2_truck.png ^
  --sam_variant sam2 ^
  --show
```

...

[ROCm] Use ROCm 6.2.3 in docker and ROCm/MIGraphX CI pipelines

### Description Use ROCm 6.2.3 in docker files and CI pipelines. Some improvements/upgrades on the ROCm docker file: * Use a shared docker file for ROCm and MIGraphX CI pipelines to...