Tianlei Wu
**Description**: This work is in progress. (1) Reduce the memory of the sequence index from B x S to S. (2) Merge add bias and transpose into one kernel. (3) Reduce computation...
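As a rough illustration of item (2), the sketch below compares a two-pass "add bias, then transpose" with a single fused pass in plain Python. It is not the kernel from this PR: the function names, the flat row-major layout, the input shape (B, S, N, H), the output shape (B, N, S, H), and the bias shape (N, H) are all assumptions made for the example.

```python
def add_bias_then_transpose(x, bias, B, S, N, H):
    """Two-pass reference: add bias, then transpose (B, S, N, H) -> (B, N, S, H)."""
    total = B * S * N * H
    # Pass 1: materialize an intermediate buffer holding x + bias.
    # For a row-major (B, S, N, H) layout, i % (N * H) equals n * H + h.
    biased = [x[i] + bias[i % (N * H)] for i in range(total)]
    # Pass 2: permute the intermediate buffer into the output layout.
    out = [0.0] * total
    for b in range(B):
        for s in range(S):
            for n in range(N):
                for h in range(H):
                    src = ((b * S + s) * N + n) * H + h
                    dst = ((b * N + n) * S + s) * H + h
                    out[dst] = biased[src]
    return out


def fused_add_bias_transpose(x, bias, B, S, N, H):
    """Single pass: read each element once, add bias, write to the transposed slot."""
    out = [0.0] * (B * S * N * H)
    for b in range(B):
        for s in range(S):
            for n in range(N):
                for h in range(H):
                    src = ((b * S + s) * N + n) * H + h
                    dst = ((b * N + n) * S + s) * H + h
                    out[dst] = x[src] + bias[n * H + h]
    return out


# Small hypothetical sizes, chosen only to exercise the index arithmetic.
B, S, N, H = 2, 3, 2, 4
x = [float(i) for i in range(B * S * N * H)]
bias = [0.5 * i for i in range(N * H)]
two_pass = add_bias_then_transpose(x, bias, B, S, N, H)
fused = fused_add_bias_transpose(x, bias, B, S, N, H)
```

The fused version skips the intermediate buffer and the second read/write over the data, which is the usual motivation for merging two element-wise steps into one GPU kernel.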
In current benchmark results, ONNX is slower than PyTorch above 500 words. I think the cause is the OnnxRuntime API used in inference: https://github.com/abelriboulot/onnxt5/blob/284474952bcb10521a0b0132c677f61981ab2a1c/onnxt5/models.py#L121 For GPU inference, that API need...
**Description**: Fix prefast warnings:
* [C26451](https://learn.microsoft.com/en-us/cpp/code-quality/C26451?view=msvc-170)
  (in onnxruntime/contrib_ops/cpu/transformers/generation_device_helper.cc)
  (in onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_gpt.h)
  (in onnxruntime/contrib_ops/cpu/bert/attention_cpu_base.h)
  (in onnxruntime/contrib_ops/cpu/bert/attention_helper.h)
  (in onnxruntime/contrib_ops/cpu/transformers/subgraph_t5_decoder.cc)
  (in onnxruntime/core/providers/cuda/generator/range.cc)
  (in onnxruntime/contrib_ops/cuda/transformers/generation_device_helper.cc)
  (in onnxruntime/contrib_ops/cuda/bert/longformer_attention.cc)
  (in onnxruntime/contrib_ops/cuda/bert/attention.cc)
* [C26436](https://learn.microsoft.com/en-us/cpp/code-quality/c26436?view=msvc-170)
  (in onnxruntime/contrib_ops/cpu/transformers/beam_search_impl_base.h)
  (in onnxruntime/contrib_ops/cpu/transformers/greedy_search_impl_base.h)...
### Describe the documentation issue

The documentation of the ROCm EP lacks information such as which provider options are available. https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html

### Page / URL

https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html
Some users reported extra Cast nodes after running auto mixed precision conversion. See the related issue here: https://github.com/microsoft/onnxruntime/issues/19437 ORT 1.17 changed the behavior of Cast node removal, and no...
include\cudnn_frontend\graph_interface.h(444,19): Error C2248: 'cudnn_frontend::graph::Layernorm_attributes::forward_phase': cannot access private member declared in class 'cudnn_frontend::graph::Layernorm_attributes'
### Description

Sometimes a PR needs to go through many iterations due to test or build failures. The conversation history becomes very long after a few iterations. It is not necessary...
### Description

Add SparseAttention CPU implementation. It depends on CPU Flash Attention in #20805.

This work is still in progress:
- [x] Refactoring GQAAttentionBase
- [x] Add SparseAttention implementation
-...
I tried it in Windows:
```
python -m samexporter.inference ^
  --encoder_model output_models\sam2_hiera_large.encoder.onnx ^
  --decoder_model output_models\sam2_hiera_large.decoder.onnx ^
  --image images\truck.jpg ^
  --prompt images\truck_prompt.json ^
  --output output_images\sam2_truck.png ^
  --sam_variant sam2 ^
  --show
```
...
### Description

Use ROCm 6.2.3 in docker files and CI pipelines.

Some improvements/upgrades to the ROCm docker file:
* Use a shared docker file for ROCm and MIGraphX CI pipelines to...