Tianlei Wu comments

Results: 214 comments of Tianlei Wu

[VitisAI] Translate all session configs into provider options with prefix

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

[VitisAI] remove unused header

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI...

[VitisAI] remove unused header

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline

[VitisAI] remove unused header

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

[VitisAI] remove unused header

/azp run ONNX Runtime Web CI Pipeline

[VitisAI] remove unused header

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

[Performance] Why does genai run 2x as fast as vanilla managed onnxruntime?

The genai source code is at https://github.com/microsoft/onnxruntime-genai. For example, it uses I/O binding to bind the past and present KV cache to a fixed buffer. Otherwise, copying the KV cache at every step slows down generation significantly.
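To make the cost of copying concrete, here is a rough, library-free sketch. It is not onnxruntime or genai code: the function names (grow_by_copy, write_in_place) and the flat float array standing in for the KV cache are assumptions made for illustration only. It counts how many elements get written when the cache is rebuilt each step versus when each step writes into a preallocated buffer shared by past and present.

```python
from array import array


def grow_by_copy(steps, width):
    """Hypothetical model of a cache that is rebuilt every step.

    Each step allocates a new array holding the old contents plus the
    new row, which is what happens when past and present are separate.
    """
    cache = array("f")
    copied = 0
    for t in range(steps):
        new_row = array("f", [float(t)] * width)
        cache = cache + new_row  # new allocation; old contents are copied
        copied += len(cache)     # elements written this step
    return cache, copied


def write_in_place(steps, width, max_steps):
    """Hypothetical model of a fixed buffer shared by past and present.

    The buffer is allocated once for max_steps rows; each step only
    writes its own row into the next free slot.
    """
    buffer = array("f", [0.0]) * (width * max_steps)
    copied = 0
    for t in range(steps):
        new_row = array("f", [float(t)] * width)
        buffer[t * width:(t + 1) * width] = new_row  # same-length slice write
        copied += width
    return buffer[:steps * width], copied


if __name__ == "__main__":
    by_copy, n_copy = grow_by_copy(steps=8, width=4)
    in_place, n_place = write_in_place(steps=8, width=4, max_steps=16)
    print(by_copy == in_place, n_copy, n_place)
```

Both functions end with the same cache contents, but the number of elements written grows quadratically with the number of steps in the first and linearly in the second. With a real session, the equivalent idea would be to bind preallocated buffers through onnxruntime's I/O binding rather than passing fresh arrays on every run.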

Fix num splits bug

Please update every call to get_num_splits_and_buffer_sizes to use the total sequence length.