Yufeng Li
ONNX doesn't have a direct quantized tensor definition. Essentially, it uses QDQ (a QuantizeLinear/DequantizeLinear pair) to represent a quantized tensor. Thus we can limit the change to the Q/DQ operators only, as @daquexian and...
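For illustration, a minimal sketch built with the onnx Python helpers of what that QDQ representation looks like around a single tensor (the tensor name, scale, and zero point below are made up):

```python
from onnx import helper, TensorProto

# Hypothetical tensor "x" held in QDQ form: a QuantizeLinear / DequantizeLinear
# pair carrying the quantization parameters (scale and zero point) as initializers.
x_scale = helper.make_tensor("x_scale", TensorProto.FLOAT, [], [0.02])
x_zero_point = helper.make_tensor("x_zero_point", TensorProto.INT8, [], [0])

q_node = helper.make_node(
    "QuantizeLinear", ["x", "x_scale", "x_zero_point"], ["x_quantized"]
)
dq_node = helper.make_node(
    "DequantizeLinear", ["x_quantized", "x_scale", "x_zero_point"], ["x_dequantized"]
)
```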
Thanks! If so, onnxruntime also supports the variable-length inputs you mean here. You can add dynamic_axes in torch.onnx.export [https://github.com/Tencent/TurboTransformers/blob/f2d66bc12f0b904328372f472f6379aba50007cc/benchmark/benchmark_helper.py#L92]. The API doc is here: [https://pytorch.org/docs/stable/onnx.html#torch.onnx.export]
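For reference, a minimal sketch of what that looks like (the model, shapes, and input/output names below are placeholders, not taken from the linked benchmark script):

```python
import torch

# Placeholder input of shape (batch, seq_len); the concrete model is assumed
# to be a torch.nn.Module that accepts this single tensor of token ids.
dummy_input = torch.ones(1, 128, dtype=torch.long)

torch.onnx.export(
    model,                      # assumed torch.nn.Module
    (dummy_input,),
    "model.onnx",
    input_names=["input_ids"],
    output_names=["output"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "seq_len"},  # keep both axes variable
        "output": {0: "batch_size"},
    },
    opset_version=12,
)
```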
Thanks! Could you please update your table after your verification? I'm also curious why you use onnxruntime-mkldnn over the default build with MLAS. Do you see better performance with it?
@feifeibear, some models with dynamic inputs cannot be fused at runtime. Could you try this offline tool to optimize the model before running it and see if the performance...
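A rough sketch of that offline optimization with the optimizer shipped in the onnxruntime.transformers package (the file names and the head/hidden-size values are assumptions to adapt to the actual model):

```python
from onnxruntime.transformers import optimizer

# Fuse transformer subgraphs ahead of time instead of relying on runtime fusion,
# which can fail for models exported with dynamic input shapes.
opt_model = optimizer.optimize_model(
    "bert.onnx",          # exported model with dynamic axes
    model_type="bert",    # selects the BERT fusion patterns
    num_heads=12,         # set to the model's attention head count
    hidden_size=768,      # set to the model's hidden size
)
opt_model.save_model_to_file("bert_opt.onnx")
```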
It's great! We will keep improving the performance. We also support quantization for transformer-based models on CPU now.
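As a pointer, dynamic quantization of an already-optimized transformer model can be done roughly like this (file names are placeholders; which weight type to pick depends on the target CPU kernels):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Weight-only (dynamic) quantization: weights are stored as int8, while
# activations stay in float and are quantized on the fly at inference time.
quantize_dynamic(
    "bert_opt.onnx",
    "bert_opt_int8.onnx",
    weight_type=QuantType.QInt8,
)
```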
The issue was resolved in the latest PyTorch. Please make sure to use ONNX opset 12 when exporting: https://github.com/pytorch/pytorch/issues/26893
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, Linux OpenVINO CI Pipeline,...
/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline