Jiarui Fang (方佳瑞)
Check whether your torch build is using CUDA: `torch.cuda.is_available()`
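For reference, here is a minimal sanity check in plain PyTorch (nothing Turbo-specific):

```python
import torch

# If this prints False, the CUDA driver/toolkit visible to the
# container is the likely culprit, not Turbo itself.
print(torch.cuda.is_available())

# CUDA version this PyTorch build was compiled against.
print(torch.version.cuda)

if torch.cuda.is_available():
    # e.g. "GeForce RTX 2060"
    print(torch.cuda.get_device_name(0))
```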
What I can confirm is that everything works on an RTX 2060. Avoid using the Docker Hub image; you may need to build a Docker image from scratch yourself.
I will apply for a V100 and check the code on it. BTW, you can also benchmark the latest turbo version to see which kernel is at fault: https://github.com/Tencent/TurboTransformers/blob/master/docs/profiler.md
It may be a bug in the allocator. We now use NVLab/cub; try the hand-crafted one instead.
```
git reset --hard bebe404b4d9ea8e18c72c19625dadcc184188236
```
Thanks zirui, I will reorganize the dockerfile following your suggestion!
Variable-length means turbo can support inputs with different shapes. You can feed it a stream of shapes like [1, 10], [1, 15], [2, 30], ...; no padding or truncation is required. In...
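As an illustration, here is a rough sketch of such a stream; it assumes the `turbo_transformers.BertModel.from_torch` entry point from the repo's examples:

```python
import torch
import transformers
import turbo_transformers

# Wrap a HuggingFace BERT with Turbo (entry point assumed from the
# repo's examples; adjust to your actual model class).
torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()
turbo_model = turbo_transformers.BertModel.from_torch(torch_model)

# A stream of requests with different (batch, seq_len) shapes;
# each call runs as-is, with no padding to a fixed length.
for batch, seq_len in [(1, 10), (1, 15), (2, 30)]:
    input_ids = torch.randint(0, torch_model.config.vocab_size,
                              (batch, seq_len), dtype=torch.long)
    output = turbo_model(input_ids)
```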
Cool, onnxrt really did a very good job.
I will update the performance numbers. We built onnxrt v1.0.0 with the following command and benchmarked it with two different backends, `mkldnn` and `cpu`. `./build.sh --config=Release --update --build --build_wheel --use_mkldnn --use_mklml...
I have compared onnxruntime performance with dynamic axes vs. fixed axes. When using 8 threads, dynamic axes introduce significant performance degradation. BTW, the performance figures illustrated in the README did...
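For context, here is one way to produce the two variants with `torch.onnx.export` (not necessarily the exact setup behind the numbers above):

```python
import torch
import transformers

model = transformers.BertModel.from_pretrained("bert-base-uncased")
model.eval()
dummy = torch.randint(0, model.config.vocab_size, (1, 128), dtype=torch.long)

# Fixed axes: the exported graph is specialized to shape (1, 128).
torch.onnx.export(model, (dummy,), "bert_fixed.onnx",
                  input_names=["input_ids"], output_names=["output"])

# Dynamic axes: batch and sequence length stay symbolic,
# which is what introduces the runtime cost measured above.
torch.onnx.export(model, (dummy,), "bert_dynamic.onnx",
                  input_names=["input_ids"], output_names=["output"],
                  dynamic_axes={"input_ids": {0: "batch", 1: "seq_len"}})
```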
After onnxruntime was upgraded to v1.4.0, Turbo started using onnxruntime as the default backend on CPU; it has fully met our needs.
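For completeness, a minimal sketch of what that CPU path looks like from Python (`bert_dynamic.onnx` is the hypothetical file from the export sketch above):

```python
import numpy as np
import onnxruntime as ort

# Plain CPU execution provider; MKL-DNN builds register their own provider.
sess = ort.InferenceSession("bert_dynamic.onnx",
                            providers=["CPUExecutionProvider"])

# 30522 is the bert-base-uncased vocabulary size.
input_ids = np.random.randint(0, 30522, size=(1, 12), dtype=np.int64)
outputs = sess.run(None, {"input_ids": input_ids})
```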