Jiarui Fang(方佳瑞)

Results 220 comments of Jiarui Fang(方佳瑞)

Check if your torch is using CUDA: `torch.cuda.is_available()`.
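For example, a quick sanity check (plain PyTorch, nothing Turbo-specific):

```python
import torch

# False means this torch build has no CUDA support, or no GPU is visible.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # e.g. "GeForce RTX 2060"
    print(torch.cuda.get_device_name(0))
```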

What I can confirm is that everything works on an RTX 2060. Avoid Docker Hub's image; you may need to build a Docker image from scratch yourself.

I will apply for a V100 and check the code on it. BTW, you can also profile the latest Turbo version to see which kernel is at fault: https://github.com/Tencent/TurboTransformers/blob/master/docs/profiler.md

It may be a bug in the allocator. We now use NVlabs/cub; try the hand-crafted one instead by resetting to this commit:

```
git reset --hard bebe404b4d9ea8e18c72c19625dadcc184188236
```

Thanks zirui, I will reorganize the Dockerfile following your suggestions!

Variable-length means Turbo can support inputs with different shapes. You can feed it a stream like [1, 10], [1, 15], [2, 30], ... No padding or truncation is required (see the sketch below). In...
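A minimal sketch of what such a stream looks like. It assumes the `BertModel.from_torch` conversion shown in the project README; the exact call signature may differ across releases:

```python
import torch
import transformers
import turbo_transformers  # assumed import name, per the project README

torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()
# Convert the PyTorch model to a Turbo model (assumed API).
turbo_model = turbo_transformers.BertModel.from_torch(torch_model)

# Feed inputs of varying (batch, seq_len) shapes back to back;
# no padding or truncation to a common length is needed.
for batch, seq_len in [(1, 10), (1, 15), (2, 30)]:
    input_ids = torch.randint(
        0, torch_model.config.vocab_size, (batch, seq_len), dtype=torch.long)
    output = turbo_model(input_ids)
```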

Cool, onnxrt really did a very good job.

I will update the performance numbers. We built onnxrt v1.0.0 with the following command and benchmarked it with two different backends, `mkldnn` and `cpu`: `./build.sh --config=Release --update --build --build_wheel --use_mkldnn --use_mklml...`

I have compared onnxruntime performance with dynamic axes vs. fixed axes. When using 8 threads, dynamic axes introduce significant performance degradation. ![image](https://user-images.githubusercontent.com/5706969/81275574-761e4700-9084-11ea-8101-fa4b698f27b1.png) BTW, the performance figures illustrated in the README did...
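For context, the two export modes being compared differ only in the `dynamic_axes` argument of the standard `torch.onnx.export` API; the model and shapes below are placeholders, not the exact benchmark setup:

```python
import torch
import transformers

model = transformers.BertModel.from_pretrained("bert-base-uncased")
model.eval()
dummy_ids = torch.randint(0, model.config.vocab_size, (1, 128), dtype=torch.long)

# Fixed axes: the exported graph accepts only inputs of shape (1, 128).
torch.onnx.export(model, (dummy_ids,), "bert_fixed.onnx",
                  input_names=["input_ids"],
                  output_names=["last_hidden_state"])

# Dynamic axes: batch size and sequence length may vary at inference time,
# which is what introduces the slowdown measured above.
torch.onnx.export(model, (dummy_ids,), "bert_dynamic.onnx",
                  input_names=["input_ids"],
                  output_names=["last_hidden_state"],
                  dynamic_axes={"input_ids": {0: "batch", 1: "seq_len"}})
```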

After onnxruntime was upgraded to v1.4.0, Turbo started to use onnxruntime as the default CPU backend, which fully meets our needs.
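For reference, running an exported model on the onnxruntime CPU backend looks roughly like this; the file name, vocab size, and shapes are illustrative:

```python
import numpy as np
import onnxruntime as ort

# On a CPU-only build, the CPU execution provider is used by default.
sess = ort.InferenceSession("bert_dynamic.onnx")
input_ids = np.random.randint(0, 30522, size=(2, 30), dtype=np.int64)
outputs = sess.run(None, {"input_ids": input_ids})
print(outputs[0].shape)  # (2, 30, 768) for bert-base
```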