Jiarui Fang (方佳瑞)
Check whether your torch build is using CUDA: `torch.cuda.is_available()`
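For reference, here is a minimal sanity check in plain PyTorch (nothing Turbo-specific):

```python
import torch

# If this prints False, the CUDA driver/toolkit visible to the
# container is the likely culprit, not Turbo itself.
print(torch.cuda.is_available())

# CUDA version this PyTorch build was compiled against.
print(torch.version.cuda)

if torch.cuda.is_available():
    # e.g. "GeForce RTX 2060"
    print(torch.cuda.get_device_name(0))
```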
What I can confirm is that everything works on an RTX 2060. Avoid using the Docker Hub image; you may need to build a Docker image from scratch yourself.
I will apply for a V100 and check the code on it. BTW, you can also benchmark the latest turbo version to see which kernel is at fault: https://github.com/Tencent/TurboTransformers/blob/master/docs/profiler.md
It may be a bug in the allocator. We now use NVLab/cub; try the hand-crafted one instead.
```
git reset --hard bebe404b4d9ea8e18c72c19625dadcc184188236
```
Thanks zirui, I will reorganize the dockerfile following your suggestion!
Variable-length means turbo can support inputs with different shapes. You can feed it a stream of shapes like [1, 10], [1, 15], [2, 30], ...; no padding or truncation is required. In...
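As an illustration, here is a rough sketch of such a stream; it assumes the `turbo_transformers.BertModel.from_torch` entry point from the repo's examples:

```python
import torch
import transformers
import turbo_transformers

# Wrap a HuggingFace BERT with Turbo (entry point assumed from the
# repo's examples; adjust to your actual model class).
torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()
turbo_model = turbo_transformers.BertModel.from_torch(torch_model)

# A stream of requests with different (batch, seq_len) shapes;
# each call runs as-is, with no padding to a fixed length.
for batch, seq_len in [(1, 10), (1, 15), (2, 30)]:
    input_ids = torch.randint(0, torch_model.config.vocab_size,
                              (batch, seq_len), dtype=torch.long)
    output = turbo_model(input_ids)
```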
Cool, onnxrt really did a very good job.
I will update the performance numbers. We built onnxrt v1.0.0 with the following command and benchmarked it with two different backends, `mkldnn` and `cpu`. `./build.sh --config=Release --update --build --build_wheel --use_mkldnn --use_mklml...
I have compared onnxruntime performance with dynamic axes vs. fixed axes. When using 8 threads, dynamic axes introduce significant performance degradation. BTW, the performance figures illustrated in the README did...
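For context, here is one way to produce the two variants with `torch.onnx.export` (not necessarily the exact setup behind the numbers above):

```python
import torch
import transformers

model = transformers.BertModel.from_pretrained("bert-base-uncased")
model.eval()
dummy = torch.randint(0, model.config.vocab_size, (1, 128), dtype=torch.long)

# Fixed axes: the exported graph is specialized to shape (1, 128).
torch.onnx.export(model, (dummy,), "bert_fixed.onnx",
                  input_names=["input_ids"], output_names=["output"])

# Dynamic axes: batch and sequence length stay symbolic,
# which is what introduces the runtime cost measured above.
torch.onnx.export(model, (dummy,), "bert_dynamic.onnx",
                  input_names=["input_ids"], output_names=["output"],
                  dynamic_axes={"input_ids": {0: "batch", 1: "seq_len"}})
```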
After onnxruntime was upgraded to v1.4.0, Turbo started using onnxruntime as the default backend on CPU; it has fully met our needs.
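For completeness, a minimal sketch of what that CPU path looks like from Python (`bert_dynamic.onnx` is the hypothetical file from the export sketch above):

```python
import numpy as np
import onnxruntime as ort

# Plain CPU execution provider; MKL-DNN builds register their own provider.
sess = ort.InferenceSession("bert_dynamic.onnx",
                            providers=["CPUExecutionProvider"])

# 30522 is the bert-base-uncased vocabulary size.
input_ids = np.random.randint(0, 30522, size=(1, 12), dtype=np.int64)
outputs = sess.run(None, {"input_ids": input_ids})
```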