bhsueh_NV
Hi, dontloo. Thank you for the feedback, and sorry for the confusion. We have removed this feature but did not update the documentation. We will fix it in the next update.
> [bert_guide.md](https://github.com/NVIDIA/FasterTransformer/blob/main/docs/bert_guide.md) section 1.1.2 mentioned this allow_gemm_test flag, but it seems this flag is not effective and not used in the bertExample method in [bert_example.cc](https://github.com/NVIDIA/FasterTransformer/blob/6fddeac5f59ce4df380002aa945da57a0c8e878c/examples/cpp/bert/bert_example.cc#L71). > > Output shows "using...
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.
Does it support a Python API? In a way similar to trtexec, can I transfer the engine model and do inference?
There is no pure Python API. If you want to run it from Python, you need to use the PyTorch/TensorFlow custom ops.
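For illustration, a minimal sketch of how the PyTorch route typically looks. The `.so` path and the class namespace are assumptions here, not the repo's confirmed API; check your build output and the PyTorch examples in the repository.

```python
# A minimal sketch, not the official API: load the compiled FasterTransformer
# PyTorch custom-op library so its TorchScript classes become callable from
# Python. The library path below is an assumption; use your actual build path.
import torch

# Registers the custom TorchScript classes/ops contained in the library.
torch.classes.load_library("build/lib/libth_transformer.so")

# After loading, the registered classes are exposed under
# torch.classes.<Namespace>.<Class> and can be constructed and called
# from ordinary Python/PyTorch code like any other TorchScript class.
```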
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.
Can you provide the CMake version and CUDA version you use? It seems some problems are caused by the compiler.
Did you try the PyTorch docker image we recommend?
I cannot compile the code successfully. Even if I fix the issue, I get wrong results when I run with hidden_dim > 1024. How do you verify the correctness?
[Here](https://github.com/NVIDIA/FasterTransformer/blob/main/sample/tensorflow/unit_test/bert_encoder_unit_test.py) is a simple unit test. You can add some cases with hidden_dimension > 1024 to the unit test; see the sketch below. The requests in #104 are supported in the [next beta version](https://github.com/NVIDIA/FasterTransformer/tree/dev/v5.0_beta).
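For what it's worth, the verification pattern in that unit test is to run the op under test and a reference implementation on the same inputs and assert the maximum absolute difference stays within a tolerance. Below is a minimal, self-contained sketch of that pattern with hidden_dim > 1024; the reference layernorm stands in for the real FasterTransformer op so the example runs standalone, and all names, shapes, and tolerances are illustrative, not the actual harness.

```python
# A minimal sketch of the correctness-check pattern: compare the op under
# test against a reference implementation within a tolerance. Here the
# reference layernorm doubles as the "op under test" so the sketch runs
# standalone; in the real test you would swap in the FasterTransformer op.
import unittest
import numpy as np

def reference_layernorm(x, gamma, beta, eps=1e-6):
    # Plain NumPy layer normalization over the last (hidden) dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

class LargeHiddenDimTest(unittest.TestCase):
    def test_hidden_dim_1536(self):
        hidden_dim = 1536  # > 1024, the regime where wrong results were seen
        rng = np.random.default_rng(0)
        x = rng.standard_normal((4, 32, hidden_dim)).astype(np.float32)
        gamma = np.ones(hidden_dim, dtype=np.float32)
        beta = np.zeros(hidden_dim, dtype=np.float32)
        op_under_test = reference_layernorm  # stand-in for the real custom op
        out = op_under_test(x, gamma, beta)
        ref = reference_layernorm(x, gamma, beta)
        # Assert the max absolute difference is within a small tolerance.
        self.assertLess(float(np.abs(out - ref).max()), 5e-3)

if __name__ == "__main__":
    unittest.main()
```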
For your request and the BERT model, it should be stable. We release it as a beta version because: 1. We may still break the API in the near future. 2. We still have not...