dml

Results: 9 comments by dml

It seems that `WindowAttention` invokes `FusedMHARunnerFP16v2` incorrectly, and we got the expected difference between `FP16_op_output` and `FP16_torch_traced_output` after forbidding the use of fused attention:

```c++
// src/fastertransformer/layers/attention_layers/WindowAttention.cc:183
if ((sm...
```
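As context, the sketch below (my own self-contained approximation; the function name, variable names, and SM list are assumptions, not the verbatim FasterTransformer source) shows the kind of gate that sits around that line, and how hard-wiring it to `false` forbids the fused path while comparing the FP16 outputs:

```c++
#include <cstdio>

// Approximate shape of the fused-MHA gate: the FusedMHARunnerFP16v2 path is
// only taken for certain SM versions / head sizes; otherwise the unfused
// QK^T -> softmax -> V kernels run.  (Hypothetical sketch, not FT's code.)
static bool use_fused_mha(int sm, int size_per_head)
{
    bool fused = (sm == 75 || sm == 80 || sm == 86) && size_per_head == 64;
    fused = false;  // debug override: always fall back to unfused attention
    return fused;
}

int main()
{
    std::printf("fused attention enabled: %s\n",
                use_fused_mha(80, 64) ? "yes" : "no");
    return 0;
}
```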

> ```shell
> CUDA Error: (null) /workdir/xxx/packages/v5.0_tag/FasterTransformer-release-v5.0_tag/3rdparty/trt_fused_multihead_attention/fused_multihead_attention_v2.h 682
> ```
>
> This error means that you don't call fused mha successfully. Can you provide the docker image you use...

> > ```shell
> > CUDA Error: (null) /workdir/xxx/packages/v5.0_tag/FasterTransformer-release-v5.0_tag/3rdparty/trt_fused_multihead_attention/fused_multihead_attention_v2.h 682
> > ```
>
> ...

> I believe CUDA 11.0 is runnable. I try to build the cpp example by `nvcr.io/nvidia/pytorch:20.07-py3`, which contains CUDA 11.0.
>
> I can run the cpp example successfully by...

@byshiue, it's FT linking the wrong CUDA library in **my docker image**: it links against the stub `libcuda.so` from `/usr/local/cuda/lib64/stubs/libcuda.so`. I debugged into the following location to check the error code:...
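To make that mis-link easy to confirm, here is a small standalone diagnostic (my own sketch, not part of FT) that asks the dynamic linker which `libcuda.so` it resolves in the current environment; a path under `.../cuda/lib64/stubs/` means the stub is shadowing the real driver library. Running `ldd` on the FT binary shows the same information from outside the process.

```c++
#include <dlfcn.h>
#include <link.h>
#include <cstdio>

// Print the full path of the libcuda.so resolved in this environment.
// Build with: g++ which_libcuda.cc -ldl   (the file name is just an example)
int main()
{
    void* handle = dlopen("libcuda.so.1", RTLD_NOW);
    if (handle == nullptr) {
        handle = dlopen("libcuda.so", RTLD_NOW);  // the stub only ships this name
    }
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    link_map* map = nullptr;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != nullptr) {
        std::printf("resolved libcuda: %s\n", map->l_name);
    }
    dlclose(handle);
    return 0;
}
```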

@serbanc94, thanks, your example is very helpful. How can we find more of the options supported by the input args (e.g., `device_id`), which aren't described in the `ffmpeg-python` docs?

> There is a pull request #2308 handling this.

Thanks, we will try it later and provide timely feedback if any issues arise.

> There is a pull request #2308 handling this.

Also, I'd like to ask whether TurboMind plans to support the w8a8 feature for VLM (vision-language) models in the future.

> Turbomind is only responsible for llm. Vision model in lmdeploy used pytorch.

Excuse me, I made a mistake in my statement. What I actually wanted to ask is whether...