dml

Results: 9 comments by dml

It seems that `WindowAttention` invokes `FusedMHARunnerFP16v2` incorrectly, and we got the expected difference between `FP16_op_output` and `FP16_torch_traced_output` after forbidding the use of fused attention:

```c++
// src/fastertransformer/layers/attention_layers/WindowAttention.cc:183
if ((sm...
```
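As context, the sketch below (my own self-contained approximation; the function name, variable names, and SM list are assumptions, not the verbatim FasterTransformer source) shows the kind of gate that sits around that line, and how hard-wiring it to `false` forbids the fused path while comparing the FP16 outputs:

```c++
#include <cstdio>

// Approximate shape of the fused-MHA gate: the FusedMHARunnerFP16v2 path is
// only taken for certain SM versions / head sizes; otherwise the unfused
// QK^T -> softmax -> V kernels run.  (Hypothetical sketch, not FT's code.)
static bool use_fused_mha(int sm, int size_per_head)
{
    bool fused = (sm == 75 || sm == 80 || sm == 86) && size_per_head == 64;
    fused = false;  // debug override: always fall back to unfused attention
    return fused;
}

int main()
{
    std::printf("fused attention enabled: %s\n",
                use_fused_mha(80, 64) ? "yes" : "no");
    return 0;
}
```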

> ```shell
> CUDA Error: (null) /workdir/xxx/packages/v5.0_tag/FasterTransformer-release-v5.0_tag/3rdparty/trt_fused_multihead_attention/fused_multihead_attention_v2.h 682
> ```
>
> This error means that you don't call fused mha successfully. Can you provide the docker image you use...

> > ```shell
> > CUDA Error: (null) /workdir/xxx/packages/v5.0_tag/FasterTransformer-release-v5.0_tag/3rdparty/trt_fused_multihead_attention/fused_multihead_attention_v2.h 682
> > ```
>
> ...

> I believe CUDA 11.0 is runnable. I try to build the cpp example by `nvcr.io/nvidia/pytorch:20.07-py3`, which contains CUDA 11.0.
>
> I can run the cpp example successfully by...

@byshiue, it's FT linking the wrong CUDA library in **my docker image**: it links against the stub `libcuda.so` from `/usr/local/cuda/lib64/stubs/libcuda.so`. I debugged into the following location to check the error code:...
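To make that mis-link easy to confirm, here is a small standalone diagnostic (my own sketch, not part of FT) that asks the dynamic linker which `libcuda.so` it resolves in the current environment; a path under `.../cuda/lib64/stubs/` means the stub is shadowing the real driver library. Running `ldd` on the FT binary shows the same information from outside the process.

```c++
#include <dlfcn.h>
#include <link.h>
#include <cstdio>

// Print the full path of the libcuda.so resolved in this environment.
// Build with: g++ which_libcuda.cc -ldl   (the file name is just an example)
int main()
{
    void* handle = dlopen("libcuda.so.1", RTLD_NOW);
    if (handle == nullptr) {
        handle = dlopen("libcuda.so", RTLD_NOW);  // the stub only ships this name
    }
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    link_map* map = nullptr;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != nullptr) {
        std::printf("resolved libcuda: %s\n", map->l_name);
    }
    dlclose(handle);
    return 0;
}
```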

@serbanc94, thanks, your example is very helpful. How can we find more of the options supported by the input args (e.g., `device_id`), which aren't described in the `ffmpeg-python` docs?

> There is a pull request #2308 handling this.

Thanks, we will try it later and provide timely feedback if any issues arise.

> There is a pull request #2308 handling this.

Also, I'd like to ask whether TurboMind plans to support the w8a8 feature for VLM (vision-language) models in the future.

> Turbomind is only responsible for llm. Vision model in lmdeploy used pytorch.

Excuse me, I made a mistake in my statement. What I actually wanted to ask is whether...