Tianlei Wu
I saw the absolute difference is not large:
```
Greatest absolute difference: 0.00011079013347625732 at index (0, 573) (up to 1e-05 allowed)
```
I suggest using an end-to-end metric (like precision...
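As a sketch of why a tight element-wise tolerance can flag a difference that an end-to-end metric would not, here is a small NumPy illustration. The shapes, noise level, and the top-1-agreement metric are made up for illustration, not taken from the issue:

```python
import numpy as np

# Synthetic stand-ins for logits from two implementations
# (e.g. a reference path vs. an optimized path).
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 1000)).astype(np.float32)
opt = ref + rng.normal(scale=1e-4, size=ref.shape).astype(np.float32)

# Element-wise check: small numeric noise exceeds a 1e-5 tolerance...
max_abs_diff = float(np.max(np.abs(ref - opt)))
print(f"Greatest absolute difference: {max_abs_diff}")

# ...while an end-to-end metric (here, top-1 agreement) is unaffected.
top1_match = float(np.mean(ref.argmax(axis=1) == opt.argmax(axis=1)))
print(f"Top-1 agreement: {top1_match}")
```

With noise around 1e-4, the element-wise check fails a 1e-5 tolerance while the predicted classes still agree.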
@TedThemistokleous, support for an 'optional' bias was added for the T5 model in https://github.com/microsoft/onnxruntime/pull/14928. It is supported by the CUDA provider. However, the CPU provider still requires the bias input.
It's clear in the operator spec. I think the CPU EP needs a slight modification to follow the operator spec, or the error message for the not-implemented feature should be updated to avoid confusing users. Let...
@TedThemistokleous, for LLMs, typical ONNX usage mixes 4-bit and 8-bit quantization. Most weights can be quantized to 4 bits; some layers need more bits, and we normally use 8 bits...
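A toy sketch of the per-layer bit-selection idea. Everything here is illustrative (the quantizer, the layer data, and the threshold are my own stand-ins, not ONNX Runtime's actual quantization logic): layers whose 4-bit error stays small keep 4 bits, outlier-heavy layers fall back to 8 bits.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Toy symmetric uniform quantization to `bits` bits, dequantized back."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def choose_bits(w, threshold=0.5):
    """Pick 4 bits when the 4-bit error is acceptable, else fall back to 8.
    Normalizing by the squared median magnitude keeps a few large outliers
    from hiding the error they cause in the bulk of the weights.
    The threshold is arbitrary; real tools measure end-to-end accuracy."""
    err4 = np.mean((w - quantize_symmetric(w, 4)) ** 2)
    rel = err4 / np.median(np.abs(w)) ** 2
    return 4 if rel < threshold else 8

rng = np.random.default_rng(0)
# Hypothetical layers: a well-behaved one and an outlier-heavy one (synthetic data).
plain = rng.normal(size=(256, 256))
outlier = rng.normal(size=(256, 256)) * np.where(rng.random((256, 256)) < 0.01, 50.0, 1.0)

print("plain:", choose_bits(plain), "bits")      # fits comfortably in 4 bits
print("outlier:", choose_bits(outlier), "bits")  # outliers push this layer to 8 bits
```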
We can add support for FP6 once ONNX adds it.

Feature | NVIDIA CUDA (12.9+) | AMD ROCm (7.0+)
-- | -- | --
FP6 Formats | e2m3, e3m2...
Even though this approach works when you build from source and run on the current machine, the binary might not be able to run on another GPU. If we want to...
> I do understand if this isn't accepted in the ORT codebase because of this, but maybe then we could work together on a better way to do it. It's...
@Numeri, I have no idea why LaunchTritonKernel is causing memory access errors. You can do some debugging, like starting with a tensor of one element, and adding some printf before...
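The "start with a one-element tensor" advice can be sketched as a harness that grows the input until the kernel crashes or diverges from a trusted reference; the first failing size narrows down the bug (often an indexing or boundary issue). The function names below are hypothetical placeholders, not the real LaunchTritonKernel invocation:

```python
import numpy as np

# Hypothetical placeholders: swap in the actual kernel launch and a trusted
# CPU reference implementation.
def kernel_under_test(x):
    return x * 2.0

def reference(x):
    return x * 2.0

# Grow the input from a single element; a crash or mismatch at size n
# but not n-1 localizes the problem.
results = {}
for n in [1, 2, 3, 64, 1024]:
    x = np.arange(n, dtype=np.float32)
    results[n] = bool(np.allclose(kernel_under_test(x), reference(x)))
    print(n, "ok" if results[n] else "MISMATCH")
```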