Oren issues

Results: 17 issues of Oren

Vboost Print Output is Wrong

### NVIDIA Open GPU Kernel Modules Version 550.90.07 ### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for...

bug

feat: add return all for do_bench

Since H100s throttle power differently depending on the kernel, it is important to see how the achieved TFLOP/s changes over time. I have this patch in my internal codebase and...

Max Achievable TFLOP/s on H100 without warmup

As discussed on Slack, since we are trying to find the max FLOPs for each accelerator, I changed warmup to `0`. Without any magic flags on NVIDIA drivers...

[BUG] e4m3, int8, bf16 pytorch emitter not working

I am attempting to emit PyTorch code, but unfortunately it does not work for fp8, bf16, and int8. I have tried to patch the converter type dict: https://github.com/OrenLeung/cutlass/commit/6d619c964eb8b9c150a5f97891849d33f6ee8b64 This patch...

bug

? - Needs Triage

[Feature]: MultiModal benchmark_latency, benchmark_throughput, and benchmark_online

### 🚀 The feature, motivation and pitch - a feature to benchmark offline latency, throughput, and online serving for multimodal models (Pixtral) ### Alternatives - everyone writes their own script...

feature request

cuSPARSELt matmul example not working on M=N=K=8192

On https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul the example runs fine with the existing small m, n, k, but unfortunately when I change m, n, k to 8192, I get a runtime error. Any pointers or patches...

cuSPARSELt

[ROCm] float8 does not work

Hi @hongxiayang @hliuca, it seems like float8 training using `torchao.float8` is not supported at the moment. Is there a different library or code path I should be using for...

module: rocm

float8