Oren

Results 17 issues of Oren

### NVIDIA Open GPU Kernel Modules Version 550.90.07 ### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for...

bug

Since H100s throttle power depending on the kernel, it is important to see how the TFLOPs change over time. I have this patch in my internal codebase and...

As discussed on Slack, since we are trying to find the max FLOPs for each accelerator, I changed warmup to `0`. Without any magic flags on NVIDIA drivers...

I am attempting to emit PyTorch code, but unfortunately it does not work for fp8, bf16, and int8. I have tried to patch the converter type dict: https://github.com/OrenLeung/cutlass/commit/6d619c964eb8b9c150a5f97891849d33f6ee8b64 This patch...

bug
? - Needs Triage

### 🚀 The feature, motivation and pitch - a multimodal benchmarking feature covering offline latency, throughput, and online serving for Pixtral ### Alternatives - everyone writes their own script...

feature request

On https://github.com/NVIDIA/CUDALibrarySamples/tree/master/cuSPARSELt/matmul the example runs fine with the existing small m, n, k, but unfortunately when I change m, n, k to 8192, I get a runtime error. Any pointers or patches...

cuSPARSELt

Hi @hongxiayang @hliuca, it seems like float8 training using `torchao.float8` is not supported at the moment. Is there a different library or code path I should be using for...

module: rocm
float8