Rui Wang

6 issues by Rui Wang

Hi, I was trying to test `https://github.com/NVIDIA/apex/tree/master/apex/contrib/openfold_triton` with Triton but encountered this error and couldn't find a solution anywhere. It'd be great if I could get some pointers to check...

Hi, we are testing our new Hopper machines (H800/H100) and trying to use FP8 for training for the first time, but we are having trouble installing `TransformerEngine`. It reports `RuntimeError:...

Saving the output of the normalization instead of its input reduces memory cost in modern networks, where the output will be saved anyway (e.g., by a following Linear layer)...
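A minimal sketch of the idea, written for RMSNorm rather than LayerNorm only to keep it short (an assumption, not necessarily the norm the request targets). The class name `RMSNormSaveOutput` is hypothetical, and the sketch assumes every entry of `weight` is nonzero so that the normalized input can be recovered in backward as `x_hat = y / weight`. Only `y`, `weight`, and a per-row `rstd` are saved; the input `x` is not.

```python
import torch


class RMSNormSaveOutput(torch.autograd.Function):
    """Hypothetical RMSNorm that saves its output y (not its input x) for backward.

    Assumes all entries of `weight` are nonzero, so x_hat = y / weight is exact.
    """

    @staticmethod
    def forward(ctx, x, weight, eps):
        # rstd = 1 / sqrt(mean(x^2) + eps), one value per row
        rstd = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
        y = x * rstd * weight
        # Save y (which a following Linear layer keeps anyway) instead of x.
        ctx.save_for_backward(y, weight, rstd)
        return y

    @staticmethod
    def backward(ctx, grad_y):
        y, weight, rstd = ctx.saved_tensors
        x_hat = y / weight  # recover the normalized input from the saved output
        # Gradient w.r.t. weight: sum over all leading (batch) dimensions
        grad_weight = (grad_y * x_hat).reshape(-1, x_hat.shape[-1]).sum(dim=0)
        g = grad_y * weight  # gradient w.r.t. x_hat
        # d x_hat / d x folded in: rstd * (g - x_hat * mean(g * x_hat))
        grad_x = rstd * (g - x_hat * (g * x_hat).mean(dim=-1, keepdim=True))
        return grad_x, grad_weight, None  # no gradient for eps
```

If `y` is then fed into an `nn.Linear`, autograd holds a single reference to the same tensor for both layers, so `x` can be freed after the forward pass. The cost is one extra division in backward and the nonzero-weight requirement.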

When I tested with the following snippet:

```python
M = 128
N = 128
K = 128
slice_ = slice(47, 54)

def to_float8_e4m3fn(x: torch.Tensor):
    scales = x.abs().amax(dim=-1, keepdim=True).float().div(FP8_e4m3_MAX)
    x =...
```

I find `jaxtyping` a blast to use, as is `tensordict` (https://github.com/pytorch/tensordict). If I could use the two together, that would be even more amazing! Thanks for the great work!


As I understand it, headdim matters more than the number of heads, and the DIFF Transformer chooses to halve the number of heads and double the vdim compared to...
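As a quick check on the arithmetic: halving the number of heads while doubling the per-head dimension keeps the total Q/K/V projection widths, and therefore the projection parameter counts, unchanged. The sketch below uses `d_model = 1024` and a baseline head dimension `d = 64` purely for illustration, and assumes each DIFF head uses two d-wide query/key groups (2d in total) plus a 2d-wide value; `qkv_widths` is a hypothetical helper.

```python
def qkv_widths(num_heads: int, qk_dim_per_head: int, v_dim_per_head: int) -> tuple[int, int, int]:
    """Total output widths of the Q, K and V projections (hypothetical helper)."""
    return (num_heads * qk_dim_per_head,
            num_heads * qk_dim_per_head,
            num_heads * v_dim_per_head)


d_model, d = 1024, 64

# Standard attention: d_model / d heads, each with head dimension d.
standard = qkv_widths(d_model // d, d, d)

# DIFF-style attention (assumed layout): half as many heads; each head has
# two d-wide query/key groups (2d in total) and a 2d-wide value.
diff = qkv_widths(d_model // (2 * d), 2 * d, 2 * d)

print(standard, diff)  # (1024, 1024, 1024) (1024, 1024, 1024)
```

So the comparison isolates the per-head layout: the same projection budget is spent on fewer, wider heads.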