Driss Guessous
# Summary Still need to figure out this symbol. The current workaround is to set `LD_PRELOAD=/usr/lib64/libcuda.so`; the lazyNVRTC approach should be the correct one, but it's still failing. Not sure why...
# Summary Fixes: #125413
# Summary cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
# Summary I haven't been able to figure this one out, curious if others have seen it. If I create a python project and use hatchling as the build system:...
# Summary Currently Efficient Attention accepts the is_causal flag as well as the attn_mask argument. The math backend explicitly disables this combination. For causal attention we can iterate over half the...
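The equivalence being relied on here is that `is_causal` behaves like a lower-triangular `attn_mask`: position `i` may only attend to positions `j <= i`. A minimal pure-Python sketch (hypothetical helper names, no torch dependency) of what that masking does to the softmax weights:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    # scores: L x L matrix of raw attention scores.
    # is_causal masks every position j > i with -inf before the softmax,
    # which is equivalent to passing a lower-triangular attn_mask.
    L = len(scores)
    masked = [
        [scores[i][j] if j <= i else float("-inf") for j in range(L)]
        for i in range(L)
    ]
    return [softmax(row) for row in masked]

weights = causal_attention_weights([[0.1, 0.2, 0.3],
                                    [0.4, 0.5, 0.6],
                                    [0.7, 0.8, 0.9]])
# Upper-triangular weights come out exactly zero; each row still sums to 1.
```

Because the mask is strictly lower-triangular, only the lower half of the score matrix ever contributes, which is what makes iterating over half the matrix possible in the first place.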
Repro:
```Python
from torchao.dtypes import to_nf4
from transformer_nuggets.quant.nf4_tensor import NF4Tensor
import torch
from pathlib import Path
from transformer_nuggets.utils.benchmark import save_memory_snapshot
import logging; logging.basicConfig(level=logging.INFO)

torch.set_default_device("cuda:0")
mem_reserved = torch.cuda.max_memory_reserved()
print(f"{mem_reserved / 1e9}...
```
# Summary When you call torch.nn.functional.linear() you will call transpose on the weight. One solution is to compute the transpose lazily, which is to mark that the matrix...
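A lazy transpose in this sense just flips a flag instead of moving data, and the flag is consulted on access. A minimal sketch under that assumption (the `LazyMatrix` class and its methods are hypothetical, not PyTorch API):

```python
class LazyMatrix:
    """Sketch of a lazy transpose: toggle a flag instead of
    materializing the transposed data."""

    def __init__(self, rows, transposed=False):
        self.rows = rows          # shared underlying storage (list of rows)
        self.transposed = transposed

    def t(self):
        # O(1): no data movement, just flip the flag on a new view.
        return LazyMatrix(self.rows, not self.transposed)

    def __getitem__(self, idx):
        i, j = idx
        # Swap the indices on read when the view is marked transposed.
        return self.rows[j][i] if self.transposed else self.rows[i][j]

m = LazyMatrix([[1, 2],
                [3, 4]])
mt = m.t()
# mt[0, 1] reads m[1, 0] without copying: 3
```

The view shares storage with the original, so repeated transposes in a hot path like linear() cost nothing until an element is actually read.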
SDPA API
# Summary This is gonna take a second