Driss Guessous

Results 39 issues of Driss Guessous

# Summary Still need to figure out this symbol. The current workaround is to set `LD_PRELOAD=/usr/lib64/libcuda.so`. The lazyNVRTC approach should be the correct one, but still getting Not sure why...

# Summary Fixes: #125413

# Summary cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

module: inductor
ciflow/inductor

# Summary I haven't been able to figure this one out; curious if others have seen it. If I create a Python project and use hatchling as the build system:...

# Summary Currently Efficient Attention allows the is_causal flag as well as the attn_mask argument. The math backend explicitly disables this. For causal attention we can iterate over half the...
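The relationship between an `is_causal` flag and an explicit mask can be sketched in plain Python. This is a toy single-head attention over lists, not PyTorch's Efficient Attention kernel; the function names `causal_mask` and `attention` are hypothetical, and rejecting the flag-plus-mask combination is an illustrative choice that only loosely mirrors what the summary says about the math backend.

```python
import math

def causal_mask(n):
    """Lower-triangular boolean mask: query i may attend to keys 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def attention(q, k, v, mask=None, is_causal=False):
    """Naive single-head attention over lists of row vectors (illustrative only)."""
    if mask is not None and is_causal:
        # Assumption for this sketch: the two options are mutually exclusive.
        raise ValueError("pass either mask or is_causal, not both")
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i in range(len(q)):
        # With is_causal, keys after position i are never visited at all.
        keys = range(i + 1) if is_causal else range(len(k))
        scores = {}
        for j in keys:
            if mask is not None and not mask[i][j]:
                continue  # masked-out entry: skipped after being visited
            scores[j] = scale * sum(a * b for a, b in zip(q[i], k[j]))
        # Numerically stable softmax over the surviving scores.
        top = max(scores.values())
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(weights.values())
        out.append([
            sum(w * v[j][d] for j, w in weights.items()) / total
            for d in range(len(v[0]))
        ])
    return out
```

In this sketch both paths give the same result, but the `is_causal` path skips the upper triangle of the score matrix instead of computing and then discarding it.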

Stale

Repro:
```Python
from torchao.dtypes import to_nf4
from transformer_nuggets.quant.nf4_tensor import NF4Tensor
import torch
from pathlib import Path
from transformer_nuggets.utils.benchmark import save_memory_snapshot
import logging; logging.basicConfig(level=logging.INFO)

torch.set_default_device("cuda:0")
mem_reserved = torch.cuda.max_memory_reserved()
print(f"{mem_reserved / 1e9}...
```

# Summary When you call `torch.nn.functional.linear()`, the weight is transposed. One solution is to compute the transpose lazily, that is, to mark that the matrix...
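One way to read "lazily compute the transpose" is a wrapper that only records a transposed flag and swaps indices on access. The sketch below is a plain-Python illustration under that assumption; `LazyTranspose` and this `linear` helper are hypothetical names and do not reflect how PyTorch or the truncated issue actually implements it.

```python
class LazyTranspose:
    """Hypothetical wrapper: marks a matrix as transposed without moving data."""

    def __init__(self, data, transposed=False):
        self.data = data              # list of rows, never rearranged
        self.transposed = transposed  # the only state a transpose changes

    @property
    def shape(self):
        rows, cols = len(self.data), len(self.data[0])
        return (cols, rows) if self.transposed else (rows, cols)

    def t(self):
        # O(1): flip the flag and share the same underlying storage.
        return LazyTranspose(self.data, not self.transposed)

    def __getitem__(self, idx):
        i, j = idx
        # Swap the indices on read instead of materializing the transpose.
        return self.data[j][i] if self.transposed else self.data[i][j]


def linear(x, weight):
    """Compute y = x @ weight^T for weight of shape (out_features, in_features)."""
    wt = weight.t()  # no copy happens here
    in_features, out_features = wt.shape
    return [
        [sum(row[k] * wt[k, o] for k in range(in_features))
         for o in range(out_features)]
        for row in x
    ]
```

The design choice here is that the cost of the transpose is deferred to element access, so a consumer that can read the data in either layout never pays for a copy.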

# Summary This is gonna take a second