Driss Guessous
# Summary Still need to figure out this symbol. The current workaround is to set `LD_PRELOAD=/usr/lib64/libcuda.so`; the lazyNVRTC approach should be the correct one, but it's still failing. Not sure why...
# Summary Fixes: #125413
# Summary cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
# Summary I haven't been able to figure this one out, curious if others have seen it. If I create a python project and use hatchling as the build system:...
# Summary Currently Efficient Attention accepts the is_causal flag as well as the attn_mask argument. The math backend explicitly disables this combination. For causal attention we can iterate over half the...
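The equivalence being relied on here is that `is_causal` behaves like a lower-triangular `attn_mask`: position `i` may only attend to positions `j <= i`. A minimal pure-Python sketch (hypothetical helper names, no torch dependency) of what that masking does to the softmax weights:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    # scores: L x L matrix of raw attention scores.
    # is_causal masks every position j > i with -inf before the softmax,
    # which is equivalent to passing a lower-triangular attn_mask.
    L = len(scores)
    masked = [
        [scores[i][j] if j <= i else float("-inf") for j in range(L)]
        for i in range(L)
    ]
    return [softmax(row) for row in masked]

weights = causal_attention_weights([[0.1, 0.2, 0.3],
                                    [0.4, 0.5, 0.6],
                                    [0.7, 0.8, 0.9]])
# Upper-triangular weights come out exactly zero; each row still sums to 1.
```

Because the mask is strictly lower-triangular, only the lower half of the score matrix ever contributes, which is what makes iterating over half the matrix possible in the first place.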
Repro:
```Python
from torchao.dtypes import to_nf4
from transformer_nuggets.quant.nf4_tensor import NF4Tensor
import torch
from pathlib import Path
from transformer_nuggets.utils.benchmark import save_memory_snapshot
import logging; logging.basicConfig(level=logging.INFO)

torch.set_default_device("cuda:0")
mem_reserved = torch.cuda.max_memory_reserved()
print(f"{mem_reserved / 1e9}...
```
# Summary When you call torch.nn.functional.linear() you will call transpose on the weight. One solution is to compute the transpose lazily, which is to mark that the matrix...
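A lazy transpose in this sense just flips a flag instead of moving data, and the flag is consulted on access. A minimal sketch under that assumption (the `LazyMatrix` class and its methods are hypothetical, not PyTorch API):

```python
class LazyMatrix:
    """Sketch of a lazy transpose: toggle a flag instead of
    materializing the transposed data."""

    def __init__(self, rows, transposed=False):
        self.rows = rows          # shared underlying storage (list of rows)
        self.transposed = transposed

    def t(self):
        # O(1): no data movement, just flip the flag on a new view.
        return LazyMatrix(self.rows, not self.transposed)

    def __getitem__(self, idx):
        i, j = idx
        # Swap the indices on read when the view is marked transposed.
        return self.rows[j][i] if self.transposed else self.rows[i][j]

m = LazyMatrix([[1, 2],
                [3, 4]])
mt = m.t()
# mt[0, 1] reads m[1, 0] without copying: 3
```

The view shares storage with the original, so repeated transposes in a hot path like linear() cost nothing until an element is actually read.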
SDPA API
# Summary This is gonna take a second