dan_the_3rd

Results 83 comments of dan_the_3rd

Do you have a `$CUDA_HOME` env variable set by any chance? You can check the detected CUDA location with the following:

```
python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
```

Hi, Which hardware are you running on? Are you interested in the backward (BW) pass as well? If you have an Sm80+ device with Flash-Attention available, I would look at the...

Hi, I believe this is intentional. There is a bug in PyTorch 2.1 which prevents CUDA Graphs from working properly with NCCL collectives. It will be fixed in 2.2, but...

Hi, Which version of xFormers are you using / which GPU? Can you report the output of `python -m xformers.info`? We had a bug in earlier versions of xFormers,...

Hi @deyiluobo What are you trying to do with ONNX? If the model won't run with xFormers, you will most likely have another error down the line.

I'm not familiar with ONNX, but I'm not sure the operator will be available at runtime (are you using PyTorch at inference time? With xFormers?) Cc @fmassa

Hi, What GPU / CUDA version do you have? We recommend that you install xFormers and PyTorch at the same time with a command like this: ```bash # cuda 11...

Hi, It looks like an error in your installed package. Can you try to reinstall, or re-clone the repository?

Hi @LarsHill Thanks for the detailed report :) Indeed, xFormers' memory-efficient attention should be faster than the lucid one. A few questions: (1) I assume you are measuring iteration time...
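For context on the iteration-time question: a common pitfall when benchmarking PyTorch code is measuring without warm-up, or (on GPU) without synchronizing, since CUDA kernels launch asynchronously. A minimal timing sketch (the model and sizes here are placeholders, not from the original report):

```python
import time

import torch

# Placeholder workload standing in for the attention model being benchmarked.
model = torch.nn.Linear(256, 256)
x = torch.randn(32, 256)

# Warm-up iterations so one-time costs (allocator, autotuning) don't skew results.
for _ in range(3):
    model(x)

# On GPU you must synchronize around the timed region, e.g.:
#   torch.cuda.synchronize()
start = time.perf_counter()
n_iters = 10
for _ in range(n_iters):
    model(x)
elapsed = time.perf_counter() - start
print(f"mean iteration time: {elapsed / n_iters * 1e3:.3f} ms")
```

The same pattern applies whichever attention implementation is being compared; only the timed region changes.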

> I train with torch.float32 in all cases. xFormers kernels have been specially optimized for f16 or bf16 (A100). If you can run your model with either autocast or fully...