dan_the_3rd

Results 83 comments of dan_the_3rd

Do you have a `$CUDA_HOME` env variable set by any chance? You can check the detected CUDA location with the following:

```
python -c "from torch.utils.cpp_extension import CUDA_HOME; print(CUDA_HOME)"
```

Hi, Which hardware are you running on? Are you interested in the backward (BW) pass as well? If you have an Sm80+ device with Flash-Attention available, I would look at the...

Hi, I believe this is intentional. There is a bug in PyTorch 2.1 which prevents CUDA Graphs from working properly with NCCL collectives. It will be fixed in 2.2, but...

Hi, Which version of xFormers are you using / which GPU? Can you report the output of `python -m xformers.info`? We had a bug in earlier versions of xFormers,...

Hi @deyiluobo What are you trying to do with ONNX? If the model won't run with xFormers, you will most likely have another error down the line.

I'm not familiar with ONNX, but I'm not sure the operator will be available at runtime (are you using PyTorch at inference time? With xFormers?) Cc @fmassa

Hi, What GPU / CUDA version do you have? We recommend that you install xFormers and PyTorch at the same time with a command like this: ```bash # cuda 11...

Hi, It looks like an error in your installed package. Can you try to reinstall, or re-clone the repository?

Hi @LarsHill Thanks for the detailed report :) Indeed, xFormers' memory-efficient attention should be faster than the lucid one. A few questions: (1) I assume you are measuring iteration time...
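For context on the iteration-time question: a common pitfall when benchmarking PyTorch code is measuring without warm-up, or (on GPU) without synchronizing, since CUDA kernels launch asynchronously. A minimal timing sketch (the model and sizes here are placeholders, not from the original report):

```python
import time

import torch

# Placeholder workload standing in for the attention model being benchmarked.
model = torch.nn.Linear(256, 256)
x = torch.randn(32, 256)

# Warm-up iterations so one-time costs (allocator, autotuning) don't skew results.
for _ in range(3):
    model(x)

# On GPU you must synchronize around the timed region, e.g.:
#   torch.cuda.synchronize()
start = time.perf_counter()
n_iters = 10
for _ in range(n_iters):
    model(x)
elapsed = time.perf_counter() - start
print(f"mean iteration time: {elapsed / n_iters * 1e3:.3f} ms")
```

The same pattern applies whichever attention implementation is being compared; only the timed region changes.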

> I train with torch.float32 in all cases. xFormers kernels have been specially optimized for f16 or bf16 (A100). If you can run your model with either autocast or fully...