
On Jetson Orin, Memory-efficient attention, SwiGLU, sparse and more won't be available.

cj401 opened this issue 6 months ago · 0 comments

🐛 Bug

I was trying to run Mistral-7B on a Jetson Orin, and built Triton (OpenAI) and xFormers from source.

However, when trying to run Mistral-7B, I got the following errors:

python -m main demo mistral-7B-v0.1/
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0a0+41361538.nv23.06 with CUDA 1104 (you have 2.1.0a0+41361538.nv23.06)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 27, 32, 128) (torch.float16)
     key         : shape=(1, 27, 32, 128) (torch.float16)
     value       : shape=(1, 27, 32, 128) (torch.float16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
     p           : 0.0
`decoderF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
`[email protected]` is not supported because:
    xFormers wasn't build with CUDA support
`tritonflashattF` is not supported because:
    xFormers wasn't build with CUDA support
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    Only work on pre-MLIR triton for now
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see `python -m xformers.info` for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.float16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalLocalAttentionMask'>
    operator wasn't built - see `python -m xformers.info` for more info
    unsupported embed per head: 128
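The error above comes from xFormers trying each fused-attention backend in turn, collecting the reasons each one is unusable, and only raising `NotImplementedError` when none qualifies. The following is an illustrative sketch of that dispatch pattern, not xFormers source: the operator names come from the log, but the checks are simplified stand-ins.

```python
def blockers(op_name, built_with_cuda, dtype, head_dim):
    """Return the list of reasons `op_name` cannot run (empty = usable).

    Simplified stand-ins for the real checks; only two of the conditions
    from the log are modelled here.
    """
    reasons = []
    if not built_with_cuda:
        reasons.append("xFormers wasn't built with CUDA support")
    if op_name == "smallkF":
        if dtype != "float32":
            reasons.append(f"dtype={dtype} (supported: {{float32}})")
        if head_dim > 32:
            reasons.append(f"unsupported embed per head: {head_dim}")
    return reasons


def dispatch(ops, **inputs):
    """Pick the first backend with no blockers, else raise with all reasons."""
    errors = {}
    for op in ops:
        r = blockers(op, **inputs)
        if not r:
            return op  # first usable backend wins
        errors[op] = r
    msg = "No operator found for `memory_efficient_attention_forward`"
    raise NotImplementedError(msg + ": " + repr(errors))


# With the reporter's setup (no CUDA extension, fp16, head dim 128),
# every backend is rejected:
try:
    dispatch(["decoderF", "cutlassF", "smallkF"],
             built_with_cuda=False, dtype="float16", head_dim=128)
except NotImplementedError as e:
    print(type(e).__name__)  # → NotImplementedError
```

The key point for this bug: "xFormers wasn't built with CUDA support" appears under every backend, so the single root cause (the compiled extension did not load) rejects all of them at once.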

Command

python3 -m xformers.info

I got

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.1.0a0+41361538.nv23.06 with CUDA 1104 (you have 2.1.0a0+41361538.nv23.06)
    Python  3.8.10 (you have 3.8.10)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
xFormers 0.0.24+40d3967.d20231209
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
[email protected]:         available
[email protected]:         available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               True
pytorch.version:                                   2.1.0a0+41361538.nv23.06
pytorch.cuda:                                      available
gpu.compute_capability:                            8.7
gpu.name:                                          Orin
build.info:                                        available
build.cuda_version:                                1104
build.python_version:                              3.8.10
build.torch_version:                               2.1.0a0+41361538.nv23.06
build.env.TORCH_CUDA_ARCH_LIST:                    None
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
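Note that `build.env.TORCH_CUDA_ARCH_LIST` is `None` above, so the build may not have targeted Orin's compute capability (8.7, per `gpu.compute_capability`). A possible rebuild sketch, assuming you are in the xFormers source checkout; the `MAX_JOBS` cap is an assumption to keep compilation within Orin's memory, and the install command itself is left commented because it is environment-specific:

```shell
# Target Orin's iGPU (compute capability 8.7, per `python -m xformers.info`).
export TORCH_CUDA_ARCH_LIST="8.7"
# Assumption: cap parallel nvcc jobs so the build fits in Orin's RAM.
export MAX_JOBS=4
# Then, from the xformers checkout:
# pip install -v --no-build-isolation -e .
```

After rebuilding, `python -m xformers.info` should report the `cutlassF`/`cutlassB` operators as available if the CUDA extension compiled and loaded correctly.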

To Reproduce

Steps to reproduce the behavior:

Expected behavior

Environment

Please copy and paste the output from the environment collection script from PyTorch (or fill out the checklist below manually).

You can run the script with:

# For security purposes, please check the contents of collect_env.py before running it.
python -m torch.utils.collect_env
  • PyTorch Version (e.g., 1.0): 2.1.0a0+41361538.nv23.06
  • OS (e.g., Linux): Linux (Jetson ORIN)
  • How you installed PyTorch (conda, pip, source): n/a
  • Build command you used (if compiling from source): python setup.py install
  • Python version: 3.8
  • CUDA/cuDNN version: CUDA 11.4 (reported as 1104 in the build info)
  • GPU models and configuration: Jetson Orin iGPU (compute capability 8.7)
  • Any other relevant information:

Additional context

cj401 · Dec 10 '23 12:12