
No dot product attention backend is available for the provided inputs

fangjiayueyuan opened this issue 2 months ago

```
=== MLA Debug Info ===
(WorkerDict pid=7207, ip=33.17.207.174)
query shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
key   shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
value shape: torch.Size([4320, 1, 128]), stride: (128, 128, 1), is_contiguous: True
packed_seq_params: PackedSeqParams(qkv_format='thd',
    cu_seqlens_q=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32),
    cu_seqlens_kv=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32),
    cu_seqlens_q_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32),
    cu_seqlens_kv_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32),
    max_seqlen_q=2208, max_seqlen_kv=2208)
DEBUG:2025-11-06 21:39:55,472:Running with config={
    'transformer_engine_version': '2.3.0+5de3e148', 'compute_capability': 'sm90',
    'flash_attn_version': '2.7.4.post1', 'flash_attn_3_version': 'not installed',
    'cudnn_version': '9.8.0', 'qkv_type': <class 'torch.Tensor'>,
    'qkv_dtype': torch.bfloat16, 'qkv_layout': 'thd_thd_thd',
    'batch_size': 2, 'num_heads': 1, 'num_gqa_groups': 1,
    'max_seqlen_q': 2208, 'max_seqlen_kv': 2208,
    'head_dim_qk': 192, 'head_dim_v': 128,
    'attn_mask_type': 'padding_causal', 'window_size': (-1, 0),
    'alibi_slopes_shape': None, 'core_attention_bias_type': 'no_bias',
    'core_attention_bias_shape': None, 'core_attention_bias_requires_grad': False,
    'pad_between_seqs': False, 'attention_dropout': 0.0,
    'context_parallel': False, 'deterministic': False, 'is_training': True,
    'fp8': False, 'fp8_meta': {'fp8_checkpoint': False, 'fp8_group': None},
    'inference_params': None}
DEBUG:2025-11-06 21:39:55,472:Disabling FlashAttention 2 due to NVTE_FLASH_ATTN=0
DEBUG:2025-11-06 21:39:55,472:Disabling UnfusedDotProductAttention due to NVTE_UNFUSED_ATTN=0
DEBUG:2025-11-06 21:39:55,475:Disabling FusedAttention as no backend supports the provided input
DEBUG:2025-11-06 21:39:55,475:Available backends = {FlashAttention=False, FusedAttention=False, UnfusedDotProductAttention=False}
DEBUG:2025-11-06 21:39:55,475:Selected backend = NoBackend
```
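For reference, the DEBUG lines above come from TE's attention backend-selection logging, enabled roughly like this (a minimal sketch; the env vars must be set before transformer_engine is imported):

```python
import os

# Turn on TransformerEngine's attention backend-selection logs.
os.environ["NVTE_DEBUG"] = "1"        # enable TE debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # 2 = verbose backend-selection info

# Env vars are read when the module is loaded, so import TE afterwards.
import transformer_engine.pytorch as te  # noqa: F401
```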

TE cannot find a suitable attention backend. How can I solve this? Help!

fangjiayueyuan · Nov 06 '25

From the log, the FlashAttention backend is disabled because you set NVTE_FLASH_ATTN=0, while the cuDNN attention backend (FusedAttention) is disabled because the input is not supported.

So a quick fix is to remove NVTE_FLASH_ATTN=0 and try again.
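For example (a minimal sketch, assuming you can set the environment in the worker before TE is imported):

```python
import os

# Drop the override that force-disables FlashAttention so TE can
# select it for this input (thd layout, head_dim_qk=192, head_dim_v=128).
os.environ.pop("NVTE_FLASH_ATTN", None)

# Your log also shows NVTE_UNFUSED_ATTN=0; unsetting it restores the
# unfused implementation as a last-resort fallback.
os.environ.pop("NVTE_UNFUSED_ATTN", None)

import transformer_engine.pytorch as te  # import only after the env is clean
```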

yaox12 · Nov 12 '25

You can also upgrade your cuDNN version to the latest (9.16), which should now support your config. If possible, it would also help to upgrade your TE version to, say, 2.10. I'll close the bug for now, but please let us know if there are still any issues running with this config. Thanks!
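If it helps, here's a quick way to confirm the versions TE sees (a sketch; the flash-attn import is optional):

```python
import torch
import transformer_engine

print("TE version:   ", transformer_engine.__version__)
print("cuDNN version:", torch.backends.cudnn.version())  # e.g. 91600 for 9.16

try:
    import flash_attn
    print("flash-attn:   ", flash_attn.__version__)
except ImportError:
    print("flash-attn:    not installed")
```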

cyanguwa · Dec 03 '25