No dot product attention backend is available for the provided inputs
```
=== MLA Debug Info ===
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) query shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) key shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) value shape: torch.Size([4320, 1, 128]), stride: (128, 128, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) packed_seq_params: PackedSeqParams(qkv_format='thd', cu_seqlens_q=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_kv=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_q_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_kv_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), max_seqlen_q=2208, max_seqlen_kv=2208)
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Running with config={'transformer_engine_version': '2.3.0+5de3e148', 'compute_capability': 'sm90', 'flash_attn_version': '2.7.4.post1', 'flash_attn_3_version': 'not installed', 'cudnn_version': '9.8.0', 'qkv_type': <class 'torch.Tensor'>, 'qkv_dtype': torch.bfloat16, 'qkv_layout': 'thd_thd_thd', 'batch_size': 2, 'num_heads': 1, 'num_gqa_groups': 1, 'max_seqlen_q': 2208, 'max_seqlen_kv': 2208, 'head_dim_qk': 192, 'head_dim_v': 128, 'attn_mask_type': 'padding_causal', 'window_size': (-1, 0), 'alibi_slopes_shape': None, 'core_attention_bias_type': 'no_bias', 'core_attention_bias_shape': None, 'core_attention_bias_requires_grad': False, 'pad_between_seqs': False, 'attention_dropout': 0.0, 'context_parallel': False, 'deterministic': False, 'is_training': True, 'fp8': False, 'fp8_meta': {'fp8_checkpoint': False, 'fp8_group': None}, 'inference_params': None}
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Disabling FlashAttention 2 due to NVTE_FLASH_ATTN=0
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Disabling UnfusedDotProductAttention due to NVTE_UNFUSED_ATTN=0
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Disabling FusedAttention as no backend supports the provided input
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Available backends = {FlashAttention=False, FusedAttention=False, UnfusedDotProductAttention=False}
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Selected backend = NoBackend
```
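For context, the DEBUG lines above come from Transformer Engine's backend-selection logging. A minimal sketch of how it can be enabled, assuming TE's standard `NVTE_DEBUG` / `NVTE_DEBUG_LEVEL` variables and that they are set before `transformer_engine` is imported in the worker process:

```python
import os

# Assumption: NVTE_DEBUG / NVTE_DEBUG_LEVEL control TE's attention debug logging
# and must be set before transformer_engine is imported in this process.
os.environ["NVTE_DEBUG"] = "1"        # enable Transformer Engine debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # level 2 prints the per-call config and backend selection

import transformer_engine.pytorch as te  # noqa: E402  (imported after setting the env vars)
```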
TE cannot find a suitable attention backend. How can I solve this? Help!
From the log, the FlashAttention backend is disabled because you set NVTE_FLASH_ATTN=0, and UnfusedDotProductAttention is disabled because you set NVTE_UNFUSED_ATTN=0, while the cuDNN FusedAttention backend is disabled because it does not support the provided input.
So a quick fix is to remove NVTE_FLASH_ATTN=0 (or set it to 1) and try again. If FlashAttention still rejects the input, removing NVTE_UNFUSED_ATTN=0 as well restores the unfused PyTorch implementation as a last-resort fallback.
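A minimal sketch of that fix, assuming the environment variables are set in the training script before `transformer_engine` is imported (in a Ray setup like the one in the log, they can equally be exported in the worker environment):

```python
import os

# Re-enable FlashAttention; the log shows it was explicitly disabled via NVTE_FLASH_ATTN=0.
os.environ["NVTE_FLASH_ATTN"] = "1"
# Keep the cuDNN fused backend enabled; TE will still skip it when the input is unsupported.
os.environ["NVTE_FUSED_ATTN"] = "1"
# Optional fallback: re-enable the unfused PyTorch backend as well, in case FlashAttention
# also rejects the MLA head dims (192 for Q/K vs 128 for V).
os.environ["NVTE_UNFUSED_ATTN"] = "1"

import transformer_engine.pytorch as te  # noqa: E402  (import only after the variables are set)
```

If nothing changes, double-check that these variables actually reach the Ray worker processes (for example via the launch script or runtime environment) rather than only the driver.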