No dot product attention backend is available for the provided inputs
```
=== MLA Debug Info ===
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) query shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) key shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) value shape: torch.Size([4320, 1, 128]), stride: (128, 128, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) packed_seq_params: PackedSeqParams(qkv_format='thd', cu_seqlens_q=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_kv=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_q_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), cu_seqlens_kv_padded=tensor([ 0, 2208, 4320], device='cuda:0', dtype=torch.int32), max_seqlen_q=2208, max_seqlen_kv=2208)
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Running with config={'transformer_engine_version': '2.3.0+5de3e148', 'compute_capability': 'sm90', 'flash_attn_version': '2.7.4.post1', 'flash_attn_3_version': 'not installed', 'cudnn_version': '9.8.0', 'qkv_type': <class 'torch.Tensor'>, 'qkv_dtype': torch.bfloat16, 'qkv_layout': 'thd_thd_thd', 'batch_size': 2, 'num_heads': 1, 'num_gqa_groups': 1, 'max_seqlen_q': 2208, 'max_seqlen_kv': 2208, 'head_dim_qk': 192, 'head_dim_v': 128, 'attn_mask_type': 'padding_causal', 'window_size': (-1, 0), 'alibi_slopes_shape': None, 'core_attention_bias_type': 'no_bias', 'core_attention_bias_shape': None, 'core_attention_bias_requires_grad': False, 'pad_between_seqs': False, 'attention_dropout': 0.0, 'context_parallel': False, 'deterministic': False, 'is_training': True, 'fp8': False, 'fp8_meta': {'fp8_checkpoint': False, 'fp8_group': None}, 'inference_params': None}
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Disabling FlashAttention 2 due to NVTE_FLASH_ATTN=0
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,472:Disabling UnfusedDotProductAttention due to NVTE_UNFUSED_ATTN=0
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Disabling FusedAttention as no backend supports the provided input
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Available backends = {FlashAttention=False, FusedAttention=False, UnfusedDotProductAttention=False}
21:39:57.328 (WorkerDict pid=7207, ip=33.17.207.174) DEBUG:2025-11-06 21:39:55,475:Selected backend = NoBackend
```
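For context, the DEBUG lines above come from Transformer Engine's backend-selection logging. A minimal sketch of how it can be enabled, assuming TE's standard `NVTE_DEBUG` / `NVTE_DEBUG_LEVEL` variables and that they are set before `transformer_engine` is imported in the worker process:

```python
import os

# Assumption: NVTE_DEBUG / NVTE_DEBUG_LEVEL control TE's attention debug logging
# and must be set before transformer_engine is imported in this process.
os.environ["NVTE_DEBUG"] = "1"        # enable Transformer Engine debug logging
os.environ["NVTE_DEBUG_LEVEL"] = "2"  # level 2 prints the per-call config and backend selection

import transformer_engine.pytorch as te  # noqa: E402  (imported after setting the env vars)
```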
TE cannot find a suitable attention backend. How can I solve this? Help!
From the log, the FlashAttention backend is disabled because you set NVTE_FLASH_ATTN=0, and UnfusedDotProductAttention is disabled because you set NVTE_UNFUSED_ATTN=0, while the cuDNN FusedAttention backend is disabled because it does not support the provided input.
So a quick fix is to remove NVTE_FLASH_ATTN=0 (or set it to 1) and try again. If FlashAttention still rejects the input, removing NVTE_UNFUSED_ATTN=0 as well restores the unfused PyTorch implementation as a last-resort fallback.
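A minimal sketch of that fix, assuming the environment variables are set in the training script before `transformer_engine` is imported (in a Ray setup like the one in the log, they can equally be exported in the worker environment):

```python
import os

# Re-enable FlashAttention; the log shows it was explicitly disabled via NVTE_FLASH_ATTN=0.
os.environ["NVTE_FLASH_ATTN"] = "1"
# Keep the cuDNN fused backend enabled; TE will still skip it when the input is unsupported.
os.environ["NVTE_FUSED_ATTN"] = "1"
# Optional fallback: re-enable the unfused PyTorch backend as well, in case FlashAttention
# also rejects the MLA head dims (192 for Q/K vs 128 for V).
os.environ["NVTE_UNFUSED_ATTN"] = "1"

import transformer_engine.pytorch as te  # noqa: E402  (import only after the variables are set)
```

If nothing changes, double-check that these variables actually reach the Ray worker processes (for example via the launch script or runtime environment) rather than only the driver.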