fangjiayueyuan
Running with config={'transformer_engine_version': '2.2.0+d0c452cc', 'compute_capability': 'sm90', 'flash_attn_version': 'not installed', 'flash_attn_3_version': 'not installed', 'cudnn_version': '9.8.0', 'qkv_type': , 'qkv_dtype': torch.bfloat16, 'qkv_layout': 'thd_thd_thd', 'batch_size': 2, 'num_heads': 1, 'num_gqa_groups': 1, 'max_seqlen_q': 2176, 'max_seqlen_kv': 2176,...
=== MLA Debug Info ===
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) query shape: torch.Size([4320, 1, 192]), stride: (192, 192, 1), is_contiguous: True
21:39:56.349 (WorkerDict pid=7207, ip=33.17.207.174) key shape: torch.Size([4320, 1, 192]), stride:...