ytxiong
### 🐛 Describe the bug When I run the example in your tutorials (basic/colotensor), I run into some problems. Traceback (most recent call last): File "colossalai-study/run_dist.py", line 8, in <module> from colossalai.testing...
Hello, I have tested the numeric precision of FusedRMSNorm and MixFusedRMSNorm in two different versions. I found that the gradients of the model weights do not stay the same...
When tp size > kv head_number, copy kv heads
Add docs for 2d-attention
### Describe the feature Some CPU synchronizations block the GPU kernel, leading to bubbles between GPU kernels. It should be optimized in the future. 1. item() in rotary embedding. 2....
### Your current environment Hi, I want to know how to use pipeline parallelism in offline inference. Can anyone give a concrete example of how to use pipeline? Looking forward...
## Motivation 1. The `mlp_layer_fusion` config is useful in MoE; therefore, a warning is added to recommend that users set this config to True in the MoE model. 2. When...
### Feature request Qwen_2_5_VL support for variable-length attention computation ### Motivation Hello, I tried to run qwen25_vl with packed samples; however, I found that it seems this function only passes...
Hello, I have a question. I recently tried to run your test_cudnn.cpp code, and the following errors appeared during compilation: error: ‘CUDNN_CONVOLUTION_FWD_PREFER_FASTEST’ was not declared in this scope; did you mean ‘CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3’? error: ‘cudnnGetConvolutionForwardAlgorithm’ was not declared in this scope; did you mean ‘cudnnGetConvolutionForwardAlgorithm_v7’? Have you run into this problem? It looks like a cuDNN version issue; my cuDNN version is 8. Have you encountered this as well?