Zipeng Xie comments

Results: 37 comments of Zipeng Xie

Refactor dataloader rdma

- version: 0.8.1+cu117.git.83ca41036d (4-GPU data-parallel training with RDMA enabled), git_commit: 83ca41036d, cmake_build_type: RelWithDebInfo, rdma: True, mlir: False

and

- version: 0.8.1.dev20221204+cu112 (4-GPU data-parallel training with RDMA disabled), git_commit: ad20365, cmake_build_type: Release, rdma: True, mlir: True

Loss curves:

- 200 iters: ![loss](https://user-images.githubusercontent.com/53039617/205583828-3d4b67f2-1bdf-40c3-a79f-60766d6f94f8.jpg)
- 500 iters: ![loss](https://user-images.githubusercontent.com/53039617/205583979-7e02e14a-8631-4252-a48a-44759a33c1e9.jpg)
- 1000 iters: ![loss](https://user-images.githubusercontent.com/53039617/205584096-1c3e2a62-71d9-497c-bf73-340c0a0d97fe.jpg)

...

Use fuse multi head att

#### batch size = 4, acc step = 8, amp, open Checkpointing

| 1n1g | use_fuse_multi_head_att = False | use_fuse_multi_head_att = True |
| :---: | :---: | :---: |
|...

Use fuse multi head att

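- I hacked this part a bit: both `self_att` and `cross_att` now use `fuse_multi_head_att`, and the `attention` layer defaults to `fuse_multi_head_att`. Only 3 required `transpose` ops are added in total: one on the output of `encode_embedding`, one on the output of `decoder_embedding`, and one on the `logits` that `loss` receives.
- If data processing produces the `[seq_len, batch_size]` `shape` directly, these 3 `transpose` ops can be dropped.
- I used the unit test under this PR to verify that the modified model is aligned with huggingface: `tests/model_utils/test_mt5_loader_2.py`

@chengtbf @CPFLAME @strint @ouyangyu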
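The second point above (producing `[seq_len, batch_size]` during data processing) can be sketched in plain Python. This is only an illustration of the layout change on token ids; the helper name `transpose_2d` and the sample ids are hypothetical and are not taken from the PR.

```python
def transpose_2d(rows):
    """Swap the two axes of a rectangular nested list.

    [batch_size, seq_len] -> [seq_len, batch_size] (and back).
    """
    return [list(col) for col in zip(*rows)]


# Hypothetical token ids laid out as [batch_size=2, seq_len=3].
input_ids = [
    [11, 12, 13],
    [21, 22, 23],
]

# Layout a data pipeline could emit directly, so that no transpose
# is needed after the embedding layers.
seq_first_ids = transpose_2d(input_ids)

# Transposing twice restores the original batch-first layout.
round_trip = transpose_2d(seq_first_ids)
```

If the batch is already stored sequence-first like `seq_first_ids`, the model only has to agree on that convention at the embedding inputs and at the loss.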

use_fuse_mask_softmax

### Testing the performance gain of use_fuse_mask_softmax

- oneflow branch: `python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/release/mt5_opt/cu112`
- Corresponding oneflow commit: [2d080aa](https://github.com/Oneflow-Inc/oneflow/pull/9318/commits/2d080aac5c41c02346641a5576b359bc95399214)
- libai branch: [use_fuse_mask_softmax](https://github.com/Oneflow-Inc/libai/pull/412)

In `projects/T5/configs/t5_model_config.py`, measure the throughput difference between `model.cfg.scale_mask_softmax_fusion = False` and `model.cfg.scale_mask_softmax_fusion = True`.

@ouyangyu @chengtbf @strint
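A minimal sketch of how such a throughput comparison could be timed, assuming each setting is wrapped in a callable that runs one training iteration. `measure_throughput`, `speedup`, and `dummy_step` are hypothetical names for illustration; libai reports throughput through its own training logs, which this does not reproduce.

```python
import time


def measure_throughput(step_fn, samples_per_iter, warmup_iters=2, timed_iters=5):
    """Return samples/second for `step_fn`, ignoring warm-up iterations.

    On a GPU, `step_fn` should block until the device work has finished;
    otherwise only the kernel launches are timed.
    """
    for _ in range(warmup_iters):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return samples_per_iter * timed_iters / elapsed


def speedup(baseline_throughput, fused_throughput):
    """Ratio > 1 means the fused setting is faster than the baseline."""
    return fused_throughput / baseline_throughput


def dummy_step():
    # Stand-in for one training iteration (hypothetical workload).
    sum(i * i for i in range(10_000))


# Example: micro-batch 4 with 8 accumulation steps on one GPU
# gives 32 samples per iteration.
throughput = measure_throughput(dummy_step, samples_per_iter=4 * 8)
```

Running the same harness once with the flag set to `False` and once with `True`, then passing both numbers to `speedup`, gives the relative gain.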

use_fuse_mask_softmax

#### batch size = 4, acc step = 8, amp, open Checkpointing

| 1n1g | use_fuse_mask_softmax = False | use_fuse_mask_softmax = True |
| :---: | :---: | :---: |
|...

TypeError: init() got an unexpected keyword argument 'flags'

Could you check that your omegaconf version is 2.1.0 (`omegaconf==2.1.0`)?
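One way to do that check from Python, as a small sketch using the standard library's `importlib.metadata`. The helper names `installed_version` and `matches` are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(package, expected):
    """True only if `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


print("omegaconf:", installed_version("omegaconf"))
print("is 2.1.0:", matches("omegaconf", "2.1.0"))
```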

TypeError: init() got an unexpected keyword argument 'flags'

You can install the latest oneflow: `python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/cu116`, then try installing libai with `pip install -e .`.

TypeError: init() got an unexpected keyword argument 'flags'

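> 1. I finished training T5 on a single GPU, but doesn't T5 have 11 billion parameters? How can a single V100 load a model that large?

Please check which model-size configuration you actually used for training.

> 2. After training, what code should I use to test the results?

libai, like other libraries such as megatron, provides the model's pretraining task, so you can evaluate by running the pretraining task's metrics on the test set. If you want a complete T5, that is, one that matches the inference tasks libai runs with T5 weights, you also need to finetune the pretrained model on multiple downstream tasks and then test it.

@michelleqyhqyh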
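As a rough back-of-the-envelope check for the first question, the sketch below estimates the memory needed just to hold the weights. It assumes 4 bytes per parameter for fp32 and 2 for fp16, and it ignores gradients, optimizer state, and activations, so real training needs considerably more. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(n_params, bytes_per_param=4):
    """Memory for the weights alone, in GiB (no gradients or optimizer state)."""
    return n_params * bytes_per_param / 2**30


fp32_gib = weight_memory_gib(11e9)                      # 11 billion params, fp32
fp16_gib = weight_memory_gib(11e9, bytes_per_param=2)   # same model, fp16

print(f"fp32 weights: {fp32_gib:.1f} GiB")
print(f"fp16 weights: {fp16_gib:.1f} GiB")
```

Since a V100 ships with 16 GB or 32 GB of memory, a run that trains successfully on one card is very likely using a much smaller configuration than the 11-billion-parameter one.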

[Bug][MT5] Throughput is unexpected

There is a problem here. libai.models.T5Model is the megatron version, while IDEA needs the huggingface version of T5, which is the T5 under libai's projects (projects/T5 is the delivered project). The two model structures differ, and I have already asked yongning to add a test for projects/T5. Before delivery, the pure data-parallel measurements were also done with projects/T5 and libai.model.T5Model: [here](https://github.com/Oneflow-Inc/OneTeam/issues/1435#issuecomment-1191163272). Because the two models are different, I don't think we can simply compare performance against megatron, since megatron does not implement the huggingface version of T5.

Summary of the differences between the two T5s:

- The layernorm operator is different (mt5 uses an operator assembled in C++: [RMSLayernorm](https://github.com/Oneflow-Inc/oneflow/pull/8725))
- The decoder has one extra embedding layer: https://github.com/Oneflow-Inc/libai/blob/b3c5ba2b90ae6debbebf8e9b96806327fb21c9c5/projects/T5/models/attention.py#L117-L120
- The dropout operator is different (mt5 uses: https://github.com/Oneflow-Inc/oneflow/pull/8693)
- The lm_head of mt5 (the T5 under projects) does not share the embedding parameters (https://github.com/Oneflow-Inc/libai/blob/9a4af263756ff6a1c8abe73e9a51a29f0d8c0533/projects/T5/models/t5_model.py#L129-L134 )
- mt5 (the T5 under projects) lacks the position_embedding that t5 (the T5 in libai.models) has, but attention in mt5 adds the position_bias computation (https://github.com/Oneflow-Inc/libai/blob/e9ca4087cb35b3ad268534ee60456db689e36063/projects/T5/models/attention.py#L272 and https://github.com/Oneflow-Inc/libai/blob/e9ca4087cb35b3ad268534ee60456db689e36063/projects/T5/models/attention.py#L320 )
- mt5 (the T5 under projects) contains no bias at all (Linear and LayerNorm)
- Because mt5 (the T5 under projects) has to align with the huggingface version, it does not use some of the optimizations in t5 (the T5 in libai.models), such as scale_mask_softmax_fusion (mt5: https://github.com/Oneflow-Inc/libai/blob/9a4af263756ff6a1c8abe73e9a51a29f0d8c0533/projects/T5/models/attention.py#L232-L244 t5: https://github.com/Oneflow-Inc/libai/blob/9a4af263756ff6a1c8abe73e9a51a29f0d8c0533/libai/layers/attention.py#L214-L250 )

-...
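To make the first and sixth differences concrete, here is a pure-Python sketch of an RMS-style layer norm with a learned scale and no bias. It assumes the linked RMSLayernorm follows the usual T5-style formulation (no mean subtraction, no bias term); it is not the oneflow kernel, and `rms_layernorm` is a hypothetical name.

```python
import math


def rms_layernorm(x, weight, eps=1e-6):
    """Scale `x` by the reciprocal of its root mean square, then by `weight`.

    Unlike a standard LayerNorm, the mean is not subtracted and no bias
    is added (assumed T5-style behaviour).
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


hidden = [3.0, 4.0]     # hypothetical hidden vector
weight = [1.0, 1.0]     # learned scale, initialised to ones
normed = rms_layernorm(hidden, weight)

# After normalisation the mean of squares is (almost exactly) 1.
mean_square_after = sum(v * v for v in normed) / len(normed)
```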

[Bug][MT5] Throughput is unexpected

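> @xiezipeng-ML Does the hugging face version of T5 mentioned here mean the one in the transformers library? If so, once an oneflow backend for the T5 in transformers is supported directly, do you think distributed training could run directly? Last week I ported inference for the CLIP in transformers, and I don't know how much more training would require. The CLIP and t5 in transformers should share some basic modules.

Yes, it is the transformers repo. I'll ask you about it on slack.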