Zipeng Xie comments

Results: 37 comments of Zipeng Xie

Test mt5

t5:

```bash
bash tools/train.sh tools/train_net.py configs/t5_large_pretrain.py 8
```

mt5:

```bash
bash tools/train.sh tools/train_net.py configs/t5_large_pretrain_xzp.py 8
```

Test mt5

> To keep the PR from getting too large, once the loss and relative performance look normal, this PR can be moved forward and merged first.

Setting different data_parallel_size values leads to different global_batch_size values

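With data_parallel_size=2, tensor_parallel_size=1 and 2 GPUs (that is, 2-GPU data parallelism), setting train_micro_batch_size=16 gives global_batch_size=32: the 32-sample batch is split in two, and the model effectively trains with a batch size of 32. With data_parallel_size=1, tensor_parallel_size=2, 2 GPUs and train_micro_batch_size=16 (that is, 2-GPU model parallelism), the model actually trains with a batch size of 16.

Yipeng's point is that users should control how large a batch the model trains on, rather than setting the batch size on each GPU, right? I confirmed this with Chengpeng and also looked into it: PyTorch's data parallelism has the user set global_batch_size directly. In LiBai both can be set, but this can indeed be improved (setting global_batch_size without setting train_micro_batch_size causes problems). I think global_batch_size and train_micro_batch_size could both default to None: if the user sets only one of them, the other is computed automatically, and if both are set, we check that they are consistent. The comments in the configs file already explain these relationships to users: https://github.com/Oneflow-Inc/libai/blob/5d5acf9aa69ab5b5da8ae9d992dce4afe0d1964c/configs/common/train.py#L22. Documentation: https://libai.readthedocs.io/en/latest/tutorials/basics/Config_System.html?highlight=train#train

When global_batch_size=8, just set train_micro_batch_size=None. @Yipeng1994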
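The proposal above (both sizes default to None, derive the missing one, validate when both are given) can be sketched as follows. This is a minimal illustration, not LiBai's actual code: `resolve_batch_sizes` is a hypothetical helper, and it assumes no gradient accumulation, so only data_parallel_size multiplies the micro batch (tensor-parallel ranks see the same data).

```python
from typing import Optional, Tuple


def resolve_batch_sizes(
    data_parallel_size: int,
    train_micro_batch_size: Optional[int] = None,
    global_batch_size: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (train_micro_batch_size, global_batch_size).

    Hypothetical helper. Assumes no gradient accumulation, so
    global_batch_size == train_micro_batch_size * data_parallel_size.
    Tensor parallelism does not change the global batch, because
    tensor-parallel ranks process the same samples.
    """
    if data_parallel_size < 1:
        raise ValueError("data_parallel_size must be >= 1")
    if train_micro_batch_size is None and global_batch_size is None:
        raise ValueError("set train_micro_batch_size or global_batch_size")

    if train_micro_batch_size is None:
        # Only the global size was given: derive the per-rank micro batch.
        if global_batch_size % data_parallel_size != 0:
            raise ValueError("global_batch_size must be divisible by data_parallel_size")
        train_micro_batch_size = global_batch_size // data_parallel_size
    elif global_batch_size is None:
        # Only the micro batch was given: derive the global size.
        global_batch_size = train_micro_batch_size * data_parallel_size
    elif train_micro_batch_size * data_parallel_size != global_batch_size:
        # Both were given: they must agree.
        raise ValueError("train_micro_batch_size * data_parallel_size != global_batch_size")

    return train_micro_batch_size, global_batch_size


# The two 2-GPU setups described above:
print(resolve_batch_sizes(2, train_micro_batch_size=16))  # data parallel
print(resolve_batch_sizes(1, train_micro_batch_size=16))  # tensor parallel
```

With this shape, a user who only cares about the batch the model trains on sets global_batch_size alone and leaves train_micro_batch_size=None, matching the global_batch_size=8 suggestion.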

DETR result alignment experiment log

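> Following [Oneflow-Inc/OneTeam#779](https://github.com/Oneflow-Inc/OneTeam/issues/779), a record of aligning the model loss:
>
> * [x] Check that the network structure in model.py is aligned
> * [x] Make sure the dataloader's shuffle is turned off
> * [x] Check that the network's dropout is turned off
> * [x] Make sure the lr_scheduler and optimizer are the same
> * [x] As a double safeguard, set every dropout_prob argument to 0 and also put the model in .eval() mode, so that dropout, BN and similar ops stay fixed during training and contain no randomness
>
> ...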
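Once the checklist has removed shuffle, dropout and other sources of randomness, the two implementations being aligned should log matching per-iteration losses. A minimal sketch of that final comparison, assuming the losses are available as plain lists of floats; `first_mismatch` and its tolerances are hypothetical and not taken from the record.

```python
import math
from typing import Optional, Sequence


def first_mismatch(
    reference: Sequence[float],
    candidate: Sequence[float],
    rtol: float = 1e-4,
    atol: float = 1e-5,
) -> Optional[int]:
    """Return the first iteration where two loss curves diverge, or None.

    Hypothetical helper; rtol/atol are illustrative defaults.
    """
    for step, (ref, cand) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, cand, rel_tol=rtol, abs_tol=atol):
            return step
    if len(reference) != len(candidate):
        # One run logged more iterations than the other.
        return min(len(reference), len(candidate))
    return None


print(first_mismatch([1.0, 0.9, 0.8], [1.0, 0.95, 0.8]))
```

Reporting the first diverging step, rather than a single pass/fail flag, helps narrow down which checklist item (data order, dropout, scheduler) is still introducing a difference.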

Add gelu fast activation

![image](https://user-images.githubusercontent.com/53039617/185396839-a71ff59e-dbcf-442c-b693-d60104d51091.png)

Add gelu fast activation

> Is this a duplicate of torch.nn.GELU(approximate='tanh')? https://pytorch.org/docs/stable/generated/torch.nn.GELU.html
>
> Related discussion: [huggingface/transformers#15397](https://github.com/huggingface/transformers/issues/15397)

Da Laoshi, this is not a duplicate of torch.nn.GELU(approximate='tanh'), but both are approximations of GELU, and there are quite a few GELU variants: ```python class NewGELUActivation(nn.Module): """ Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also...
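To illustrate how GELU variants relate, here is a scalar sketch of three common forms in plain Python: the exact erf-based GELU, the tanh approximation documented for torch.nn.GELU(approximate='tanh'), and the sigmoid approximation x·sigmoid(1.702x). These are illustrative reference formulas only; the code does not claim to reproduce the "gelu fast" activation added in this PR.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)  # about 0.7978845608


def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # tanh approximation, the formula given for approximate='tanh'.
    return 0.5 * x * (1.0 + math.tanh(SQRT_2_OVER_PI * (x + 0.044715 * x ** 3)))


def gelu_sigmoid(x: float) -> float:
    # Sigmoid approximation x * sigmoid(1.702 * x); coarser than the tanh form.
    # Note: math.exp overflows for very negative x (around x < -400).
    return x / (1.0 + math.exp(-1.702 * x))


for x in (-2.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(f"{x:+.1f}  exact={gelu_exact(x):.6f}  "
          f"tanh={gelu_tanh(x):.6f}  sigmoid={gelu_sigmoid(x):.6f}")
```

Printing the three side by side shows that the tanh form tracks the exact GELU much more closely than the sigmoid form, which is why variants that differ only in how the tanh formula is arranged are easy to confuse with one another.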

Zipeng Xie

Test mt5

Test mt5

Setting different data_parallel_size values leads to different global_batch_size values

DETR result alignment experiment log

Add gelu fast activation

Add gelu fast activation

Add gelu fast activation

add mse_loss and ls_loss interface

add mse_loss and ls_loss interface

Refactor dataloader rdma