chenhuixi

Results 2 comments of chenhuixi

Perhaps the batch size is set so large that it leads to "CUDA out of memory", but the program does not report an error. Try reducing the `train_micro_batch_size_per_gpu` parameter...
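For reference, `train_micro_batch_size_per_gpu` lives in the DeepSpeed JSON config; a minimal sketch (the values here are placeholders, tune them for your GPU memory):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16
}
```

Lowering the micro-batch size while raising `gradient_accumulation_steps` keeps the effective batch size the same but uses far less memory per step.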

chatglm is a prefix LM; chatglm2 is a causal LM. The llama family are causal LMs.
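The difference comes down to the attention mask: a causal LM is strictly lower-triangular, while a prefix LM lets the prompt tokens attend to each other bidirectionally. A minimal NumPy sketch (illustrative only, not the models' actual code; `prefix_len` is an assumed name for the length of the bidirectional prompt):

```python
import numpy as np

def causal_mask(seq_len):
    # Causal LM (chatglm2, llama): each token attends only to itself
    # and earlier positions -> lower-triangular mask.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_mask(seq_len, prefix_len):
    # Prefix LM (chatglm): tokens inside the prefix attend to the whole
    # prefix (bidirectional); generated tokens remain causal.
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True
    return mask

print(causal_mask(4).astype(int))
print(prefix_mask(4, prefix_len=2).astype(int))
```

In the printed masks, the only change is the upper-left `prefix_len × prefix_len` block becoming all ones.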