sserdoubleh comments

Results: 74 comments by sserdoubleh

Why is train.py so much faster than infer.py?

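> When calling train.py, batch_size can be set to around 8000 and one step takes about 200 s, whereas with infer.py, batch_size can only be set very small (4, 12, or less), and anything above 32 may run out of GPU memory. This contradicts everyday intuition: eval mode should be faster than train mode and use less memory. What is the reason?

There are several factors:

1. Batch size: during training it is counted in tokens; during inference it is counted in examples.
2. Training only needs one forward pass through the network, whereas inference for generation proceeds step by step, so GPU utilization is low and it takes longer. Speed also depends on the decoding strategy you choose, for example multi-sample or beam search.

Also, the 200 s per step you mention is probably the time for 100 steps, right?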
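The first point (different batch-size units) can be illustrated with a minimal sketch. It is not Knover's reader code: the function names, the rule that padded tokens count toward the budget, and the example lengths are all assumptions made for illustration.

```python
from typing import Iterator, List


def batch_by_examples(lengths: List[int], batch_size: int) -> Iterator[List[int]]:
    """Group a fixed number of examples per batch (inference-style counting)."""
    for i in range(0, len(lengths), batch_size):
        yield lengths[i:i + batch_size]


def batch_by_tokens(lengths: List[int], max_tokens: int) -> Iterator[List[int]]:
    """Group examples until the padded token count would exceed max_tokens
    (training-style counting; counting padding is an assumption here)."""
    batch: List[int] = []
    max_len = 0
    for n in lengths:
        new_max = max(max_len, n)
        # Padded size of the batch if this example were added.
        if batch and new_max * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_max = [], n
        batch.append(n)
        max_len = new_max
    if batch:
        yield batch


# Hypothetical data: 1000 examples, each 100 tokens long.
lengths = [100] * 1000

token_batches = list(batch_by_tokens(lengths, max_tokens=8000))
example_batches = list(batch_by_examples(lengths, batch_size=32))

print(len(token_batches[0]))    # examples in one token-budget batch: 80
print(sum(example_batches[0]))  # tokens in one 32-example batch: 3200
```

With these made-up lengths, a "batch size" of 8000 tokens corresponds to only 80 examples, so the two settings are not directly comparable. The sketch says nothing about the second point: step-by-step decoding and strategies such as beam search add their own time and memory cost on top of this.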

The CPU version of PaddleHub requires CUDA_HOME to be configured, what is going on?

Judging from this error message, I suggest opening an issue in the PaddleHub or PaddleNLP repo: https://github.com/PaddlePaddle/PaddleHub/issues https://github.com/PaddlePaddle/PaddleNLP/issues

PLATO-2 uses AdamW as its optimizer; I see that lr has a corresponding decay schedule, but weight_decay does not

https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/core/model.py#L398 I think you may have misread the code. The lr passed in here is not a fixed float value but a Paddle variable, and its value changes during training.

PLATO-2 uses AdamW as its optimizer; I see that lr has a corresponding decay schedule, but weight_decay does not

LN (LayerNorm) parameters generally do not take part in weight decay. See this discussion: https://discuss.pytorch.org/t/weight-decay-only-for-weights-of-nn-linear-and-nn-conv/114348
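One common way to apply this is to split parameters by name before building the optimizer. The following is a minimal sketch in plain Python, not Knover's implementation: the helper `split_by_weight_decay`, the `"layer_norm"` keyword, and the parameter names are hypothetical and only illustrate the idea.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_weight_decay(
    param_names: Iterable[str],
    no_decay_keywords: Tuple[str, ...] = ("layer_norm",),
) -> Dict[str, List[str]]:
    """Partition parameter names into those that receive weight decay
    and those that are excluded (here: anything matching a keyword)."""
    groups: Dict[str, List[str]] = {"decay": [], "no_decay": []}
    for name in param_names:
        excluded = any(keyword in name for keyword in no_decay_keywords)
        groups["no_decay" if excluded else "decay"].append(name)
    return groups


# Hypothetical parameter names, for illustration only.
param_names = [
    "encoder_layer_0_ffn_fc_0.w_0",
    "encoder_layer_0_post_att_layer_norm_scale",
    "encoder_layer_0_post_att_layer_norm_bias",
    "word_embedding",
]

groups = split_by_weight_decay(param_names)
print(groups["decay"])
print(groups["no_decay"])
```

The two groups would then be handed to the optimizer with different weight-decay coefficients (zero for the `no_decay` group). How that is wired up depends on the framework and is not shown here.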

For the error of running the command 'bash ./scripts/local/job.sh ./projects/PLATO-2/pretrain/24L_infer.conf'

You need to download the models from this url.

For the error of running the command 'bash ./scripts/local/job.sh ./projects/PLATO-2/pretrain/24L_infer.conf'

Follow these instructions: https://github.com/PaddlePaddle/Knover/blob/develop/projects/PLATO-2/README.md#pre-trained-dialogue-generation-model

For the error of running the command 'bash ./scripts/local/job.sh ./projects/PLATO-2/pretrain/24L_infer.conf'

The dygraph branch does not support PLATO-2 at the moment.

Is there a pre-training conf for PLATO-XL?

For the training hyperparameters, you can refer to the paper.

Does the _gen_self_attn_mask function in knover/data/dialog_reader.py handle the unidirectional case incompletely?

You mean this one, right? mask_data https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/data/dialog_reader.py#L584 This is a matter of Python semantics: mask_data is a reference to an element of the list input_mask_data, so modifications made to mask_data also show up in input_mask_data. For reference: ![image](https://user-images.githubusercontent.com/6134289/136964688-e9a92574-59ba-40b4-a312-b49903698fbe.png)
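The reference behaviour described here can be reproduced with a toy example. This is a sketch using plain nested lists rather than the actual data structure in dialog_reader.py; only the variable names `input_mask_data` and `mask_data` are borrowed from the discussion, and the values are made up.

```python
# A list whose elements are themselves mutable lists.
input_mask_data = [[0, 0, 0] for _ in range(2)]

for index, mask_data in enumerate(input_mask_data):
    # mask_data refers to the same object stored in input_mask_data,
    # so an in-place modification is visible through the outer list.
    mask_data[index] = 1

print(input_mask_data)  # [[1, 0, 0], [0, 1, 0]]

# Contrast: rebinding the loop variable creates no link to the outer list.
untouched = [[0, 0, 0] for _ in range(2)]
for row in untouched:
    row = [1, 1, 1]  # only rebinds the local name `row`

print(untouched)  # [[0, 0, 0], [0, 0, 0]]
```

The distinction is between mutating the object a name points to (visible everywhere that object is referenced) and assigning a new object to the name (visible only through that name).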

How the training data is organized

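The NSP task is fairly simple: responses on the same topic (in practice, those with high word overlap) tend to receive higher scores. This can be optimized further in the future.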