In attention.py under modules, line 53 computes the attention weights shown in the picture below. The mask fill value should be -1e7, right?
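For reference, a minimal sketch of the masking step being asked about (the function name and exact constant are illustrative, not taken from the repo's attention.py): masked positions are filled with a large negative number so their softmax weight becomes effectively zero, and -1e7, -1e9, or -10000 all serve that purpose in fp32.

```python
import torch
import torch.nn.functional as F

def masked_attention(q, k, v, mask, fill_value=-1e7):
    """Scaled dot-product attention with additive masking.

    mask: bool tensor, True where a position may be attended to.
    fill_value: large negative constant so masked logits vanish after softmax.
    """
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_k ** 0.5
    scores = scores.masked_fill(~mask, fill_value)   # masked positions -> ~0 weight
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v), weights
```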
Is there a strict procedure for back-translation? I used fairseq's en-de transformer pretrained model to generate back-translation data for training UDA, but I can't get good results...
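For context, a minimal round-trip-translation sketch in the spirit of UDA's augmentation, assuming the fairseq WMT19 torch.hub checkpoints; the model names follow fairseq's hub examples, while the sampling settings are illustrative rather than a confirmed recipe.

```python
import torch

# Back-translation via English -> German -> English round trips.
# Requires sacremoses and fastBPE for the hub tokenizer/BPE.
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de',
                       checkpoint_file='model1.pt',
                       tokenizer='moses', bpe='fastbpe').eval()
de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en',
                       checkpoint_file='model1.pt',
                       tokenizer='moses', bpe='fastbpe').eval()

def back_translate(sentences, temperature=0.9):
    # Sample instead of beam-searching in both directions so the
    # resulting paraphrases stay diverse (temperature is a tunable guess).
    de = en2de.sample(sentences, sampling=True, temperature=temperature)
    return de2en.sample(de, sampling=True, temperature=temperature)

print(back_translate(["The quick brown fox jumps over the lazy dog."]))
```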
## 1. Model structure
According to the paper, the keyword-attention layer and the regular transformer layer are each attached after the 11 regular transformer layers, but the source code (lines 212 and 226 of modeling.py) doesn't seem to do this; it looks more like a two-tower structure in which only the embedding layer is shared?
## 2. kw_mask attention
When this mask is generated, the three rows for cls and sep, unless handled specially, end up entirely filled with -10000 before entering softmax, so when these three rows go through softmax...
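On the second point, a quick numerical check of what softmax does to a score row that has been entirely filled with -10000 (the values below are illustrative, not taken from modeling.py): a constant row softmaxes to a uniform distribution, so those cls/sep rows would attend equally to every position rather than producing zeros or NaNs.

```python
import torch
import torch.nn.functional as F

seq_len = 5
row = torch.full((seq_len,), -10000.0)   # a fully masked score row
print(F.softmax(row, dim=-1))            # uniform: tensor([0.2, 0.2, 0.2, 0.2, 0.2])
```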
Why is it said that only ds_zero currently runs world_size streams on world_size GPUs, when accelerate and ds-inference should be doing the same, since they also...
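For context, a minimal sketch of what "world_size streams on world_size GPUs" looks like in the data-parallel sense: every rank holds a usable copy of the model and generates over its own shard of the inputs, in contrast to ds-inference with kernel injection, where the model is sharded and all ranks cooperate on the same request. The model name and prompts are placeholders, and the script assumes a torchrun/deepspeed-style launch that sets LOCAL_RANK and WORLD_SIZE.

```python
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
torch.cuda.set_device(rank)

tok = AutoTokenizer.from_pretrained("gpt2")              # placeholder model
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2").half().to(rank).eval()

prompts = [f"prompt {i}" for i in range(16)]
my_prompts = prompts[rank::world_size]                   # each rank: its own stream

inputs = tok(my_prompts, return_tensors="pt", padding=True).to(rank)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(rank, tok.batch_decode(out, skip_special_tokens=True))
```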
With 800k+ samples, one epoch hasn't even finished yet; the results at step 600 were actually quite good, but the repetition problem gets worse and worse as training goes on, even though both the training and validation loss keep decreasing.
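Not part of the report above, but a common first check for this repetition symptom is whether it persists once decoding-time repetition controls are applied. A minimal sketch with Hugging Face `generate` (model name, prompt, and parameter values are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")              # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Once upon a time", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    top_p=0.9,
    repetition_penalty=1.2,      # penalize already-generated tokens
    no_repeat_ngram_size=3,      # block verbatim 3-gram repeats
)
print(tok.decode(out[0], skip_special_tokens=True))
```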
I have re-downloaded the diff weights several times, and the sum obtained after merging is always `all sum : 49715.3515625`.
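For reference, a sketch of how such an "all sum" check over a merged checkpoint is typically computed; the checkpoint path and the exact summing convention used by the repo's own check script are assumptions here, not quoted from it.

```python
import torch

# Hypothetical path to the merged (base + diff) checkpoint.
state = torch.load("merged_model/pytorch_model.bin", map_location="cpu")

total = 0.0
for name, tensor in state.items():
    if torch.is_tensor(tensor) and tensor.is_floating_point():
        total += tensor.float().sum().item()   # accumulate in fp32

print(f"all sum : {total}")
```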