macheng6

Results: 16 issues by macheng6

### Feature request Using flash attention to speed up. ### Motivation None. ### Your contribution None.
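
For context, a minimal sketch of how flash attention is typically enabled when loading a model with Hugging Face transformers (4.36+); the checkpoint name and the flash-attn/GPU availability below are assumptions, not details from the issue:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint; substitute the model the feature request targets.
model_name = "fnlp/moss-moon-003-sft"

# attn_implementation="flash_attention_2" requires the flash-attn package
# and an fp16/bf16 dtype on a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```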

I would like to know the following about MOSS training: 1. Which model was chosen as the initialization (backbone) for MOSS? 2. What GPU-memory optimization methods were used during MOSS training?

I want to know how to avoid OOM when fine-tuning the 20B model. Is fp16 the only option?
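
As a rough illustration (not taken from any reply), these are common memory-saving levers when fine-tuning a ~20B model with transformers; the checkpoint name and flag values are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# Hypothetical 20B checkpoint; replace with the model actually being fine-tuned.
model = AutoModelForCausalLM.from_pretrained(
    "fnlp/moss-moon-003-sft",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)

# Recompute activations in the backward pass instead of storing them all.
model.gradient_checkpointing_enable()

# Small per-device batches plus gradient accumulation keep activation memory low;
# DeepSpeed ZeRO stage 2/3 or LoRA-style adapters are further common options.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,
)
```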

### System Info The DP mode of 4.29.0 seems to have a bug: during the forward pass, the model's dtype is changed to torch.int64, which causes the torch.finfo...
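
For reference, torch.finfo is only defined for floating-point dtypes, so a dtype reported as torch.int64 makes the call fail; a minimal illustration of that behaviour:

```python
import torch

print(torch.finfo(torch.float16).min)  # fine: finfo exists for float dtypes

try:
    torch.finfo(torch.int64)           # raises: int64 has no floating-point info
except TypeError as exc:
    print(exc)
```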

1. Why is there no linear projection to the vocabulary dimension at predict time? Instead, the output is multiplied directly by word_embeddings to map to the vocabulary dimension. 2. Why is GLM loaded with AutoModelForSeq2SeqLM rather than AutoModelForCausalLM?
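
On the first question, reusing the input embedding matrix as the output projection (weight tying) is a common design; a minimal sketch of what multiplying by word_embeddings amounts to (shapes and names here are illustrative, not GLM's actual code):

```python
import torch

vocab_size, hidden_size = 50_000, 1_024
word_embeddings = torch.nn.Embedding(vocab_size, hidden_size)

# Hidden states from the final transformer layer: (batch, seq_len, hidden)
hidden_states = torch.randn(2, 8, hidden_size)

# Instead of a separate output Linear, project onto the embedding weights:
# (batch, seq, hidden) @ (hidden, vocab) -> (batch, seq, vocab)
logits = hidden_states @ word_embeddings.weight.t()
print(logits.shape)  # torch.Size([2, 8, 50000])
```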

### Describe the issue In the LLMLingua project, I attempted to use the qwen model in place of modelname and oai_tokenizer in the code; when the target token is 150, the...
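
A rough sketch of pointing LLMLingua's PromptCompressor at a different backbone; the qwen checkpoint name is an assumption, and the constructor and compress_prompt arguments may differ across LLMLingua versions:

```python
from llmlingua import PromptCompressor

# Hypothetical checkpoint; substitute the qwen model referred to in the issue.
compressor = PromptCompressor(model_name="Qwen/Qwen-7B-Chat", device_map="cuda")

long_prompt = "..."  # the prompt to be compressed

# target_token=150 mirrors the setting mentioned above.
result = compressor.compress_prompt(long_prompt, target_token=150)
print(result["compressed_prompt"])
```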
