Porraio issues

Results: 8 issues of Porraio

Update README.md

Recommended advanced book: Python Cookbook

[Help] <How to correctly construct input_ids, attention_mask, position_ids, and labels>

### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior Following the GLM paper, suppose token1 and token2 are the source and token3 and token4 are the target. Then during training: input_ids is [pad, pad, token1, token2,...

Deepspeed zero stage 3

The default DeepSpeed config, config_block_10B.json, uses zero-2. When I change it to zero-3, I get a mismatch error. Is there a way to use zero-3 (offloading parameters to CPU)?...
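
For reference, a ZeRO stage 3 section with parameter offload to CPU can be sketched as below. This is only a minimal illustration built as a Python dict: the remaining fields of config_block_10B.json are not shown in the issue and are left out, the batch-size value is a placeholder, and nothing here addresses the mismatch error itself.

```python
import json

# ZeRO stage 3 with parameters and optimizer state offloaded to CPU.
# These keys follow the DeepSpeed configuration JSON schema.
zero3_section = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
}


def merge_zero3(base_config: dict) -> dict:
    """Return a copy of base_config with its ZeRO section replaced by stage 3."""
    merged = dict(base_config)
    merged.update(zero3_section)
    return merged


# Placeholder base config (assumption); a real one would be loaded from disk.
config = merge_zero3({"train_micro_batch_size_per_gpu": 1})
print(json.dumps(config, indent=2))
```

Replacing the whole `zero_optimization` section, rather than editing only `stage`, avoids carrying over stage 2 options that may not apply to stage 3.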

Question about pretraining with run_clm

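When pretraining in packaging (packing) mode, multiple samples may end up concatenated together. Is the input then ``, `token1`, `token2`, ``, `new_token1`, `new_token2`? Doesn't an eos_token need to be added? Otherwise, wouldn't the model never learn the stop token?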
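
The packing scheme the question describes can be illustrated with a short sketch. It is not the run_clm code; the function name `pack_with_eos`, the token ids, and the block size are hypothetical. It appends an end-of-sequence id after every sample before concatenating, so the stop token appears in the training stream.

```python
from typing import List


def pack_with_eos(docs: List[List[int]], eos_id: int, block_size: int) -> List[List[int]]:
    """Concatenate tokenized samples, closing each with eos_id, then cut fixed-size blocks."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # marks the sample boundary inside the packed stream
    # Drop the tail that does not fill a whole block.
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Two toy samples; ids are arbitrary and 2 stands in for the eos id (assumption).
docs = [[11, 12], [21, 22, 23]]
blocks = pack_with_eos(docs, eos_id=2, block_size=4)
print(blocks)  # [[11, 12, 2, 21]]
```

With causal language modeling the labels are the same blocks shifted by one position, so each `eos_id` becomes a prediction target at the end of its sample.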

stale

[Question]: Aquila's tokenizer has 100008 tokens, 8 of which are special tokens; many of them are never used, right?

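### Description Hello, is it true that training only uses token_start_id (100006) and token_end_id (100007)? unk and pad are both 0. Also, aquila_generate uses the encode method; I tried it, and by default it does not add the start token. Is that expected? Shouldn't a normal input be [start token, token1, ..., tokenN], with prediction running until the end token? Thanks for your answer. ### Alternatives _No response_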

question

Are there plans to support the starcoder model?

Error when deploying inference server with starcoder-gptq

### System Info I tried to quantize starcoder with the script this repo provides, then deployed it with text-generation-launcher. I got this error when warming up the model: Not enough memory...

Stale

Does the plugin version use public datasets?

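Which datasets does the plugin version use? Was it trained from the base model on a mix of the chat version's SFT data and plugin data, or was it trained further from the Chat model?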