Louis

Results: 6 issues by Louis

### Describe the problem in detail

I'd like to ask how necessary it is to expand the Chinese vocabulary. A tokenizer based on Unicode encoding can in theory encode any Chinese character, so if the vocabulary is not expanded and the model is pretrained and fine-tuned directly on a Chinese corpus, how much does that affect model quality? I have seen work on other languages, such as Japanese, that does not expand the vocabulary. The wiki also says "most related derivative works pretrain/finetune directly on the original model", so is the difference only a matter of encoding efficiency or of final quality? (A quick tokenization comparison is sketched after the checklist below.)

### Screenshots or logs

### Checklist
- [x] Which model the issue concerns: LLaMA / Alpaca **(keep only the one you are asking about)**
- [x] Issue type: **(keep only the one you are asking about)**
  - Quality issue
  - Other issue
- [x] Since the related dependencies are updated frequently, please make sure you have followed the steps in the [Wiki](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki)
- [x] I have read the [FAQ section](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/常见问题) and searched the existing issues without finding a similar problem or solution
- [x] Third-party plugin issues: e.g. [llama.cpp](https://github.com/ggerganov/llama.cpp), [text-generation-webui](https://github.com/oobabooga/text-generation-webui), [LlamaChat](https://github.com/alexrozanski/LlamaChat); it is also recommended to look for solutions in the corresponding project
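On the encoding-efficiency point, the difference is easy to see by tokenizing the same sentence with both tokenizers. This is a minimal sketch assuming the `transformers` library; the two Hugging Face repo IDs are illustrative assumptions, not taken from this issue.

```python
# Hypothetical comparison: original LLaMA tokenizer vs. an expanded
# Chinese tokenizer on the same sentence. Repo IDs are assumptions.
from transformers import LlamaTokenizer

text = "今天天气真不错"  # "The weather is really nice today"

# Original LLaMA vocabulary: most Chinese characters are not in the
# vocabulary and fall back to UTF-8 byte tokens (~3 tokens per character).
orig = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")

# Expanded Chinese vocabulary: common characters/words get dedicated
# tokens (~1 token per character, often fewer for frequent words).
expanded = LlamaTokenizer.from_pretrained("hfl/chinese-llama-2-7b")

print("original :", len(orig.tokenize(text)), "tokens")
print("expanded :", len(expanded.tokenize(text)), "tokens")
```

Without expansion the model can still represent any Chinese text via byte fallback, but each character costs roughly three context positions and three decoding steps, which is the efficiency gap the wiki quote refers to.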

Hi team, I'm fine-tuning with 6 V100 GPUs and the process is extremely slow. I'm using fp16 and attn_impl: torch, with a global_train_batch_size of 12 and device_train_microbatch_size automatically...
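For reference on how these settings interact (a general data-parallel relationship, stated here as an assumption about this setup rather than anything confirmed in the issue): the global batch is split evenly across GPUs, and each per-device batch is processed in microbatch-sized gradient-accumulation steps.

```python
# Hypothetical arithmetic for the reported configuration. The microbatch
# size of 1 is an assumed value, since the issue text is truncated.
num_gpus = 6
global_train_batch_size = 12

device_train_batch_size = global_train_batch_size // num_gpus   # 2 samples per GPU
device_train_microbatch_size = 1                                # assumption
grad_accum_steps = device_train_batch_size // device_train_microbatch_size  # 2

print(f"{device_train_batch_size=} {grad_accum_steps=}")
```

A tiny microbatch keeps memory low but means many small forward/backward passes per optimizer step, which is one common reason training feels slow on V100s.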

Thanks for your great work. I'm running an MPT model on an NVIDIA V100 GPU. I think the compilation process went well, but the GPU cannot be utilized during inference. Here is...
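When inference runs but GPU utilization stays at zero, a common first check is whether PyTorch can see the device and whether the model and inputs were actually moved onto it. This is a generic diagnostic sketch assuming a PyTorch-based setup, not the reporter's code.

```python
# Hypothetical diagnostic: confirm the GPU is visible, and remember that
# both the model and its inputs must be moved to it explicitly; otherwise
# inference silently runs on the CPU.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. "Tesla V100-SXM2-16GB"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = model.to(device)                               # weights onto the GPU
# inputs = {k: v.to(device) for k, v in inputs.items()}  # inputs too
```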

Hello, multi-GPU training on V100s always hits a timeout error; it fails with both 4 GPUs and 2 GPUs. A single GPU does not seem to have this problem, but it is slow: fine-tuning on 50k samples takes about 12 hours.

```bash
[E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=80, OpType=ALLGATHER, Timeout(ms)=1800000) ran for 1805926 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:828] [Rank 0] Watchdog caught collective operation timeout:...
```
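The Timeout(ms)=1800000 in the log is PyTorch's default 30-minute NCCL watchdog. One generic mitigation, offered here as an assumption rather than something confirmed in this thread, is to raise the timeout when initializing the process group; note this only buys time and does not fix whatever makes a rank stall in the ALLGATHER.

```python
# Hypothetical sketch: raise the NCCL collective timeout above the default
# 30 minutes (the 1800000 ms seen in the log). Assumes the usual env://
# initialization (RANK, WORLD_SIZE, MASTER_ADDR set by the launcher).
from datetime import timedelta

import torch.distributed as dist

dist.init_process_group(
    backend="nccl",
    timeout=timedelta(hours=2),  # default is timedelta(minutes=30)
)
```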

pending

### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/baichuan-inc/baichuan-7B/issues) and [Discussions](https://github.com/baichuan-inc/baichuan-7B/discussions) and verified that this hasn't already been reported. (+1 or comment...

question