Grey4sh comments

Results: 12 comments of Grey4sh

NCCL error

Same issue with CUDA 12.1 and torch 2.1.1+cu121. Did you solve it?

CodeQwen1.5 not working

Fixed it by changing config.json and tokenizer.json: https://huggingface.co/Qwen/CodeQwen1.5-7B-Chat/commit/91ffe86a74d00f76a75371d58a70ae5fe1bc0f29

> To add to this, the tokenizer config path is not passed into the models. This makes a custom config impossible. https://github.com/huggingface/text-generation-inference/issues/1785#issuecomment-2131148602

deepseek coder Prompt template

You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and...

[REQUEST] Add Llama 3 Instruct chat template

> I'm unable to fork the repo and push a commit ATM but until then @ddh0 give this a try.
>
> ```dotenv
> system\n\n{{preprompt}}{{#each messages}}{{#ifUser}}user\n\n{{content}}assistant{{/ifUser}}{{#ifAssistant}}{{content}}{{/ifAssistant}}{{/each}}
> ```

Thx, it's...

Abnormal GPU memory distribution when pretraining codeqwen1.5-7b; OOM after training for a while

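> If other models do not show uneven GPU memory usage under this framework, the model architecture may be the cause. We suggest trying different zero stage and batch size settings.

#3663 #3631 #3310 #2908 These issues all report a similar problem: a relatively new model combined with a new version of the training framework hits OOM after training for a while. I hope this problem gets some attention.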

Abnormal GPU memory distribution when pretraining codeqwen1.5-7b; OOM after training for a while

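> I ran into a similar problem: LoRA fine-tuning Qwen 14B on a single A400 80G hit OOM after a while. In theory, LoRA fine-tuning a 14B model should need only about 40G of GPU memory. Separately, running Qwen 14B inference on an A40 48G with LLaMA Factory's webchat parameters also hit OOM after a while. I suspect the cache is not being cleared in time. The Qwen series of models seems to be the hardest hit.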
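
As a rough sanity check on the "about 40G" figure quoted above, the weights alone can be estimated with simple arithmetic. This is only a sketch: it assumes "14B" means roughly 14e9 parameters held in 16-bit precision, and `weight_memory_gib` is a hypothetical helper, not part of any framework. Adapter weights, optimizer state, activations, and any cache come on top of this number.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: ~14e9 parameters, frozen base model stored at 2 bytes per parameter.
base_gib = weight_memory_gib(14e9)
print(f"frozen 16-bit weights: {base_gib:.1f} GiB")
```

Under these assumptions the frozen weights take roughly 26 GiB, so steady growth from there to 80G during a run would point at something accumulating over time rather than at the model size itself.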

Abnormal GPU memory distribution when pretraining codeqwen1.5-7b; OOM after training for a while

> I ran into this problem too. Mistral-7b-instruct-v0.2 hit OOM after training for a while on 4*4090 (SFT with LoRA).
>
> ```
> 82%|████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 7060/8660 [5:05:53 [rank3]: File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llamafactory/launcher.py", line 9, in
> [rank3]: launch()
> [rank3]: File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llamafactory/launcher.py", line 5,...
> ```

[Feature]: deepseek-v2 awq support

[Bug] deploy the DeepSeek-R1-awq get jumbled or nonsensical answers

Roughly what generation speed do you get for short prompts with this deployment setup?