Junyang Lin
They share the same code and the same model architecture. Check their generation lengths. For chat models, generation stops automatically once the im_end token is produced, so probably...
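A minimal sketch of what stopping on the chat end token looks like with Hugging Face transformers; the model id below is only illustrative, not a prescription:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen1.5-7B-Chat"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# <|im_end|> marks the end of an assistant turn; passing its id as
# eos_token_id lets generate() stop there instead of running on to
# max_new_tokens.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
outputs = model.generate(**inputs, max_new_tokens=512, eos_token_id=im_end_id)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```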
Check the latest finetuning script, or LLaMA-Factory, or Axolotl.
Use AWQ rather than GPTQ with vLLM. https://huggingface.co/Qwen/Qwen1.5-32B-Chat-GPTQ-Int4
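A minimal sketch of loading an AWQ checkpoint in vLLM, assuming the AWQ model id follows the usual Qwen naming pattern (the exact id may differ):

```python
from vllm import LLM, SamplingParams

# Load the AWQ-quantized chat model with vLLM's AWQ kernel.
llm = LLM(model="Qwen/Qwen1.5-32B-Chat-AWQ", quantization="awq")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello, who are you?"], params)
print(outputs[0].outputs[0].text)
```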
https://qwenlm.github.io/blog/qwen1.5/ — this blog post describes what we did for Qwen1.5. No idea why length has such an impact; perhaps the previous masking strategies matter, or there are other factors...
Check Qwen1 and you can find some results; for Qwen1.5 we didn't provide this. We internally evaluate the quantized models on benchmark datasets such as MMLU, C-Eval, GSM8K, and HumanEval. Quantized models...
This is a recently known issue, possibly caused by vLLM. We are now reporting it to the official GitHub repo. For the time being we advise you to use the AWQ model and...
No idea; we don't have experience with AMD. But since our code has been merged into HF, you can check AMD support for transformers instead.
No idea, and by the way, bf16 is not supported on V100.
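A minimal sketch of picking the dtype at runtime so the same script works on V100 (no bf16) and on newer GPUs; the model id is only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

# Use bf16 only when the GPU supports it (Ampere and later); V100 does not,
# so fall back to fp16 there.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-7B-Chat",  # illustrative model id
    torch_dtype=dtype,
    device_map="auto",
)
```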
Thank you very much for your contribution (and sorry for my late response)! I'll check it as soon as possible.
Yes, for MUGE, we only finetune with the official training data.