Biały Wilk

Results: 21 comments of Biały Wilk

> > @Coder-nlper I can see it here ![1688393066209](https://user-images.githubusercontent.com/19722714/250579446-63296d90-27e6-46f2-af2b-45c732f17a58.png)
>
> May I ask: which layers in the image are the quantized layers? Do the names give any hint?

Every layer that can be printed out is quantized to nf4.
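As a quick way to see which layers end up quantized, here is a minimal sketch; the checkpoint name and the bitsandbytes/transformers setup are assumptions for illustration, not necessarily what the thread used. It loads a model with an nf4 quantization config and prints every module that was replaced by a 4-bit linear layer.

```python
# Minimal sketch (assumed setup): load a model with nf4 4-bit quantization through
# bitsandbytes and list the layers that actually carry nf4 weights.
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/chatglm3-6b",                  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Modules replaced by Linear4bit hold nf4 weights; anything left as a plain
# nn.Linear (often the output head) is not quantized.
for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear4bit):
        print(name)
```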

commands to build:

```bash
hf_model_path=/root/chatglm3-6b/
engine_dir=/root/trtllm/trtllmmodels/fp16/
CUDA_ID="1"
CUDA_VISIBLE_DEVICES=$CUDA_ID python3 build.py \
    --model_dir $hf_model_path \
    --log_level "info" \
    --output_dir $engine_dir/1-gpu \
    --world_size 1 \
    --tp_size 1 \
    --max_batch_size 50 \
    --max_input_len 2048...
```

The driver version is 470.141.10. Is there any relationship with that?

> > The driver version is 470.141.10. Is there any relationship with that?
>
> I have to imagine that likely isn't ideal.
>
> While it's supported per the [Nvidia Frameworks...

I am also facing the same issue. The reason may be as follows: https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/llm_ptq/README.md#model-support-list

> To be more precise, all weight-only kernels use `fp32` accumulation, regardless of the arch version.
> We have done extensive experiments on `L20` and `L40s`, where `fp8xint4` is better...
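To make the quoted point concrete, here is a rough PyTorch sketch of what a weight-only int4 GEMM with fp32 accumulation does conceptually. It is not the TensorRT-LLM kernel; the group size and scale layout are assumptions for illustration only.

```python
# Conceptual sketch (not the TensorRT-LLM kernel): weight-only int4 matmul where the
# weights are dequantized on the fly and the dot products are accumulated in fp32.
import torch

def weight_only_int4_matmul(x, w_int4, scales, group_size=128):
    """x: [m, k] fp16/bf16 activations; w_int4: [k, n] signed int4 codes stored in int8;
    scales: [k // group_size, n] per-group scales (assumed layout)."""
    # Dequantize on the fly: expand the per-group scales along k and scale the int4 codes.
    scales_full = scales.repeat_interleave(group_size, dim=0)          # [k, n]
    w_dequant = w_int4.to(torch.float32) * scales_full.to(torch.float32)
    # Accumulate the matmul in fp32 regardless of the activation dtype, then cast back.
    y = x.to(torch.float32) @ w_dequant
    return y.to(x.dtype)

# Usage: random int4 codes in [-8, 7] with per-group scales.
m, k, n, g = 4, 256, 128, 128
x = torch.randn(m, k, dtype=torch.float16)
w = torch.randint(-8, 8, (k, n), dtype=torch.int8)
s = torch.rand(k // g, n, dtype=torch.float16) * 0.01
print(weight_only_int4_matmul(x, w, s, group_size=g).shape)
```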

> @x-transformers did you test error rate through pytorch or trtllm? I suggest you have a first try on accuracy verification on pytorch, if it turns fine then it's kernel's...
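Following that advice, a quick PyTorch-side check could look like the sketch below: compare the last-token logits of the quantized checkpoint against the fp16 reference on a few prompts before involving TensorRT-LLM at all. The checkpoint names, the quantized-model path, and the prompts are illustrative assumptions.

```python
# Sketch of a PyTorch-first accuracy check (checkpoint names/paths are illustrative):
# compare logits of the quantized model against the fp16 reference before blaming
# the TensorRT-LLM kernel.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def last_token_logits(model, tok, prompt):
    ids = tok(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        return model(**ids).logits[:, -1, :].float().cpu()

ref_id = "Qwen/Qwen2.5-3B"                       # illustrative reference checkpoint
tok = AutoTokenizer.from_pretrained(ref_id)
ref = AutoModelForCausalLM.from_pretrained(ref_id, torch_dtype=torch.float16, device_map="auto")
quant = AutoModelForCausalLM.from_pretrained(
    "/path/to/quantized-checkpoint",             # hypothetical quantized checkpoint
    device_map="auto",
)

for prompt in ["The capital of France is", "1 + 1 ="]:
    r, q = last_token_logits(ref, tok, prompt), last_token_logits(quant, tok, prompt)
    rel_err = (r - q).abs().max() / r.abs().max()
    print(f"{prompt!r}: max relative logit error = {rel_err.item():.4f}")
```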

I ran into the same problem fine-tuning locally. When fine-tuning Qwen1.5-4B, the loss and gradients decrease normally; with Qwen2.5-3B, the loss does not go down.

For the base model, I used a format like text and found it could not converge. After changing it to text, training worked normally.

> If you use LoRA on the base models, you need to finetune the embeddings and the lm_head if you use the ChatML template.

I am using the ChatML template. Qwen2.5-3B uses shared (tied) embeddings. With LoRA's modules_to_save=["embed_tokens"] it still does not converge. With modules_to_save=["embed_tokens", "lm_head"] it does converge, but the 3B model has no lm_head layer. What could be causing this? Qwen1.5-4B does not have this problem...
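For reference, a minimal PEFT sketch of the setup being discussed (hyperparameters and target modules are illustrative assumptions): LoRA on a Qwen base model with modules_to_save used to fully train the embedding. Since Qwen2.5-3B ties lm_head to embed_tokens, the checkpoint has no separate lm_head weight to save.

```python
# Minimal sketch (assumed hyperparameters / target modules): LoRA on a base Qwen model
# with the embedding fully trained via modules_to_save, as discussed above.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B", torch_dtype=torch.bfloat16, device_map="auto"
)
# Qwen2.5-3B uses tied embeddings: lm_head shares its weight with embed_tokens,
# so the checkpoint carries no separate lm_head tensor.
print(model.config.tie_word_embeddings)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # Fully train and save the embedding; for models with an untied lm_head,
    # add "lm_head" here as well.
    modules_to_save=["embed_tokens"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```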