cailinhang
> What does the coefficient 1000.0 represent? Is it the number of classes? I guess that softmax(-1000) ≈ 0, so -1000 is just used to make the probability of...
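The masking intuition in the question can be checked numerically. A minimal sketch in plain Python (the values are illustrative, not taken from the repo):

```python
import math

def softmax(xs):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# A logit of -1000 underflows to probability 0 after softmax,
# effectively masking that position out of the distribution.
probs = softmax([2.0, 1.0, -1000.0])
```

So the constant is not a class count; it is just a large negative additive mask.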
I hit the same inference error after fine-tuning Qwen-7B. The error message is ``` LLM says: Eval Error ``` and when I add `--debug` to the command, ```...
> The `cnets.py` file [here](https://github.com/SafeAILab/EAGLE/tree/main/eagle/model) caters to `eagle3` model training and `cnets1.py` to `eagle1/2` model training. We can adjust our scripts accordingly. If I want to train an eagle2 model for...
I ran into a similar out-of-memory situation. Watching `free -g`, available memory drops rapidly to 0 while the checkpoint is being loaded; all 8 GPUs loading the ckpt at the same time can blow up host memory. One workaround: in `initialize_model_and_tokenizer()` in `initialize.py`, when loading the checkpoint for each gpu_i, add `time.sleep(i // 4 * 120)` so that GPUs 0-3 load their ckpt first and GPUs 4-7 load theirs 120 s later; with that stagger, loading should succeed. ```python for i...
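The staggering trick above can be sketched as a small helper. This is a hedged illustration, not the repo's actual code; `load_fn`, `staggered_delay`, and the group size of 4 are assumptions matching the comment's `i // 4 * 120` formula:

```python
import time

def staggered_delay(rank: int, group_size: int = 4, gap_s: int = 120) -> int:
    # Ranks 0..group_size-1 load immediately; each later group of ranks
    # waits one extra gap, so at most `group_size` ranks deserialize at once.
    return (rank // group_size) * gap_s

def load_checkpoint_staggered(load_fn, rank: int,
                              group_size: int = 4, gap_s: int = 120):
    # Sleep before loading so peak host memory is roughly
    # group_size * checkpoint_size instead of world_size * checkpoint_size.
    time.sleep(staggered_delay(rank, group_size, gap_s))
    return load_fn()  # e.g. a closure over torch.load(...)
```

With 8 ranks this reproduces the comment's schedule: ranks 0-3 get delay 0 and ranks 4-7 get delay 120 s.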
For Qwen2-7B-Instruct, I found that the inconsistency is due to `repetition_penalty=1.05` in https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/generation_config.json. Passing `repetition_penalty=1.0` solved my problem. ```python with torch.no_grad(): base_output_ids = base_model.generate( input_ids, attention_mask=attention_mask, temperature=1e-7,...
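The fix above amounts to overriding one field of the model's generation config before comparing outputs. A minimal sketch (the helper name is hypothetical; only `repetition_penalty=1.0` comes from the comment):

```python
def override_repetition_penalty(gen_kwargs: dict) -> dict:
    # Return a copy of the generate() kwargs with repetition_penalty forced
    # to 1.0 (i.e. disabled), overriding the 1.05 default that Qwen2's
    # generation_config.json would otherwise inject.
    out = dict(gen_kwargs)
    out["repetition_penalty"] = 1.0
    return out

kwargs = override_repetition_penalty({"temperature": 1e-7, "max_new_tokens": 128})
```

Explicitly passing the kwarg to `generate()` wins over the checkpoint's `generation_config.json`, which is why this restores consistency with the speculative path.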