cailinhang
> What does the coefficient 1000.0 represent? Is it the number of classes? I guess that softmax(-1000) ≈ 0, so -1000 is just used to make the probability of...
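The masking intuition in the question can be checked numerically. A minimal sketch in plain Python (the values are illustrative, not taken from the repo):

```python
import math

def softmax(xs):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# A logit of -1000 underflows to probability 0 after softmax,
# effectively masking that position out of the distribution.
probs = softmax([2.0, 1.0, -1000.0])
```

So the constant is not a class count; it is just a large negative additive mask.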
I hit the same inference error after fine-tuning Qwen-7B. The error message is ``` LLM says: Eval Error ``` and when I add `--debug` to the command, ```...
> The `cnets.py` file [here](https://github.com/SafeAILab/EAGLE/tree/main/eagle/model) caters to `eagle3` model training and `cnets1.py` to `eagle1/2` model training. We can adjust our scripts accordingly. If I want to train an eagle2 model for...
I ran into a similar out-of-memory situation. Watching `free -g`, available memory drops rapidly to 0 while the checkpoint is being loaded; all 8 GPUs loading the ckpt at the same time can blow up host memory. One workaround: in `initialize_model_and_tokenizer()` in `initialize.py`, when loading the checkpoint for each gpu_i, add `time.sleep(i // 4 * 120)` so that GPUs 0-3 load their ckpt first and GPUs 4-7 load theirs 120 s later; with that stagger, loading should succeed. ```python for i...
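The staggering trick above can be sketched as a small helper. This is a hedged illustration, not the repo's actual code; `load_fn`, `staggered_delay`, and the group size of 4 are assumptions matching the comment's `i // 4 * 120` formula:

```python
import time

def staggered_delay(rank: int, group_size: int = 4, gap_s: int = 120) -> int:
    # Ranks 0..group_size-1 load immediately; each later group of ranks
    # waits one extra gap, so at most `group_size` ranks deserialize at once.
    return (rank // group_size) * gap_s

def load_checkpoint_staggered(load_fn, rank: int,
                              group_size: int = 4, gap_s: int = 120):
    # Sleep before loading so peak host memory is roughly
    # group_size * checkpoint_size instead of world_size * checkpoint_size.
    time.sleep(staggered_delay(rank, group_size, gap_s))
    return load_fn()  # e.g. a closure over torch.load(...)
```

With 8 ranks this reproduces the comment's schedule: ranks 0-3 get delay 0 and ranks 4-7 get delay 120 s.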
For Qwen2-7B-Instruct, I found that the inconsistency is due to `repetition_penalty=1.05` in https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/generation_config.json. Passing `repetition_penalty=1.0` solved my problem. ```python with torch.no_grad(): base_output_ids = base_model.generate( input_ids, attention_mask=attention_mask, temperature=1e-7,...
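The fix above amounts to overriding one field of the model's generation config before comparing outputs. A minimal sketch (the helper name is hypothetical; only `repetition_penalty=1.0` comes from the comment):

```python
def override_repetition_penalty(gen_kwargs: dict) -> dict:
    # Return a copy of the generate() kwargs with repetition_penalty forced
    # to 1.0 (i.e. disabled), overriding the 1.05 default that Qwen2's
    # generation_config.json would otherwise inject.
    out = dict(gen_kwargs)
    out["repetition_penalty"] = 1.0
    return out

kwargs = override_repetition_penalty({"temperature": 1e-7, "max_new_tokens": 128})
```

Explicitly passing the kwarg to `generate()` wins over the checkpoint's `generation_config.json`, which is why this restores consistency with the speculative path.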