huangjf11
I'd like to ask: on the GSM8K dataset, with otherwise identical settings, changing only the precision (bf16 vs. fp16), learning rate, number of epochs, etc., all training runs converge, yet three distinct outcomes appear. (This shows up consistently across many experiments; the same template is used for both training and inference.)
ValueError: Transformers now supports natively BetterTransformer optimizations (torch.nn.functional.scaled_dot_product_attention) for the model type llama. As such, there is no need to use `model.to_bettertransformers()` or `BetterTransformer.transform(model)` from the Optimum library. Please upgrade...
The official llama3-8B model on Hugging Face lacks the **tokenizer.model** file. Can you help me solve this issue?
Hello, I would like to ask about the metadata.jsonl in the clip-filtered-dataset. Could you please explain the meanings of the attributes contained in each data sample and some formulas for...
Has anyone encountered this problem before?  
### Reminder
- [x] I have read the above rules and searched the existing issues.

### System Info
**I'm using the same launch command,** but SFT training works fine while...
When evaluating a trained model, shouldn't the outputs be produced directly from the inputs? Why are the preds still affected by the trues?

```python
output = model(**batch)
labels = batch["labels"].detach().cpu().numpy()
logits = output.logits
preds = torch.argmax(logits, -1).detach().cpu().numpy()
preds = preds[:, :-1]
labels = labels[:, 1:]
preds = np.where(labels != -100, preds, tokenizer.pad_token_id)  # <-- the line in question
decoded_preds...
```
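A minimal NumPy sketch of what that evaluation code is doing, with toy token ids and a hypothetical pad id of 0 (both are made up for illustration): in a causal LM, the logits at position t predict the token at position t+1, hence the shift; the `np.where` then only *masks out* positions the collator marked as ignored (-100, e.g. prompt/padding), it never copies the labels into the predictions.

```python
import numpy as np

pad_token_id = 0  # hypothetical pad id for this toy example

# Suppose argmax over the logits produced these token ids (batch of 1, length 5).
preds = np.array([[11, 12, 13, 14, 15]])
# Labels as produced by the data collator; -100 marks ignored positions.
labels = np.array([[-100, 12, 13, -100, -100]])

# Align predictions with labels: logits at position t predict token t+1.
preds = preds[:, :-1]   # drop the last pred: it has no following label
labels = labels[:, 1:]  # drop the first label: no logit predicts position 0

# Replace preds at ignored positions with the pad id so padding does not
# contaminate the decoded text; valid positions keep the model's own tokens.
preds = np.where(labels != -100, preds, pad_token_id)
print(preds)  # model predictions survive wherever the label is valid
```

So the labels only determine *where* a prediction counts; the surviving values are still the model's own argmax outputs, and the metric compares those against the ground truth.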