Benjamin Bossan
Thanks a lot for the PR. I haven't checked the details yet, but regarding your testing question, the missing piece was that you need to set the seed for each...
@efraimdahl are you still working on this?
@efraimdahl No hurry, I was just checking, as sometimes people just forget about their PRs. No need to close this one. As to separating PRs, yes, it's always a good...
Thanks for the PR. For others, the context is issue #1982. Honestly, I'm unsure if adding the scale here is correct or not. @fxmeng it would be great if you...
It is very hard to give general recommendations based on what you report. I think your main intuition to increase the learning rate when the batch size is increased is...
I can give you a couple of hints but at the end of the day, the only way I could really help is if you gave me access to the...
> 2\. The base model is Llama3 8b

In this case, you could try to also target the gate projections of the MLP part. Just pass `target_modules="all-linear"` and you should...
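
For reference, a minimal sketch of what passing `target_modules="all-linear"` could look like in a PEFT `LoraConfig`. The rank, alpha, and task type below are illustrative assumptions and do not come from the original discussion; only the `target_modules` value does.

```python
from peft import LoraConfig

# Illustrative values only: r, lora_alpha, and task_type are assumptions,
# not settings recommended in the comment above.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    # "all-linear" asks PEFT to adapt every linear layer except the output
    # layer, which includes the MLP gate projections on Llama-style models.
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```

The config would then be passed to `get_peft_model(base_model, config)` as usual. Targeting more layers increases the number of trainable parameters, so memory use and training time go up accordingly.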