Philip May
Ahh I see. Very nice to have a 2nd maintainer. :-) Many thanks.
@kawine reading your plot -> is it possible that you are training on multiple different machines with one GPU each? It reads "copper-paper" and "noble-pyramid". I think those are names coming from...
Shouldn't it be very easy to test this on Phi-2 or TinyLlama once the implementation works?
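A minimal sketch of what such a smoke test could look like, assuming the feature plugs into a standard `transformers` workflow; the model ids here are just examples of small checkpoints, not something mandated by this thread:

```python
# Rough sketch: load a small model for a cheap end-to-end check of the new
# implementation before trying a full-size LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or e.g. "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Both models fit on a single consumer GPU, so the new code path can be
# exercised quickly without a multi-GPU setup.
```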
This PR should maybe also add a few lines to the README about "how to use this".
Hi @lewtun, we had a discussion about KTO. Are you already working on this, or should we come up with a PR? We would try to use the code...
That feature would be super useful, @claralp. Thanks.
As far as I know, there was a ruling in the US that AI-generated content cannot be licensed. In this context, it is questionable from my point of view...
this is implemented via #177 - closing this
> Use synchronized batch normalization

Using sync batch norm does not help with single-GPU training and low batch sizes, though.
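To illustrate the point, here is a minimal sketch (not from this thread) of how SyncBatchNorm is usually enabled in PyTorch; since it only synchronizes batch statistics across processes, it behaves like plain BatchNorm in a single-GPU, single-process run and does nothing for small per-device batches there:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Replaces every BatchNorm layer with SyncBatchNorm. This only pays off when
# the model is later wrapped in DistributedDataParallel across multiple GPUs,
# so the normalization statistics are computed over the global batch.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```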
I am not from Facebook or Meta, but AFAIK: when you finetune model A into model B and then release model B, the license of model B must match the...