Philip May
Ahh I see. Very nice to have a 2nd maintainer. :-) Many thanks.
@kawine reading your plot -> is it possible that you are training on multiple different machines with one GPU each? It reads "copper-paper" and "noble-pyramid". I think those are names coming from...
Shouldn't it be very easy to test this on Phi-2 or TinyLlama once the implementation works?
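A minimal sketch of what such a smoke test could look like, assuming the feature plugs into a standard `transformers` workflow; the model ids here are just examples of small checkpoints, not something mandated by this thread:

```python
# Rough sketch: load a small model for a cheap end-to-end check of the new
# implementation before trying a full-size LLM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # or e.g. "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Both models fit on a single consumer GPU, so the new code path can be
# exercised quickly without a multi-GPU setup.
```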
This PR should maybe also add a few lines to the README about "how to use this".
Hi @lewtun, we had a discussion about KTO. Are you already working on this, or should we come up with a PR? We would try to use the code...
That feature would be super useful, @claralp. Thanks.
As far as I know, there was a ruling in the US that AI-generated content cannot be licensed. In this context, it is questionable from my point of view...
this is implemented via #177 - closing this
> Use synchronized batch normalization

Using sync batch norm does not help with single-GPU training and low batch sizes, though.
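To illustrate the point, here is a minimal sketch (not from this thread) of how SyncBatchNorm is usually enabled in PyTorch; since it only synchronizes batch statistics across processes, it behaves like plain BatchNorm in a single-GPU, single-process run and does nothing for small per-device batches there:

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Replaces every BatchNorm layer with SyncBatchNorm. This only pays off when
# the model is later wrapped in DistributedDataParallel across multiple GPUs,
# so the normalization statistics are computed over the global batch.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```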
I am not from Facebook or Meta, but AFAIK: when you finetune model A into model B and then release model B, the license of model B must match the...