Sebastian Raschka

628 comments by Sebastian Raschka

> Finetuning my data takes so long (more than 24 hours). In this case, how can I shorten the time? If you are not doing it already, you could try...

Thanks! I think `LongLoraArgs` might be better, especially if it can be used with multiple approaches, e.g., `full` and `lora`.

Nice, this is a good sign that things work!

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more? In the paper the authors have specified that to increase the context length while...

Could you share the commands you ran? That might make it a bit easier to discuss. But in general, I think you could do the following without moving: Finetune model: ```bash...

Sorry for the long silence, and thanks again for this great PR! I have just been a bit swamped with work lately but hopefully can circle back to it some...

Thanks for suggesting this and offering to contribute! In short, instead of selecting a hard number of samples as in top-k, it selects the number of samples such that...
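
The preview above is cut off, so the exact selection rule is not visible here. Assuming it refers to what is commonly called top-p (nucleus) sampling, a minimal sketch could look like the following. The function names `top_p_filter` and `sample_top_p` and the toy probabilities are hypothetical and only for illustration; this is not the implementation discussed in the thread.

```python
import random


def top_p_filter(probs, top_p):
    """Return indices of the highest-probability tokens, taken in descending
    order, until their cumulative probability reaches top_p.

    Assumption: this is the standard nucleus-sampling rule; the original
    comment is truncated and may describe a variant.
    """
    # Sort token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the kept set now covers at least top_p probability mass
    return kept


def sample_top_p(probs, top_p, rng=random):
    """Sample one token index from the renormalized top-p subset."""
    kept = top_p_filter(probs, top_p)
    weights = [probs[i] for i in kept]  # random.choices normalizes weights
    return rng.choices(kept, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token distribution
    print(top_p_filter(toy_probs, 0.7))
    print(sample_top_p(toy_probs, 0.7))
```

Unlike top-k, the number of kept tokens here varies with the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.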

Thanks for raising that. Maybe it's a HF thing. I will have to investigate.

I could not reproduce it for another model yet when I gave it a quick try. I am not sure if it's related because the differences are so big, but...

Thanks for reporting. This would be because Phi-3 has not been added yet via #1341.