Sebastian Raschka comments

Results 628 comments of Sebastian Raschka

[question] how to finetune efficiently

> Finetuning my data takes so long (more than 24 hours). In this case, how can I shorten the time? If you are not doing it already, you could try...

Add LongLora for both full and lora fine-tuning

Thanks! I think `LongLoraArgs` might be better, especially if it can be used in multiple approaches, e.g., `full` and `lora`.

Add LongLora for both full and lora fine-tuning

Nice, this is a good sign that things work!

Add LongLora for both full and lora fine-tuning

> What are the other options? Are "wte,norm,ln" the only allowed ones or are there more? In the paper the authors have specified that to increase the context length while...

[question] how to finetune efficiently

Could you share the commands you ran? That might make it a bit easier to discuss. But in general, I think you could do the following without moving: Finetune model: ```bash...

Add LongLora for both full and lora fine-tuning

Sorry for the long silence, and thanks again for this great PR! I have just been a bit swamped with work lately but hopefully can circle back to it some...

Nucleus (top-p) sampling

Thanks for suggesting this and offering to contribute. In short, instead of selecting a fixed number of samples as in top-k, it selects the number of samples such that...
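
For readers unfamiliar with the idea, here is a minimal sketch of nucleus (top-p) sampling as it is commonly described: keep the smallest set of most-probable tokens whose cumulative probability reaches `top_p`, renormalize, and sample from that set. This is an illustration in plain Python, not the implementation from this repository; the function names `top_p_filter` and `sample_top_p` are hypothetical, and the cutoff rule (stop once the cumulative probability first reaches `top_p`) is an assumption about the standard formulation.

```python
import math
import random


def top_p_filter(logits, top_p):
    """Return {token_index: renormalized probability} for the nucleus.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # Softmax with max-subtraction for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable and keep them until the
    # cumulative probability first reaches top_p. Unlike top-k, the number
    # of kept tokens depends on the shape of the distribution.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_top_p(logits, top_p, rng=random):
    """Sample one token index from the nucleus (hypothetical helper)."""
    nucleus = top_p_filter(logits, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With a peaked distribution the nucleus may contain a single token, while a flat distribution keeps many; that adaptivity is the main difference from a fixed top-k cutoff.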

Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)`

Thanks for raising that. Maybe it's a HF thing. I will have to investigate.

Failed to load the finetuned model with `AutoModelForCausalLM.from_pretrained(name, state_dict=state_dict)`

I could not reproduce it with another model when I gave it a quick try. I am not sure if it's related because the differences are so big, but...

'Phi-3-mini-4k-instruct' is not a supported config name

Thanks for reporting. This would be because Phi-3 has not been added yet (see #1341).