Sebastian Raschka comments

Results: 821 comments of Sebastian Raschka

Finetuning run times out at evaluation step on multiple devices

Thanks for reporting, and huh, that's a weird one, I haven't seen this before. As a sanity check I wonder what happens if you use the generate function to emulate...

Finetuning run times out at evaluation step on multiple devices

Hm, I am not sure why it's slowing down so much in multi-GPU settings. It's speculation, but maybe if the GPUs have a slow connection, then the communication overhead is...

Finetuning run times out at evaluation step on multiple devices

Hi there, if I remember correctly, I was overruled so there's no `--skip_validation` 😅. We could potentially still add it, but at the same time, I'm also curious why this...

Finetuning run times out at evaluation step on multiple devices

Oh hm, my bad. I thought it could have worked. In that case, maybe just skip the validation for now, and we need to revisit and investigate this.

Finetuning run times out at evaluation step on multiple devices

Do you mean calculating the validation and test set losses on a given dataset after finetuning the model? Unfortunately, we don't have a specific functionality implemented for that. But perhaps...

Finetuning run times out at evaluation step on multiple devices

This might also be interesting: #1383

Correct an apparent logger output directory bug

Thanks for the contribution. I don't have a strong opinion on the hardcoding of `name` because in this case it would be the same due to the `if/else name ==`...

Smart choice of the inference algorithm

A smart, automatic choice would be nice but maybe this should be a feature flag. Maybe something like `--optimize smart (default) / memory / flops` where - `"memory"`...

Drop interleave placement in QKV matrix

This might actually fix my OLMo PR in #927 😅

Add TinyStories to the pretraining docs

Yes exactly. I just wrote in the other issue: > Btw I think having something like TinyStories is super valuable for trying things out. The other datasets (1.2T!) are much...