Sebastian Raschka

821 comments by Sebastian Raschka

Thanks for reporting. Huh, that's a weird one; I haven't seen this before. As a sanity check, I wonder what happens if you use the generate function to emulate...

Hm, I am not sure why it's slowing down so much in multi-GPU settings. It's speculation, but if the GPUs have a slow interconnect, then maybe the communication overhead is...

Hi there, if I remember correctly, I was overruled, so there's no `--skip_validation` 😅. We could potentially still add it, but at the same time, I'm also curious why this...

Oh hm, my bad. I thought it would work. In that case, maybe just skip the validation for now; we need to revisit and investigate this.

Do you mean calculating the validation and test set losses on a given dataset after finetuning the model? Unfortunately, we don't have a specific functionality implemented for that. But perhaps...

Thanks for the contribution. I don't have a strong opinion on the hardcoding of `name` because in this case it would be the same due to the `if/else name ==`...

A smart, automatic choice would be nice, but maybe this should be a feature flag. Maybe something like

```
--optimize smart (default) / memory / flops
```

where

- `"memory"`...

This might actually fix my OLMo PR in #927 😅

Yes, exactly. I just wrote in the other issue:

> Btw I think having something like TinyStories is super valuable for trying things out. The other datasets (1.2T!) are much...