Sebastian Raschka comments

Results: 820 comments of Sebastian Raschka

Gemma 2: `9b` and `27b` versions

Works great!

Gemma 2: `9b` and `27b` versions

Based on the config file run, the train and val loss look great. The MMLU score is surprisingly low, though. That said, there's nothing wrong with the finetuned model, and it works...

Gemma 2: `9b` and `27b` versions

Awesome, this is great! Thanks for this amazing PR!

Chapter 14, MNIST test set plot

Thanks for the note. I just tried it, and both work for me. But yes, you could use `.data` instead. I.e., instead of `mnist_test_dataset[i][0][0, :, :]` you could use...

Chapter 14, MNIST test set plot

I added this as an alternative code line to Ch 14 in case others have the same issue.

processing the dataset.

Good point. Does the LitData section here help? https://github.com/Lightning-AI/litdata?tab=readme-ov-file#1-prepare-your-data

processing the dataset.

Personally, I use the `TextFiles` approach that I've implemented in LitGPT. But going back to an earlier comment you had (and the phrase in the docs), my colleagues don't recommend...

Fix bug in masking when kv cache is used.

Thanks for updating the masking. I just added some tests to make the equivalent easier... it looks like the updated masking now creates a mismatch between the base model and...

Fix bug in masking when kv cache is used.

I may have to rethink this when my brain is a bit fresher tomorrow morning, but I think the original code is correct because we don't recompute the older tokens,...

Fix bug in masking when kv cache is used.

Thanks for the suggestion. I think it's a good idea here to make the code more explicit.
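
To make the masking discussion above more concrete, here is a minimal sketch of how a causal mask can be built when a KV cache is used. It is not the code from the PR or from LitGPT. The function name `causal_mask_with_cache` and its parameters are hypothetical and chosen only for illustration. The assumption is that cached tokens are not recomputed, so only the new query tokens get a mask row, while the columns cover all cached and new key positions.

```python
def causal_mask_with_cache(num_cached, num_new):
    """Build a boolean attention mask (True = may attend).

    Hypothetical helper for illustration only.
    Rows: the `num_new` query tokens being processed in this step.
    Columns: all `num_cached + num_new` key positions (cache + new tokens).
    """
    total = num_cached + num_new
    mask = []
    for q in range(num_new):
        # Absolute position of this query token in the full sequence
        q_pos = num_cached + q
        # A query may attend to every key at or before its own position
        mask.append([k_pos <= q_pos for k_pos in range(total)])
    return mask


# Prefill: empty cache, 3 prompt tokens -> ordinary lower-triangular mask
prefill = causal_mask_with_cache(num_cached=0, num_new=3)

# Decoding: 3 cached tokens, 1 new token -> a single row that sees all keys
decode = causal_mask_with_cache(num_cached=3, num_new=1)

# Chunked step: 2 cached tokens, 2 new tokens -> rows are offset by the cache
chunk = causal_mask_with_cache(num_cached=2, num_new=2)
```

In this sketch, the mask never contains rows for the cached tokens, since their keys and values are reused rather than recomputed. Whether a given implementation slices a precomputed full mask or builds the rows on the fly is a design choice. The result should be the same as long as the row offset equals the number of cached tokens.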