Sebastian Raschka
Thanks! In this case, I'd say it's a feature, not a bug!
Hi there, unfortunately there is no one-size-fits-all solution. Often, the biggest improvement comes from improving the dataset (collecting more samples, cleaning the data, etc.). Then, algorithm selection and...
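To make the data-centric point a bit more concrete, here is a minimal sketch of the kind of checks I'd start with before touching the algorithm (the file and column names below are placeholders, not from any specific project):

```python
import pandas as pd

# Hypothetical dataset; "train.csv" and "label" are placeholder names.
df = pd.read_csv("train.csv")

# 1) Remove exact duplicates, which can inflate validation scores.
df = df.drop_duplicates()

# 2) Drop rows with missing labels (features can be imputed separately).
df = df.dropna(subset=["label"])

# 3) Check class balance before worrying about model choice.
print(df["label"].value_counts(normalize=True))
```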
Could you provide concrete code snippets and file paths (and studio names) to illustrate this, @sanyalsunny111? That would give @tchaton a concrete example to follow.
Unfortunately, I currently don't have the capacity to work on this, but if someone wants to work on it, PRs are very welcome!
Thanks for the note, and good point; I didn't know about this. One challenge I see with configuring it in the config file is that it's used at model-creation time...
Upon reading a bit more, this would only be required for training (due to the optimizer choice). I added it in #1770.
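To illustrate the general pattern (with hypothetical names; this is not litgpt's actual config schema or the setting from #1770), a training-only value like this can live next to the optimizer setup instead of in the model-creation config:

```python
from dataclasses import dataclass

import torch

# Hypothetical split between model and training settings, for
# illustration only.

@dataclass
class ModelConfig:
    n_layer: int = 12
    n_embd: int = 768          # needed when the model is created

@dataclass
class TrainConfig:
    learning_rate: float = 3e-4
    weight_decay: float = 0.1  # only the optimizer ever reads this

def configure_optimizer(model: torch.nn.Module, cfg: TrainConfig):
    # The training-only setting never touches model creation, so it
    # belongs in the training section of the config.
    return torch.optim.AdamW(
        model.parameters(), lr=cfg.learning_rate, weight_decay=cfg.weight_decay
    )
```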
Nice summary. I think this touches all the main points. The others (knowledge distillation for the small models, tied embeddings) would not affect the architecture; it's more of a pretraining...
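For illustration, here is a minimal sketch of what tied embeddings look like in code (a hypothetical module, not any specific model's implementation): the output head simply reuses the input embedding matrix, so the layer structure itself is unchanged.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int = 32000, n_embd: int = 768):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        # Weight tying: the output projection shares the input
        # embedding matrix, so no new parameters are introduced.
        self.lm_head.weight = self.tok_emb.weight
```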
> @Andrei-Aksionov [Sliding window attention (an ugly one, but hey, it works)](https://github.com/Lightning-AI/litgpt/pull/1545/commits/889049df4885cfdfd892ea8f54fa22d3456e5a44)

Cool! We can also add that to the existing Mistral/Mixtral models then 😊
> I believe only Mistral v0.1 supported sliding window attention, all the subsequent models by Mistral.ai don't use it.

I think you are right.

> But after this PR is...
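For context, here is a minimal sketch of the masking idea behind sliding window attention (my own illustration, not the implementation from that PR): each query attends only to itself and the most recent `window - 1` keys instead of the full causal prefix.

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask: True where query position i may attend to key j."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (column)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (row)
    # Causal (j <= i) and within the local window (j > i - window).
    return (j <= i) & (j > i - window)

# Each token sees at most `window` positions, ending at itself.
print(sliding_window_causal_mask(6, 3).int())
```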
> One more thing. Due to time constraints, I didn't test Gemma v2 27b version. Tests are running fine, but it would be nice to check the generated output. @rasbt...