Sebastian Raschka comments

Results: 821 comments of Sebastian Raschka

domain specific fine-tuning

> I wanted to fine-tune the full parameters of the gemma model. I noticed there is an example, https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/full.py , can I use this example for domain-specific fine-tuning?

Yes, this model...

domain specific fine-tuning

Sorry for the late response, but it's been a busy week. Regarding the domain-specific finetuning, it would be similar to continued pretraining without instructions in your case, correct? Regarding the...

Full finetuning

Well, maybe ignore the last commit. Lazy loading now works when you use 1 device, but it fails when using multiple devices and DeepSpeed. The previous commit without lazy...

Gradients in GPT module of the finetuning/lora.py script are always zero

Thanks for bringing that up! If I understand correctly, `reset_parameters()` will not set the weights to 0 but will reinitialize them. So I think this should be okay but...

Update ch04.ipynb

Thanks for the PR, I appreciate it! However, I think in this notebook importing torch is not necessary as we are working with `import torch.nn as nn`. Did you bump...

Update ch04.ipynb

Ah yes good call. I see that I imported it here:

Update ch04.ipynb

Thanks again for moving the `import torch` line up. I just removed the 2nd import to reduce redundancy.

Question about implementation of CausalAttention class (3.5.3 Implementing a compact causal self-attention class)

Thanks for bringing this up! Regarding removing the `:num_tokens` slicing from

```python
self.mask.bool()[:num_tokens, :num_tokens]
```

That's unfortunately not possible, as @ahmedDaoudi-u mentioned. E.g., in Ch05, we are using an LLM...
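To illustrate why the slicing matters, here is a minimal sketch in plain Python. It assumes the mask is built once for the full context length while the actual input can be shorter. Nested lists stand in for tensors so the example runs without PyTorch, and the helper names (`build_causal_mask`, `slice_mask`, `apply_mask`) are hypothetical and not taken from the book's code.

```python
def build_causal_mask(context_length):
    # True marks future positions (column > row) that a token must not attend to.
    return [[col > row for col in range(context_length)]
            for row in range(context_length)]


def slice_mask(mask, num_tokens):
    # Plain-Python stand-in for mask[:num_tokens, :num_tokens].
    return [row[:num_tokens] for row in mask[:num_tokens]]


def apply_mask(scores, mask):
    # The mask must have exactly the same shape as the attention scores.
    if len(scores) != len(mask) or any(
        len(s_row) != len(m_row) for s_row, m_row in zip(scores, mask)
    ):
        raise ValueError("mask shape does not match attention score shape")
    return [
        [float("-inf") if blocked else score
         for score, blocked in zip(s_row, m_row)]
        for s_row, m_row in zip(scores, mask)
    ]


context_length = 6   # size the mask was created with
num_tokens = 3       # length of the current (shorter) input

full_mask = build_causal_mask(context_length)

# Dummy attention scores for a 3-token input (3 x 3).
scores = [[0.1 * (r + c) for c in range(num_tokens)]
          for r in range(num_tokens)]

# Works: the mask is cut down to the current sequence length first.
masked = apply_mask(scores, slice_mask(full_mask, num_tokens))

# Fails: the full 6 x 6 mask does not fit the 3 x 3 scores.
try:
    apply_mask(scores, full_mask)
    unsliced_ok = True
except ValueError:
    unsliced_ok = False
```

In this sketch, the unsliced mask only fits when the input happens to be exactly `context_length` tokens long, which is the situation the slicing guards against.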

Inconsistencies between the code in the book and the notebooks (2.6 Data sampling with a sliding window)

Good eye for detail. Actually the +1 wasn't necessary so I updated that a while back in the notebook and manuscript. I think you are seeing the old +1 in...

Inconsistencies between the code in the book and the notebooks (2.6 Data sampling with a sliding window)

Ah yes, big thanks for the follow up! I think I may have missed one. I probably did a find+replace looking for `stride=max_length+1` and then missed the one you had...
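As a rough illustration of the stride discussion, the sketch below compares `stride=max_length` with `stride=max_length+1` on a toy sequence. The function `sliding_window_pairs` is a hypothetical, simplified stand-in for the sampler in section 2.6 (plain lists instead of tensors), so treat it as an assumption about the general approach rather than the book's exact code.

```python
def sliding_window_pairs(token_ids, max_length, stride):
    # Collect (input, target) chunks; targets are the inputs shifted by one token.
    pairs = []
    for i in range(0, len(token_ids) - max_length, stride):
        inputs = token_ids[i:i + max_length]
        targets = token_ids[i + 1:i + max_length + 1]
        pairs.append((inputs, targets))
    return pairs


token_ids = list(range(10))  # toy "token IDs" 0..9
max_length = 4

# stride == max_length: consecutive input chunks are adjacent, no token is skipped.
no_gap = sliding_window_pairs(token_ids, max_length, stride=max_length)

# stride == max_length + 1: one token falls between consecutive input chunks.
with_gap = sliding_window_pairs(token_ids, max_length, stride=max_length + 1)

seen_no_gap = {t for inputs, _ in no_gap for t in inputs}
seen_with_gap = {t for inputs, _ in with_gap for t in inputs}

print(no_gap)
print(with_gap)
```

With the extra `+1`, token `4` never appears in any input chunk of this toy example, while `stride=max_length` covers the tokens back to back.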