LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
If I understand correctly, single quotes are missing here due to an oversight and are not displayed.
When reading the README.md for this repository, it's not immediately clear what this repository contains or what it is for. I think this should be clarified.
This PR adds basic devcontainer configuration files for Docker-based development. **Update**: PyTorch version 2.2.0. **Todo**: Add/update the docs to refer to Docker installation and usage.
Add missing import.
Hi @rasbt, This [notebook](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb) contains the following implementation of CausalAttention:

```python
class CausalAttention(nn.Module):
    def __init__(self, d_in, d_out, block_size, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key...
```
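Since the excerpt above is cut off, here is a minimal self-contained sketch of what such a causal attention module typically looks like; the mask buffer, scaling, and forward pass below are assumptions based on the constructor signature, not necessarily the notebook's exact code:

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    def __init__(self, d_in, d_out, block_size, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask: position i may only attend to positions <= i
        self.register_buffer(
            "mask", torch.triu(torch.ones(block_size, block_size), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)

        attn_scores = queries @ keys.transpose(1, 2)
        attn_scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)

        return attn_weights @ values  # context vectors, shape (b, num_tokens, d_out)
```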
Hi @rasbt, It seems that cell [36] in the [notebook](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01_main-chapter-code/ch03.ipynb) with the main code contains the solution to Exercise 3.2. Thank you.
Hi @rasbt, I found the following statement in the mentioned section: > Figure 3.24 illustrates the structure of a multi-head attention module, which consists of multiple single-head attention modules, as...
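The structure described in that statement, multiple single-head attention modules combined into one multi-head module, can be sketched roughly as follows; the wrapper class name and the concatenation of head outputs along the last dimension are assumptions for illustration (reusing the CausalAttention sketch shown above), not necessarily the book's exact code:

```python
import torch
import torch.nn as nn

class MultiHeadAttentionWrapper(nn.Module):
    """Combine several single-head causal attention modules into one multi-head module."""

    def __init__(self, d_in, d_out, block_size, dropout, num_heads, qkv_bias=False):
        super().__init__()
        self.heads = nn.ModuleList(
            [CausalAttention(d_in, d_out, block_size, dropout, qkv_bias)
             for _ in range(num_heads)]
        )

    def forward(self, x):
        # Each head returns (b, num_tokens, d_out); concatenating gives d_out * num_heads features
        return torch.cat([head(x) for head in self.heads], dim=-1)
```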
Hi @rasbt, I am trying to explore and reproduce Chapter 3 and found that I can't reproduce results that you specified in the notebook and the book, even if I...
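The issue text is truncated here, so it is not clear which values are meant; one common reason attention outputs differ between runs is that the random initialization of the nn.Linear layers is not seeded. A minimal sketch, assuming that is the relevant factor (the seed 123 is only illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(123)               # illustrative seed; set it before constructing any layers
layer = nn.Linear(3, 2, bias=False)  # initial weights now depend only on the seed
print(layer.weight)                  # identical across runs with the same seed and PyTorch version
```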
Hi @rasbt, I noticed that in the book you provide the following code with function name `create_dataloader` and the argument `stride = max_length + 1` to avoid overlap in data...
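For readers comparing the two settings, the effect of stride on window overlap can be illustrated with a small standalone helper; sliding_windows below is a simplified illustration, not the book's create_dataloader implementation:

```python
def sliding_windows(token_ids, max_length, stride):
    """Yield (input, target) chunks; the target is the input shifted right by one token."""
    for i in range(0, len(token_ids) - max_length, stride):
        yield token_ids[i:i + max_length], token_ids[i + 1:i + max_length + 1]

tokens = list(range(10))
# stride == max_length: consecutive input windows are adjacent and do not overlap
print(list(sliding_windows(tokens, max_length=4, stride=4)))
# stride == max_length + 1: one token is skipped between consecutive input windows
print(list(sliding_windows(tokens, max_length=4, stride=5)))
```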
Added CUDA support information to the Docker README file.