LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
I just checked out the code in appendix-A/01_main-chapter-code/DDP-script.py. How about adding

```python
from torch.profiler import profile

with profile() as prof:
    # the main training function code
    ...

if rank == 0:
    print("exporting...
```
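For context, a minimal sketch of what such a profiling wrapper could look like around the DDP training loop, assuming the standard `torch.profiler` API. The `train(rank, world_size)` call and the trace filename are hypothetical placeholders, not names from the actual script:

```python
import torch
from torch.profiler import profile, ProfilerActivity

def run_with_profiler(rank, world_size):
    # Profile CPU ops, and CUDA kernels if a GPU is present
    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    with profile(activities=activities) as prof:
        train(rank, world_size)  # hypothetical: the script's main training function

    if rank == 0:
        # Summarize the most expensive ops and export a trace
        # viewable in chrome://tracing or https://ui.perfetto.dev
        print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
        prof.export_chrome_trace("ddp_train_trace.json")  # hypothetical filename
```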
There appears to be an issue when running the code from chapter 6 (other sections not tested):

## Error

```
Traceback (most recent call last):
  File "/home/user/workspace/project/llm/tune_incl.py", line 359, in...
```
* ch05_02, Row 4: `python additional-experiments.py --trainable_layers two_last_blocks` --> `last_two_blocks`
* ch05/06: fixed minor typos
This pull request adds support for running inference on Habana Gaudi (HPU) processors by introducing a new directory dedicated to the Gaudi-specific implementation. It includes setup instructions, scripts for downloading GPT-2...
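As a rough illustration of what HPU inference looks like in PyTorch (a sketch, not necessarily what this PR implements), assuming the Intel Gaudi software stack (`habana_frameworks`) is installed; the model and input are hypothetical stand-ins:

```python
import torch
import habana_frameworks.torch.core as htcore  # requires the Gaudi PyTorch bridge

device = torch.device("hpu")

# Hypothetical: any torch.nn.Module would do, e.g. a GPT-2 model from the book
model = torch.nn.Linear(768, 768).to(device)
model.eval()

x = torch.randn(1, 768).to(device)

with torch.no_grad():
    y = model(x)
    htcore.mark_step()  # flush the accumulated lazy-mode graph to the Gaudi device

print(y.shape)
```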
## Proposal

I’d like to add a new section at the end of Chapter 06, “Deploy on Streamlit Community Cloud,” which walks readers through:

1. Uploading their trained model to...
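To make the proposal concrete, here is a minimal sketch of such a Streamlit app, assuming only the standard `streamlit` and `torch` APIs. The checkpoint path and the `generate` helper are hypothetical placeholders for whatever Chapter 06 produces:

```python
import streamlit as st
import torch

@st.cache_resource  # load the model once per server process, not per interaction
def load_model():
    # Hypothetical: replace with the book's model class and your own checkpoint
    model = torch.load("model.pth", map_location="cpu", weights_only=False)
    model.eval()
    return model

st.title("LLMs-from-scratch demo")

prompt = st.text_input("Prompt", value="Every effort moves you")

if st.button("Generate"):
    model = load_model()
    with torch.no_grad():
        output = generate(model, prompt)  # hypothetical: the chapter's generation helper
    st.write(output)
```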
Fixes for several issues in the package

* fixes #675; the code for the `encode` function was taken from `ch05/07_gpt_to_llama/converting-llama2-to-llama3.ipynb` (please double-check that everything is okay)
* fixes `tqdm` import...
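For readers following along, here is a hedged sketch of the BOS/EOS-wrapping pattern that a tiktoken-based `encode` typically follows. The `cl100k_base` encoding, the class name, and the special-token IDs below are illustrative stand-ins, not the actual Llama 3 vocabulary:

```python
import tiktoken

class TokenizerSketch:
    """Illustrative only: wraps a tiktoken encoding with optional BOS/EOS tokens."""

    def __init__(self):
        # Stand-in encoding; the real Llama 3 tokenizer loads its own BPE ranks
        # and special-token table via tiktoken.Encoding(...)
        self.model = tiktoken.get_encoding("cl100k_base")
        self.special_tokens = {
            "<|begin_of_text|>": 128000,  # hypothetical IDs for illustration
            "<|end_of_text|>": 128001,
        }

    def encode(self, text, bos=False, eos=False):
        ids = []
        if bos:
            ids.append(self.special_tokens["<|begin_of_text|>"])
        ids.extend(self.model.encode(text))
        if eos:
            ids.append(self.special_tokens["<|end_of_text|>"])
        return ids

tok = TokenizerSketch()
print(tok.encode("Hello, world!", bos=True, eos=True))
```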
Could you provide more details about how you determined the context length? I found this information:

* The 0.6B model seems to support only 32k (`32,768`) tokens:
  https://qwenlm.github.io/blog/qwen3/#introduction
  https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/README.md
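One way to cross-check this (a sketch, assuming the Hugging Face `transformers` library and that the repo's `config.json` exposes the usual field) is to read the model config directly:

```python
from transformers import AutoConfig

# Downloads only the small config.json, not the model weights
config = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")

# Most HF causal LMs report the maximum context size here; note that
# blog posts sometimes quote a different (native vs. extended) length
print(config.max_position_embeddings)
```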
### Bug description

I noticed something small while looking at the Llama 3 tokenizer code and thought it might be helpful to mention:

https://github.com/rasbt/LLMs-from-scratch/blob/ece59ba58768db7b34d9b5d5f88677de8c1e84ea/pkg/llms_from_scratch/llama3.py#L315-L316

and

https://github.com/rasbt/LLMs-from-scratch/blob/ece59ba58768db7b34d9b5d5f88677de8c1e84ea/pkg/llms_from_scratch/llama3.py#L325-L326

In VS Code, the...