annotated_deep_learning_paper_implementations
60 Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gan...
What is the main purpose or outcome of this training? Is it only for text completion? How do I fine-tune it for question answering or any other downstream task? I want...
I noticed that the Chinese translation appears to be machine-generated. Do you have any plans to officially translate this excellent project? If needed, I am willing to take on the...
The title explains itself. Here are some relevant resources:
- Official Mamba repository: [https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py](https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py)
- Original paper: [https://arxiv.org/abs/2312.00752](https://arxiv.org/abs/2312.00752)
- Additional resources:
  - [A Visual Guide to Mamba and State Space...
When training on the CelebA dataset, the program encountered an error.
LoRA
An implementation of LoRA and other tuning techniques would be nice.
Hi, thank you for your work. I noticed an error in the RoPE inner product equation. Additionally, this implementation uses a different feature pairing strategy for feature-subspace rotation compared...
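To illustrate what "feature pairing strategy" can mean here, the following is a minimal sketch in plain Python of two common ways to pair features for the RoPE rotation: adjacent pairs `(2i, 2i+1)` and half-split pairs `(i, i + d/2)`. The function name `rope`, the `pairing` argument, and the sample vectors are hypothetical and chosen only for illustration; the sketch does not assert which pairing this repository or the paper uses. Under either pairing, the inner product of a rotated query and key should depend only on the relative position.

```python
import math

def rope(x, pos, pairing="adjacent", base=10000.0):
    """Rotate feature pairs of x by position-dependent angles (illustrative sketch)."""
    d = len(x)
    assert d % 2 == 0, "feature dimension must be even"
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)
        # Choose which two features form the i-th 2D subspace.
        a, b = (2 * i, 2 * i + 1) if pairing == "adjacent" else (i, i + half)
        c, s = math.cos(theta), math.sin(theta)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Same relative offset (2) at two different absolute positions.
gaps = {}
for pairing in ("adjacent", "half"):
    near = dot(rope(q, 5, pairing), rope(k, 3, pairing))
    far = dot(rope(q, 12, pairing), rope(k, 10, pairing))
    gaps[pairing] = abs(near - far)
```

Both pairings keep the relative-position property, but they are not interchangeable: weights trained with one layout expect the features in that order.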
I wonder why array shapes in aha are (C, B, D) rather than (B, C, D). I thought it was convention that the batch was the first dimension. Specifically, here...
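For readers comparing the two layouts, here is a small sketch in plain Python of converting between them. It assumes `C` is the sequence (context) length, `B` the batch size, and `D` the feature size; the helper name `seq_first_to_batch_first` is hypothetical. The two layouts hold the same values, only the index order differs.

```python
def seq_first_to_batch_first(x):
    """Reorder a nested list from [C][B][D] to [B][C][D]."""
    seq_len, batch_size = len(x), len(x[0])
    return [[x[c][b] for c in range(seq_len)] for b in range(batch_size)]

# C=3 positions, B=2 batch items, D=2 features; each entry stores its own (c, b).
x = [[[c, b] for b in range(2)] for c in range(3)]
y = seq_first_to_batch_first(x)
```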
Hello, I've been learning various AI/ML-related algorithms recently, and my notes are quite similar to the content of your repository. Also, this excellent work has helped me understand some of...
PLEASE CORRECT ME IF I'M WRONG. I believe the line `attn = attn.softmax(dim=2)` is incorrect. https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/05321d644e4fed67d8b2856adc2f8585e79dfbee/labml_nn/diffusion/ddpm/unet.py#L188 Dim 1 contains the index (i) over the query sequence entries, and dim...
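To make the axis question concrete, here is a minimal sketch in plain Python. It assumes the attention scores are laid out as `[batch, i, j, heads]`, with the query index `i` on dim 1 and the key index `j` on dim 2; that layout and the helper name `softmax_over_axis2` are assumptions for illustration and are not taken from the linked file. Under that assumption, a softmax over dim 2 normalizes across key positions, so the weights for each query sum to 1.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_over_axis2(scores):
    """Softmax over dim 2 (key index j) of a nested list shaped [batch][i][j][heads]."""
    out = []
    for batch in scores:
        new_batch = []
        for row in batch:  # row is shaped [j][heads] for one query i
            n_keys, n_heads = len(row), len(row[0])
            # Normalize each head's scores across all key positions j.
            per_head = [softmax([row[j][h] for j in range(n_keys)])
                        for h in range(n_heads)]
            # Transpose back to [j][heads].
            new_batch.append([[per_head[h][j] for h in range(n_heads)]
                              for j in range(n_keys)])
        out.append(new_batch)
    return out

# Batch of 1, 2 queries, 3 keys, 1 head.
scores = [[
    [[0.0], [1.0], [2.0]],
    [[3.0], [3.0], [3.0]],
]]
attn = softmax_over_axis2(scores)
row_sums = [sum(attn[0][i][j][0] for j in range(3)) for i in range(2)]
```

If the scores were instead normalized over dim 1, the weights would sum to 1 across queries for each key, which is not the usual attention distribution.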
