annotated-transformer
An annotated implementation of the Transformer paper.
On the figure, there are `Multi-Head Attention`s and `Masked Multi-Head Attention`s. Are all of the Multi-Head Attention blocks really "Masked"?
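For context: in Figure 1 of the paper, only the decoder's self-attention sublayer is labeled `Masked Multi-Head Attention`; the encoder self-attention and the encoder-decoder attention are unmasked (apart from padding). A minimal sketch of the causal mask, following the notebook's `subsequent_mask` helper:

```python
import torch

def subsequent_mask(size):
    """Mask out subsequent positions: True where attention is allowed."""
    attn_shape = (1, size, size)
    mask = torch.triu(torch.ones(attn_shape), diagonal=1).type(torch.uint8)
    return mask == 0

print(subsequent_mask(4))
# tensor([[[ True, False, False, False],
#          [ True,  True, False, False],
#          [ True,  True,  True, False],
#          [ True,  True,  True,  True]]])
```

Each position can attend only to itself and earlier positions, which is what "Masked" refers to in the decoder's self-attention.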
I downloaded the notebook from the Colab link in the README, uploaded it to a fresh Colab environment, and ran the first cell, which failed with: > ERROR: torch-0.3.0.post4-cp36-cp36m-linux_x86_64.whl is...
Hi all! First, I want to express my gratitude for the amazing work on The Annotated Transformer. It has been an invaluable resource for the AI community and a fantastic...
https://github.com/harvardnlp/annotated-transformer/blob/debc9fd747bb2123160a98046ad1c2d4da44a567/the_annotated_transformer.py#L868 Is there a problem with the parameters passed in here? It should be **out = test_model.decode(ys, memory, src_mask, subsequent_mask(ys.size(1)).type_as(src.data))**, not **out = test_model.decode(memory, src_mask, ys, subsequent_mask(ys.size(1)).type_as(src.data))**.
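For reference, `decode` in the notebook's `EncoderDecoder` class takes `memory` and `src_mask` before the target sequence; a sketch of the relevant part (abridged, other methods omitted):

```python
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Abridged sketch: only the decode method is shown."""

    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.generator = generator

    def decode(self, memory, src_mask, tgt, tgt_mask):
        # The target tokens are embedded, then decoded against the encoder memory.
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
```

Comparing the call in question against this argument order should settle whether the parameters are swapped.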
When I run `!pip install -q torchdata==0.3.0 torchtext==0.12 spacy==3.2 altair GPUtil`, `!python -m spacy download de_core_news_sm`, and `!python -m spacy download en_core_web_sm`, it returns: ERROR: Could not find a version that satisfies...
Thank you for providing such a well-organized and comprehensive Transformer tutorial. As a beginner, I’ve learned a lot from this repository☺️! When I was building the positional encoding block, I...
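Since the snippet is truncated, for anyone building the same block, here is a minimal sketch of the sinusoidal positional encoding along the lines of the notebook's `PositionalEncoding` (assumes an even `d_model`):

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (minimal sketch; assumes even d_model)."""

    def __init__(self, d_model, dropout, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions: cosine
        self.register_buffer("pe", pe.unsqueeze(0))

    def forward(self, x):
        # Add fixed (non-trainable) positional encodings to the embeddings.
        x = x + self.pe[:, : x.size(1)].requires_grad_(False)
        return self.dropout(x)
```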