LLMs-from-scratch
Probably a typo in the multi-head attention description (3.6.1 Stacking multiple single-head attention layers)
Hi @rasbt,
I found the following statement in the section mentioned above:
> Figure 3.24 illustrates the structure of a multi-head attention module, which consists of multiple single-head attention modules, as previously depicted in Figure 3.24, stacked on top of each other.
Did you mean Figure 3.18 in the second case?
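Just for context, my reading of the "stacking" described there is roughly the following sketch. The module and parameter names here are my own, not the book's listings, and the causal mask and dropout are omitted for brevity:

```python
import torch
import torch.nn as nn

class SingleHeadAttention(nn.Module):
    # Minimal single-head self-attention, a stand-in for the module
    # introduced earlier in the chapter (causal mask omitted here).
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x):
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        attn_scores = q @ k.transpose(-2, -1)
        attn_weights = torch.softmax(attn_scores / k.shape[-1] ** 0.5, dim=-1)
        return attn_weights @ v

class MultiHeadAttentionWrapper(nn.Module):
    # "Stacking" several single-head modules: run them side by side
    # and concatenate their outputs along the feature dimension.
    def __init__(self, d_in, d_out, num_heads):
        super().__init__()
        self.heads = nn.ModuleList(
            [SingleHeadAttention(d_in, d_out) for _ in range(num_heads)]
        )

    def forward(self, x):
        return torch.cat([head(x) for head in self.heads], dim=-1)

x = torch.randn(2, 6, 4)                              # (batch, tokens, d_in)
mha = MultiHeadAttentionWrapper(d_in=4, d_out=3, num_heads=2)
print(mha(x).shape)                                   # torch.Size([2, 6, 6])
```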
Thank you.