TurboTransformers
Refactor MultiheadAttention and other Layers
The logic inside the MultiheadAttention layer is currently too complex to develop against. In addition, there are bugs in how intermediates are managed. Rewriting this code is the top priority, so that others can easily understand what Turbo is doing.