neoml
Machine learning framework for both deep learning and traditional algorithms
Signed-off-by: Pavel Voropaev. The behaviour of a single attention head doesn't change with the total number of heads, just as in the original paper.
Signed-off-by: Pavel Voropaev. I think this can fix https://github.com/neoml-lib/neoml/issues/757...
https://github.com/neoml-lib/neoml/blob/bbae2779fbce3d757d4a665063a35d8d303b9fef/NeoML/src/Dnn/Layers/PositionalEmbeddingLayer.cpp#L64 In `PET_LearnableAddition` mode, this condition is false whenever the sequence length differs from the previous one. `initializeLearnableAddition()` is then called, completely resetting the weights without...
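The reset pattern described in this report can be illustrated with a small standalone sketch. It is not NeoML code: `ToyPositionalAddition`, `reshapeStrict`, `reshapeGrowOnly`, and the position-major weight layout are all hypothetical names and assumptions chosen for illustration, and the grow-only variant is just one possible way to avoid the reset, not necessarily the fix adopted in the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a learnable positional-addition layer.
// None of these names come from NeoML; they only mirror the reported pattern.
struct ToyPositionalAddition {
    std::size_t hiddenSize;
    std::vector<float> weights; // position-major: seqLen * hiddenSize offsets
    int resetCount = 0;         // number of full re-initializations so far

    explicit ToyPositionalAddition(std::size_t hidden) : hiddenSize(hidden) {}

    // Re-creates the weights from scratch, discarding anything learned so far.
    void initialize(std::size_t seqLen) {
        weights.assign(seqLen * hiddenSize, 0.0f);
        ++resetCount;
    }

    // Problematic check: any change in sequence length triggers a full reset.
    void reshapeStrict(std::size_t seqLen) {
        assert(seqLen > 0);
        if (weights.size() != seqLen * hiddenSize) {
            initialize(seqLen);
        }
    }

    // One possible alternative (an assumption, not the project's fix):
    // grow only when a longer sequence arrives and keep the existing values.
    void reshapeGrowOnly(std::size_t seqLen) {
        assert(seqLen > 0);
        if (weights.size() < seqLen * hiddenSize) {
            weights.resize(seqLen * hiddenSize, 0.0f);
        }
    }
};
```

With the strict check, a value written after a reshape to length 3 is lost as soon as a sequence of length 4 arrives. With the grow-only check, the same value survives, and a later, shorter sequence leaves the stored weights untouched.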
Comparison by relation `