functional-transformer
Make value heads nonsquare and add back head concatenation
As noted in https://github.com/awf/functional-transformer/discussions/6, the model does not match the original code, or indeed the original transformer paper. I therefore consider this a "transformer variant", but of course it would be sensible to make it match and check whether that improves or disimproves performance.
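For concreteness, here is a minimal JAX sketch of what the title asks for, following the multi-head attention layout in the original transformer paper rather than this repo's current code. All names (`init_heads`, `attention_head`, `multi_head_attention`, `W_q`, `W_k`, `W_v`, `W_o`) are hypothetical and chosen for illustration. The causal mask and the choice `d_k = d_model // n_heads` are assumptions. Each value matrix is nonsquare (`d_model x d_k`), and the head outputs are concatenated before a shared output projection.

```python
import jax
import jax.numpy as jnp


def init_heads(key, d_model, n_heads):
    """Create per-head projections; every matrix is nonsquare (d_model x d_k)."""
    d_k = d_model // n_heads  # assumption: d_model is divisible by n_heads
    scale = d_model ** -0.5
    heads = []
    for head_key in jax.random.split(key, n_heads):
        kq, kk, kv = jax.random.split(head_key, 3)
        heads.append({
            "W_q": scale * jax.random.normal(kq, (d_model, d_k)),
            "W_k": scale * jax.random.normal(kk, (d_model, d_k)),
            "W_v": scale * jax.random.normal(kv, (d_model, d_k)),
        })
    return heads


def attention_head(head, x, mask):
    """Scaled dot-product attention for one head; x is (L, d_model)."""
    q = x @ head["W_q"]
    k = x @ head["W_k"]
    v = x @ head["W_v"]  # (L, d_k): the value head is nonsquare
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    scores = jnp.where(mask, scores, -jnp.inf)  # hide disallowed positions
    return jax.nn.softmax(scores, axis=-1) @ v


def multi_head_attention(heads, W_o, x):
    """Concatenate head outputs along the feature axis, then project with W_o."""
    L = x.shape[0]
    mask = jnp.tril(jnp.ones((L, L), dtype=bool))  # assumption: causal masking
    outs = [attention_head(h, x, mask) for h in heads]
    return jnp.concatenate(outs, axis=-1) @ W_o  # (L, n_heads * d_k) @ W_o
```

With `n_heads * d_k == d_model`, the concatenated output has the same width as the input, so `W_o` can be `d_model x d_model` and the residual connection still lines up. Whether this actually helps performance is exactly what would need checking.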