x-transformers
Question about the over-smoothing problem
I wonder if you, @lucidrains, have any suggestions for the over-smoothing problem with Transformer models (both encoder and decoder).
@Hosein47 do you have an experimental setup to measure oversmoothing? part of me wonders if it is even a problem worth solving, given chatgpt has shown that scale and data matter way more than architectural tweaks
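For anyone wanting such a setup: a common proxy for oversmoothing is the mean pairwise cosine similarity of token representations per layer, which drifts toward 1.0 as tokens collapse onto each other. A minimal sketch of that metric (the function name and threshold interpretation are my own, not from this thread):

```python
import numpy as np

def token_similarity(hidden: np.ndarray) -> float:
    """Mean pairwise cosine similarity between token vectors.

    hidden: (num_tokens, dim) hidden states from one layer.
    Values drifting toward 1.0 across depth indicate oversmoothing:
    token representations collapsing toward a common direction.
    """
    normed = hidden / np.linalg.norm(hidden, axis=-1, keepdims=True)
    sims = normed @ normed.T                      # (num_tokens, num_tokens)
    n = sims.shape[0]
    off_diag = sims[~np.eye(n, dtype=bool)]       # drop self-similarity
    return float(off_diag.mean())
```

Computing this for each layer's output and plotting it against depth, with and without an architectural tweak, gives the kind of comparison plots asked for below.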
however, if you have the right setup, i'd be willing to build in a solution i've seen in the GNN literature, provided you run the experiments and share your plots here
try https://github.com/lucidrains/x-transformers#gated-residual for starters, and if you see it alleviate oversmoothing, i can add a simpler technique