transformer
About weight sharing between embeddings
The original paper says that the two embedding layers share weights, but I can't find any implementation of weight sharing in the code.
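To make clear what I am looking for, here is a minimal sketch of weight sharing in plain Python. It is not taken from this repository: the class `TiedEmbedding`, its methods, and the sizes are hypothetical names chosen for illustration, and it assumes the source and target sides use one shared vocabulary. As I read the paper, the same matrix is also reused for the pre-softmax linear transformation, and the embeddings are scaled by sqrt(d_model).

```python
import math
import random


class TiedEmbedding:
    """Hypothetical illustration: one weight matrix serves both the
    embedding lookup and the pre-softmax projection."""

    def __init__(self, vocab_size, d_model, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        # Single shared matrix of shape (vocab_size, d_model).
        self.weight = [
            [rng.gauss(0.0, d_model ** -0.5) for _ in range(d_model)]
            for _ in range(vocab_size)
        ]

    def embed(self, token_id):
        # Embedding lookup, scaled by sqrt(d_model).
        scale = math.sqrt(self.d_model)
        return [scale * w for w in self.weight[token_id]]

    def logits(self, hidden):
        # Output projection reuses the same matrix: hidden @ weight^T.
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]


# Both embedding layers point at the same object, so they share parameters.
shared = TiedEmbedding(vocab_size=5, d_model=4)
src_embed = shared  # encoder input embedding
tgt_embed = shared  # decoder input embedding
```

With this setup, an update to `shared.weight` is visible through both `src_embed` and `tgt_embed`, which is the behaviour I could not locate in the code.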