x-transformers
Question about best combinations of features
Dear Author,
Thanks for your excellent work! I want to try your implementation on a language translation task. I have two questions and would very much appreciate your help:
- You implemented many features to improve performance. Which of these features can be combined?
- You mentioned that small initialization of the embeddings is taken care of if the l2norm flag is set. What is your overall recommendation for weight initialization across the whole model?
Thanks!