finetune-transformer-lm
Universal Transformer as base architecture
Hello,
First, I would like to thank the authors of this paper for releasing their source code.
Is there a plan to apply the same approach with a Universal Transformer as the base architecture? Would the adaptive computation time (ACT) mechanism transfer to other tasks?
And more importantly, if this new Transformer were adopted, do you think the gain would be noticeable?
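For context, ACT in the Universal Transformer lets each sequence position decide how many times the shared transformer step is applied to it, halting once a cumulative halting probability crosses a threshold. Below is a minimal sketch of that halting loop; all names (`act_ponder`, `step_fn`, `halt_fn`) are illustrative and do not come from the finetune-transformer-lm codebase.

```python
# Illustrative sketch of ACT-style halting (Universal Transformer style).
# Not code from this repository; names and signatures are hypothetical.
import numpy as np

def act_ponder(state, step_fn, halt_fn, max_steps=8, threshold=0.99):
    """Apply `step_fn` repeatedly, stopping each position once its
    cumulative halting probability exceeds `threshold`. Returns the
    halting-probability-weighted average of the intermediate states."""
    halt_prob = np.zeros(state.shape[0])           # cumulative halting prob per position
    weighted = np.zeros_like(state)                # weighted sum of states
    still_running = np.ones(state.shape[0], dtype=bool)
    for _ in range(max_steps):
        p = halt_fn(state)                         # per-position halting prob in (0, 1)
        # positions crossing the threshold halt and spend their remainder
        new_halted = still_running & (halt_prob + p > threshold)
        p = np.where(new_halted, 1.0 - halt_prob, p)
        p = np.where(still_running, p, 0.0)
        halt_prob += p
        weighted += p[:, None] * state
        still_running &= ~new_halted
        if not still_running.any():
            break
        state = step_fn(state)                     # shared (recurrent) transformer step
    return weighted
```

The question of transfer is essentially whether this per-position pondering, trained on language-model pretraining, still allocates useful extra computation on the downstream fine-tuning tasks.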