finetune-transformer-lm
Universal Transformer as base architecture
Hello,
First, I would like to thank the authors of this paper for releasing their source code.
Is there a plan to apply the same approach with a Universal Transformer as the base architecture? Would the adaptive computation time (ACT) mechanism transfer to other tasks?
And more importantly, if this new Transformer were adopted, do you think the gain would be noticeable?
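For context, ACT in the Universal Transformer lets each sequence position decide how many times the shared transformer step is applied to it, halting once a cumulative halting probability crosses a threshold. Below is a minimal sketch of that halting loop; all names (`act_ponder`, `step_fn`, `halt_fn`) are illustrative and do not come from the finetune-transformer-lm codebase.

```python
# Illustrative sketch of ACT-style halting (Universal Transformer style).
# Not code from this repository; names and signatures are hypothetical.
import numpy as np

def act_ponder(state, step_fn, halt_fn, max_steps=8, threshold=0.99):
    """Apply `step_fn` repeatedly, stopping each position once its
    cumulative halting probability exceeds `threshold`. Returns the
    halting-probability-weighted average of the intermediate states."""
    halt_prob = np.zeros(state.shape[0])           # cumulative halting prob per position
    weighted = np.zeros_like(state)                # weighted sum of states
    still_running = np.ones(state.shape[0], dtype=bool)
    for _ in range(max_steps):
        p = halt_fn(state)                         # per-position halting prob in (0, 1)
        # positions crossing the threshold halt and spend their remainder
        new_halted = still_running & (halt_prob + p > threshold)
        p = np.where(new_halted, 1.0 - halt_prob, p)
        p = np.where(still_running, p, 0.0)
        halt_prob += p
        weighted += p[:, None] * state
        still_running &= ~new_halted
        if not still_running.any():
            break
        state = step_fn(state)                     # shared (recurrent) transformer step
    return weighted
```

The question of transfer is essentially whether this per-position pondering, trained on language-model pretraining, still allocates useful extra computation on the downstream fine-tuning tasks.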