Adding a Random Encoder for Baseline Runs
NLP papers often compare against baselines, and having a prebuilt random encoder could help with that. A random encoder is similar to a simple encoder, with one difference: each self-attention sublayer is replaced with two constant random matrices, one applied to the hidden dimension and one applied to the sequence dimension. It is described in the F-Net paper, whose model was implemented by @abheesht17.
One of the reasons I think it might be worth having is that it is stable in training, unlike BERT, and if its scores are lacking, that indicates structured mixing is needed over any random mixing, as described in the paper.
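Roughly, the mixing sublayer could look like the sketch below. This is just an illustration of the idea, not an existing `keras_nlp` API; the `RandomMixing` name and its arguments are hypothetical.

```python
import tensorflow as tf
from tensorflow import keras


class RandomMixing(keras.layers.Layer):
    """Drop-in replacement for a self-attention sublayer that mixes tokens
    with two constant (non-trainable) random matrices, as in the F-Net
    paper's "Random" baseline. Sketch only; names are illustrative."""

    def __init__(self, sequence_length, hidden_dim, seed=None, **kwargs):
        super().__init__(**kwargs)
        self.sequence_length = sequence_length
        self.hidden_dim = hidden_dim
        self.seed = seed

    def build(self, input_shape):
        init = keras.initializers.GlorotNormal(seed=self.seed)
        # Random matrices sampled once at build time and frozen.
        self.seq_mixer = self.add_weight(
            name="seq_mixer",
            shape=(self.sequence_length, self.sequence_length),
            initializer=init,
            trainable=False,
        )
        self.hidden_mixer = self.add_weight(
            name="hidden_mixer",
            shape=(self.hidden_dim, self.hidden_dim),
            initializer=init,
            trainable=False,
        )

    def call(self, inputs):
        # inputs: (batch, sequence_length, hidden_dim)
        # Mix along the hidden dimension, then along the sequence dimension,
        # in place of the usual self-attention computation.
        x = tf.einsum("bsh,hd->bsd", inputs, self.hidden_mixer)
        return tf.einsum("bsh,st->bth", x, self.seq_mixer)
```

The rest of the encoder block (feedforward sublayer, residual connections, layer norm) would stay the same as in a standard transformer encoder.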
@aflah02 Thanks for opening this feature request!
My opinion is that for baseline models we should probably just stick to BERT/GPT-2/3, since they have been well adopted by the NLP community. @mattdangerw @fchollet What do you think?
@chenmoneygithub Thanks again for the review! Yup, I totally agree that the models you've mentioned are the ones most often used as baselines. Let's hear from the others too; maybe they have some insights on this!