open-unmix-pytorch
Use of Transformers/Attention
Hi, I was just wondering whether there have been any attempts at replacing the Bi-LSTM with a Transformer, or at incorporating attention into the network, to potentially improve results?
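For context, a minimal sketch of what such a swap might look like. The LSTM configuration below mirrors the Open-Unmix defaults (3 bidirectional layers, `hidden_size=512`, sequence-first tensors); the Transformer hyperparameters (`nhead`, `dim_feedforward`, `dropout`) are illustrative assumptions, not something the repo provides:

```python
import torch
import torch.nn as nn

hidden_size = 512  # Open-Unmix default bottleneck width

# Recurrent core as configured in open-unmix-pytorch:
# bidirectional, so each direction gets hidden_size // 2.
lstm = nn.LSTM(
    input_size=hidden_size,
    hidden_size=hidden_size // 2,
    num_layers=3,
    bidirectional=True,
)

# A possible drop-in Transformer replacement (hyperparameters are guesses):
encoder_layer = nn.TransformerEncoderLayer(
    d_model=hidden_size,
    nhead=8,
    dim_feedforward=2048,
    dropout=0.1,
)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=3)

# Both modules consume (nb_frames, nb_samples, features) and
# return a tensor of the same shape, so the surrounding
# skip connection and dense layers would be unaffected.
x = torch.randn(255, 16, hidden_size)
lstm_out, _ = lstm(x)        # -> (255, 16, hidden_size)
tf_out = transformer(x)      # -> (255, 16, hidden_size)
```

One caveat worth noting: without positional encodings the Transformer is permutation-invariant over frames, so a real experiment would likely need sinusoidal or learned positional embeddings added to `x` first.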