bigbird
Are both the encoder and the decoder implemented with sparse attention? What is the maximum output length that has been verified for the decoder?