pnlp-mixer
Questions regarding the model
- The XL model's intent accuracy is within ~1% of mBERT's overall, and in some of the language subcategories. Even though, on the surface, a hypothetical XXL model looks like it would reach parity with mBERT (extrapolating from the L-to-XL comparison), is there a possibility of diminishing returns?
- There are ways of demonstrating the interpretability of Transformer-based models (e.g. BERT and GPT-likes) through attention visualizations; are there similar mechanisms for the Mixer, given that MLP-Mixer visualizations exist (see the sketch after this list)? https://jalammar.github.io/illustrated-transformer/ https://medium.com/ml-summaries/mlp-mixer-an-all-mlp-architecture-for-vision-paper-summary-e50fa915e04d
- On a speculative note, when could the model be scaled to reach parity with GPT-Neo or its commercial counterparts?
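
Regarding the second question: the MLP-Mixer paper demonstrates interpretability by plotting the learned token-mixing weights directly, and the same idea should carry over here since the mixer blocks share that structure. A minimal sketch, assuming a saved checkpoint; the file name and parameter key below are placeholders, not this repo's actual names:

```python
import matplotlib.pyplot as plt
import torch

# Hypothetical: the checkpoint path and parameter key are assumptions --
# print(state_dict.keys()) to locate the real token-mixing weight.
state_dict = torch.load("pnlp_mixer_xl.ckpt", map_location="cpu")["state_dict"]
weights = state_dict["mixer.layers.0.token_mixing.0.weight"]  # (hidden_dim, seq_len)

# Each row shows how one hidden unit weights every sequence position,
# mirroring the token-mixing weight plots in the MLP-Mixer paper.
plt.imshow(weights.numpy(), cmap="RdBu", aspect="auto")
plt.xlabel("sequence position")
plt.ylabel("token-mixing hidden unit")
plt.title("Layer 0 token-mixing weights")
plt.colorbar()
plt.show()
```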