projected-attention-layers topic

BERT-Multitask-learning

Stars: 15 | Forks: 3

Multitask learning with a BERT backbone. Makes it easy to train a BERT model with state-of-the-art methods such as PCGrad, Gradient Vaccine, PALs, scheduling, class imbalance handling and many optimizati...