Arabert
An Arabic language model based on BERT.
Corpora
Here we combine all the datasets we can collect:

- [OSCAR's CommonCrawl Dataset](https://traces1.inria.fr/oscar/)
- [Arabic BERT Corpus](https://www.kaggle.com/abedkhooli/arabic-bert-corpus)
- [Hindawi](https://www.hindawi.org/books/)
- [ArabicWeb16](https://sites.google.com/view/arabicweb16/home)
- [OPUS](http://opus.nlpl.eu/)
- [Wikimedia](https://dumps.wikimedia.org/)
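One possible way to combine several downloaded corpora into a single training file is sketched below. It is only an illustration, not this project's actual pipeline: it assumes each dataset has already been exported as a UTF-8 plain-text file with one sentence or document per line, and the function name `merge_corpora` and any file paths passed to it are hypothetical. Exact-duplicate lines are dropped because the sources listed above may overlap.

```python
import hashlib
from typing import Iterable


def merge_corpora(paths: Iterable[str], out_path: str) -> int:
    """Concatenate text files into one corpus file.

    Blank lines and exact duplicate lines are skipped.
    Returns the number of lines written to out_path.
    """
    seen = set()  # SHA-1 digests of lines already written
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    text = line.strip()
                    if not text:
                        continue  # skip empty lines
                    # Store a fixed-size digest instead of the full line
                    # to keep memory use lower on large corpora.
                    digest = hashlib.sha1(text.encode("utf-8")).digest()
                    if digest in seen:
                        continue  # skip exact duplicates
                    seen.add(digest)
                    out.write(text + "\n")
                    written += 1
    return written
```

Calling `merge_corpora(["oscar_ar.txt", "wiki_ar.txt"], "corpus.txt")` (hypothetical file names) would write the deduplicated union of both files to `corpus.txt`. For corpora too large to track in memory, an external sort-based deduplication would likely be a better fit.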
A collection of free TPU compute:

- [TensorFlow Research Cloud](https://www.tensorflow.org/tfrc)
- [Google Colab](http://colab.research.google.com)
- [Kaggle Kernels](https://www.kaggle.com/kernels)
Tutorials for training BERT:

- [How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train)
- [Training BERT Language Model From Scratch On TPUs](https://www.youtube.com/watch?v=s-3zts7FTDA&feature=youtu.be)