pretrain topic
What-I-Have-Read
Paper lists, notes, and slides, focused on NLP. For summarization, please refer to https://github.com/xcfcode/Summarization-Papers
nlp_chinese_corpus
Large-scale Chinese corpus for NLP
UniVL
An official implementation of "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
BERT-CCPoem
BERT-CCPoem is a BERT-based pre-trained model specifically for Chinese classical poetry
CLUECorpus2020
A large-scale (100 GB) Chinese pre-training corpus
MatDGL
MatDGL is a neural network package that allows researchers to train custom models for crystal modeling tasks. It aims to accelerate research and applications in materials science.
RE-Context-or-Names
BERT-based models (BERT, MTB, CP) for relation extraction.
albert-mongolian
ALBERT trained on a Mongolian text corpus
SparK
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"