Jianbin Chang

Results: 2 repositories owned by Jianbin Chang

shu

Stars: 145
Forks: 29

A curated collection of Chinese books.

c4-dataset-script

Stars: 115
Forks: 13

Inspired by Google's C4 (Colossal Clean Crawled Corpus), this is a series of data cleaning scripts for processing CommonCrawl data. It includes Chinese data processing and the cleaning methods from MassiveText.