RedPajama-Data
Where is the pretrained FastText model for classifying each CommonCrawl webpage?
First of all: thank you very much for your contribution!
It would be much appreciated if you could share the pretrained FastText model that classifies whether each CommonCrawl webpage is a low-quality page.
You can download it here: https://fasttext.cc/docs/en/language-identification.html
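For reference, here is a minimal sketch of how the model on that page can be used, assuming the `fasttext` Python package is installed and the `lid.176.bin` file from the linked page has been downloaded to the working directory. `detect_language` is a hypothetical helper name, not part of fastText or of this repository. Note that this model predicts the language of a text only; it does not score page quality.

```python
from typing import Any, Tuple

# fastText prefixes every predicted label with this string, e.g. "__label__en".
LABEL_PREFIX = "__label__"


def detect_language(model: Any, text: str) -> Tuple[str, float]:
    """Return (language_code, confidence) for the top prediction.

    `model` is expected to behave like a loaded fastText model, i.e. to
    expose predict(text, k) returning (labels, probabilities).
    """
    # fastText's predict() processes one line at a time and rejects
    # newlines, so collapse all whitespace into single spaces first.
    cleaned = " ".join(text.split())
    labels, probs = model.predict(cleaned, k=1)
    return labels[0][len(LABEL_PREFIX):], float(probs[0])


# Usage (assumes `pip install fasttext` and a local copy of lid.176.bin):
#
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")
#   print(detect_language(model, "Thank you very much for your contribution!"))
```

The helper takes the model as an argument so it can be tried with any object that has a compatible `predict` method.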
Thanks. I am looking for the model that classifies whether a web page is low quality, not the language identification model.