corpuscrawler
harfbuzz-testing-wikipedia
Hi Sascha,
Nice work! Here's the output of what roozbeh did for HarfBuzz testing by extracting text from Wikipedia: https://github.com/behdad/harfbuzz-testing-wikipedia
I don't know if it's of much use. That corpus included all of Wikipedia's talk pages as well, so the word distribution is skewed; for example, the word for "User" is over-represented. Anyway, I thought I'd share it here for the record.
See also #78