unigram-tokenization topic
unigram-tokenization repositories
count-tokens-hf-datasets
22 stars, 1 fork
This project shows how to compute the total number of training tokens in a large text dataset loaded from 🤗 Datasets, using Apache Beam and Dataflow.
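As a rough illustration of the idea, the counting job can be thought of as two steps: map each text record to its token count, then combine the counts with a sum. The sketch below shows that shape in plain Python only; it is not the project's code. The whitespace tokenizer, the function names, and the sample records are all assumptions made for illustration, and a real run would presumably swap in a trained tokenizer (for example a unigram model) and let Apache Beam distribute the two steps across Dataflow workers.

```python
from functools import reduce
from typing import Callable, Iterable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in tokenizer (assumption): a real pipeline would use a trained
    # tokenizer, such as a unigram model, instead of splitting on whitespace.
    return text.split()


def count_tokens(
    text: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> int:
    # "Map" step: one text record in, one token count out.
    return len(tokenize(text))


def total_tokens(records: Iterable[str]) -> int:
    # "Combine" step: addition is associative and commutative, so partial
    # sums computed on separate workers can be merged in any order.
    return reduce(lambda a, b: a + b, (count_tokens(r) for r in records), 0)


# Hypothetical sample records standing in for rows of a text dataset.
sample_records = ["the quick brown fox", "jumps over", "", "the lazy dog"]
total = total_tokens(sample_records)
print(total)  # 9
```

Because the combine step is a plain sum, the same logic scales from this in-memory list to a sharded dataset: each worker sums the counts for its shard, and the partial sums are added together at the end.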