tokenizers

Adding multiprocessing for sentencepiece_extractor

Open · AamodThakur opened this issue 6 months ago · 0 comments

Goal: Speed up merges extraction for SentencePiece models.

Add multiprocessing to the sentencepiece_extractor code to speed up the merges extraction process. For a 128K vocabulary, extraction currently takes more than an hour.
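A minimal sketch of what the parallel version could look like, assuming the vocabulary is already loaded as a `dict` mapping each piece to its id, and that a pair `(left, right)` counts as a merge when `left + right` is itself a piece. The outer loop over left pieces is split into chunks that a `multiprocessing` pool scans independently. All function names here (`extract_merges_parallel`, `extract_merges_serial`, and the helpers) are hypothetical, not the existing script's API, and ordering merges by the merged piece's id is an assumption.

```python
import multiprocessing as mp
from typing import Dict, List, Optional, Sequence, Tuple

# (left piece, right piece, id of the concatenated piece)
Merge = Tuple[str, str, int]

# Per-worker copy of the vocabulary, set once by the pool initializer.
_VOCAB: Dict[str, int] = {}


def _init_worker(vocab: Dict[str, int]) -> None:
    # Runs once per worker so the vocab is not re-sent with every chunk.
    global _VOCAB
    _VOCAB = vocab


def _find_merges(left_pieces: Sequence[str], vocab: Dict[str, int]) -> List[Merge]:
    """Pairwise scan: (l, r) is a candidate merge if l + r is itself a piece."""
    found: List[Merge] = []
    for piece_l in left_pieces:
        for piece_r in vocab:
            merged_id = vocab.get(piece_l + piece_r)
            if merged_id is not None:
                found.append((piece_l, piece_r, merged_id))
    return found


def _worker(left_pieces: Sequence[str]) -> List[Merge]:
    return _find_merges(left_pieces, _VOCAB)


def _order(found: List[Merge]) -> List[Tuple[str, str]]:
    # Stable sort by merged-piece id (assumed priority order); ties keep scan order.
    return [(l, r) for l, r, _ in sorted(found, key=lambda m: m[2])]


def extract_merges_serial(vocab: Dict[str, int]) -> List[Tuple[str, str]]:
    # Single-process reference: the full O(V^2) scan in one loop.
    return _order(_find_merges(list(vocab), vocab))


def extract_merges_parallel(
    vocab: Dict[str, int],
    processes: Optional[int] = None,
    chunk_size: int = 256,
) -> List[Tuple[str, str]]:
    pieces = list(vocab)
    # Split the outer loop (left pieces) into independent chunks of work.
    chunks = [pieces[i : i + chunk_size] for i in range(0, len(pieces), chunk_size)]
    # Prefer "fork" where available (POSIX) so workers inherit the parent's
    # memory; otherwise fall back to the platform's default start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    with ctx.Pool(processes, initializer=_init_worker, initargs=(vocab,)) as pool:
        # map() returns results in chunk order, so output matches the serial scan.
        per_chunk = pool.map(_worker, chunks)
    return _order([m for chunk in per_chunk for m in chunk])


if __name__ == "__main__":
    toy_vocab = {"a": 0, "b": 1, "c": 2, "ab": 3, "bc": 4, "abc": 5}
    print(extract_merges_parallel(toy_vocab, processes=2, chunk_size=2))
    # [('a', 'b'), ('b', 'c'), ('a', 'bc'), ('ab', 'c')]
```

Because `Pool.map` preserves chunk order and the final sort is stable, the parallel result is identical to the serial scan. Parallelism only divides wall-clock time by roughly the number of workers; the pairwise scan itself is still quadratic in vocabulary size.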

AamodThakur, Jun 19 '25 04:06