
Can we train with parallel computing, multithreading, or multiprocessing?

Open joytianya opened this issue 5 years ago • 6 comments

Can we train using parallel computing, multithreading, or multiprocessing to speed up training? Thank you.

joytianya avatar Jul 12 '19 01:07 joytianya

@joytianya Yes, we can! For example, look at YouTokenToMe. That BPE implementation uses parallel processing quite efficiently.

yutkin avatar Jul 22 '19 09:07 yutkin

Thank you. I will take a look. Actually, the current BPE algorithm is a little conservative in how it finds the most frequent pairs.

taku910 avatar Aug 02 '19 04:08 taku910

Will work on it in the next release.

taku910 avatar May 02 '23 15:05 taku910

I am really looking forward to parallel training. Training on Asian-language corpora on multi-core machines is extremely slow, which makes me feel like I am wasting my CPU...

lockmatrix avatar Jun 07 '23 11:06 lockmatrix

> Will work on it in the next release.

Hi @taku910 - I was wondering whether this feature has been released. Thank you.

heyaudace avatar Dec 08 '23 04:12 heyaudace