Pham Van Ngoan

5 comments by Pham Van Ngoan

Yes, you can. I trained on my own language (~1B words). However, you have to reduce the batch size from 4096 to ~512, with the other configs left at their defaults. The running...

Yes, I use SentencePiece to build the vocab, and dup_factor = 10 works fine.
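For readers unfamiliar with the setting, here is a rough sketch of what a duplication factor does, assuming dup_factor refers to the BERT-style option that duplicates the input data with different random masks. The helper names (`mask_tokens`, `create_instances`) and the simplified masking rule are hypothetical and written only for illustration; they are not taken from the actual pretraining script.

```python
import random


def mask_tokens(tokens, rng, mask_prob=0.15, mask_token="[MASK]"):
    """Replace a random subset of tokens with the mask token (simplified rule)."""
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]


def create_instances(sentences, dup_factor=10, seed=12345):
    """Emit dup_factor masked copies of every sentence, each masked independently."""
    rng = random.Random(seed)
    instances = []
    for _ in range(dup_factor):
        for tokens in sentences:
            instances.append(mask_tokens(tokens, rng))
    return instances


sentences = [["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]]
instances = create_instances(sentences, dup_factor=10)
print(len(instances))  # one sentence duplicated 10 times
for inst in instances[:3]:
    print(" ".join(inst))
```

With dup_factor = 10, each sentence contributes ten training instances, and the masked position can differ from copy to copy, so the model sees the same text under several masks.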

@peregilk I really did run with batch size 512.

@stefan-it I ran it through the Python module:

```python
import sentencepiece as spm
spm.SentencePieceTrainer.Train('--input= --vocab_size=30000 --model_prefix=prefix_name --pad_id=0 --unk_id=1 --pad_piece= --unk_piece= --bos_id=-1 --eos_id=-1 --control_symbols=[CLS],[SEP],[MASK], --user_defined_symbols=(,),",-,.,–,£,€')
```

...

I see the README on TFHub has not been updated yet; these parameters are the same as in V1. As the author mentioned in the README of this project (GitHub), they use no...

@joongkyunKim @tienthegainz Google drive link: https://drive.google.com/open?id=1F_du_iarTepKKYgXpk_cJNGRb34rlJ5c