Results: 5 comments by cyk

Hi, I tried to resume DeepSpeed multi-node training (using ZeRO) without a shared filesystem. The optimizer states were saved separately on each node. It requires manual gathering from all of the...

Hi @mariosasko, thank you for your reply. I tried pre-concatenating the different datasets into one, but one key requirement is that every sample within a batch has the same data type. Considering that the...

Hi @Narsil, thank you for the prompt reply. Yes, the extra space is caused by the `"▁"` in `"▁word1"`. Thank you for providing the solution for decoding purposes with...
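
To make the extra-space behaviour concrete, here is a minimal pure-Python sketch. It does not use the `tokenizers` library and it is not the solution referred to in the comment above. The function names are hypothetical, and it assumes the usual SentencePiece-style convention that `"▁"` (U+2581) marks a preceding space.

```python
# Illustrative sketch only: mimics how a "▁" word-boundary marker turns
# into a leading space when tokens are decoded naively.
METASPACE = "\u2581"  # the "▁" character


def naive_decode(tokens):
    """Join tokens and turn every "▁" marker back into a plain space."""
    return "".join(tokens).replace(METASPACE, " ")


def decode_without_leading_space(tokens):
    """Same as naive_decode, but drop the single space produced by the
    marker on the first token (hypothetical helper, not a library API)."""
    text = naive_decode(tokens)
    return text[1:] if text.startswith(" ") else text


if __name__ == "__main__":
    tokens = [METASPACE + "word1", METASPACE + "word2"]
    print(repr(naive_decode(tokens)))
    print(repr(decode_without_leading_space(tokens)))
```

Whether stripping the leading space is appropriate depends on the use case; decoding a token in the middle of a sentence may need that space to be kept.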

> Exactly. So if you're training/retraining you will force your model to learn about this, but it does imply a shift in model weights. Yes, simply changing the tokenizing...

> The trick is, special_tokens are treated AHEAD of the rest of the string, so "something" actually gets interpreted as ["some", "thing"] before it even reaches the pre_tokenizer so...
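
The ordering described in the quote can be sketched in plain Python. This is an illustration of the general idea only, not the `tokenizers` implementation. The function names are hypothetical, whitespace splitting stands in for a real pre-tokenizer, and registering `"thing"` as a special token is an assumption made purely for the example.

```python
import re


def split_on_special_tokens(text, special_tokens):
    """Cut the string at special tokens first, keeping the tokens themselves."""
    if not special_tokens:
        return [text] if text else []
    # Try longer tokens first so overlapping candidates prefer the longest match.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    return [piece for piece in re.split(pattern, text) if piece]


def tokenize(text, special_tokens, pre_tokenizer=str.split):
    """Apply the pre-tokenizer only to the pieces between special tokens."""
    output = []
    for piece in split_on_special_tokens(text, special_tokens):
        if piece in special_tokens:
            output.append(piece)  # special tokens bypass the pre-tokenizer
        else:
            output.extend(pre_tokenizer(piece))
    return output


if __name__ == "__main__":
    # Assumption for illustration: "thing" has been added as a special token.
    print(tokenize("something", ["thing"]))
    print(tokenize("hello<sep>world again", ["<sep>"]))
```

Because the split happens before pre-tokenization, a special token that is a substring of an ordinary word cuts that word in two, which is the effect the quote describes.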