
Update finetune.py

Open PUITAR opened this issue 1 year ago • 0 comments

    def __init__(
        self,
        data_path,
        tokenizer,
        model_max_length=4096,
        user_tokens=[1786, 4194, 95388],
        assistant_tokens=[1786, 10850, 95388],
    ): 

We tried to run SFT on MiniCPM-1B-sft-bp16 with the defaults above and hit an error caused by token ID 95388, which is meant to represent '' (an empty string). The model's embedding table has no entry for token 95388, so we removed it from both lists. The modified code:

        user_tokens=[1786, 4194],
        assistant_tokens=[1786, 10850],
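
A minimal sketch of how the out-of-range ID could be detected before training starts, rather than surfacing as an embedding lookup error. The helper names and `EXAMPLE_NUM_EMBEDDINGS` are hypothetical and the size is a placeholder, not the real MiniCPM value; with a loaded Hugging Face model, the actual size can typically be read from `model.get_input_embeddings().num_embeddings`.

```python
def out_of_range_ids(token_ids, num_embeddings):
    """Return the ids that have no row in an embedding table of the given size."""
    return [t for t in token_ids if not 0 <= t < num_embeddings]


def drop_out_of_range(token_ids, num_embeddings):
    """Return token_ids with every id outside [0, num_embeddings) removed."""
    return [t for t in token_ids if 0 <= t < num_embeddings]


# Placeholder size for illustration only (assumption, not the model's real value).
EXAMPLE_NUM_EMBEDDINGS = 80000

# Defaults from finetune.py as quoted in this issue.
user_tokens = [1786, 4194, 95388]
assistant_tokens = [1786, 10850, 95388]

for name, ids in (("user_tokens", user_tokens), ("assistant_tokens", assistant_tokens)):
    bad = out_of_range_ids(ids, EXAMPLE_NUM_EMBEDDINGS)
    if bad:
        print(f"{name}: ids {bad} exceed the embedding table; "
              f"kept {drop_out_of_range(ids, EXAMPLE_NUM_EMBEDDINGS)}")
```

Under the placeholder size, this yields the same lists as the manual edit above. Whether dropping the ID is the right fix, as opposed to using a tokenizer that matches the checkpoint, is left open here.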

PUITAR, Apr 30 '24 06:04