MiniCPM
Update finetune.py
def __init__(
    self,
    data_path,
    tokenizer,
    model_max_length=4096,
    user_tokens=[1786, 4194, 95388],
    assistant_tokens=[1786, 10850, 95388],
):
We tried to run SFT on MiniCPM-1B-sft-bf16 and hit an error caused by token ID 95388, which decodes to '' (empty string). That ID does not exist in the model's embedding table, so looking it up fails. We removed 95388 from both token lists. The modified code:
user_tokens=[1786, 4194],
assistant_tokens=[1786, 10850],
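To catch this kind of mismatch before training starts, the hard-coded IDs can be checked against the size of the embedding table. The following is a minimal sketch, not part of finetune.py: the helper names `find_out_of_range_ids` and `strip_out_of_range_ids` are hypothetical, and the embedding size is passed in by the caller rather than assumed.

```python
from typing import Iterable, List


def find_out_of_range_ids(token_ids: Iterable[int], num_embeddings: int) -> List[int]:
    """Return the IDs that cannot index an embedding table with
    `num_embeddings` rows (valid indices are 0 .. num_embeddings - 1)."""
    return [t for t in token_ids if t < 0 or t >= num_embeddings]


def strip_out_of_range_ids(token_ids: Iterable[int], num_embeddings: int) -> List[int]:
    """Return `token_ids` with every out-of-range ID removed, order preserved."""
    return [t for t in token_ids if 0 <= t < num_embeddings]


# Hypothetical usage with a loaded Hugging Face model (assumption: `model`
# is the checkpoint being fine-tuned and exposes an nn.Embedding input layer):
#
#   num_embeddings = model.get_input_embeddings().num_embeddings
#   bad = find_out_of_range_ids([1786, 4194, 95388], num_embeddings)
#   if bad:
#       raise ValueError(f"token IDs not in embedding table: {bad}")
```

Failing early with an explicit message is usually easier to debug than the index error raised inside the embedding lookup during the first training step.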