mergekit
Qwen2.5 14B models are ... sometimes? ... having their token vocabulary truncated down to 'actual'?
Actual example of a merge that produced this issue:
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      weight: 0.3
      density: 0.4
merge_method: della
base_model: <base model path>
parameters:
  epsilon: 0.05
  lambda: 1
dtype: bfloat16
tokenizer_source: base
```
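For context, here is a quick way to see the truncation on the merge output itself. This is only a sketch: it assumes the merge was written to a local directory (path is a placeholder) and that the result loads with the standard transformers API.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical path to the della merge output produced by the config above.
merged_path = "<merged model path>"

model = AutoModelForCausalLM.from_pretrained(merged_path, torch_dtype=torch.bfloat16)

# On an affected merge this reports 151665 embedding rows instead of the 152064
# that Qwen 2.5 14B models normally ship with.
print(model.get_input_embeddings().weight.shape[0])
```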
Additional relevant information: if I get the tokenizer vocab size with `tokenizer_vocab_size = len(tokenizer)` from any Qwen 2.5 14B model, I get 151665 rather than the 152064 that's in the config.json.
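A minimal sketch of that check, assuming the transformers `AutoTokenizer`/`AutoConfig` API and using Qwen/Qwen2.5-14B-Instruct as the example model id:

```python
from transformers import AutoConfig, AutoTokenizer

# Example model id; any Qwen 2.5 14B checkpoint shows the same mismatch.
model_id = "Qwen/Qwen2.5-14B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print(len(tokenizer))     # tokens the tokenizer actually knows about (reported as 151665)
print(config.vocab_size)  # embedding rows declared in config.json (152064)
```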
I don't fully understand why this merge method trims the vocabulary size and embedding layer down when none of the others do, but it's annoying for compatibility, and specifying tokenizer_source doesn't seem to address the issue (presumably because the tokenizer doesn't actually have 152064 entries' worth of vocabulary).
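Not sure whether this is the right fix, but as a stopgap for the compatibility problem one could presumably pad the merged model's embeddings back out to the original size after the merge. A sketch under those assumptions (placeholder path; 152064 is the value from the original config.json):

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical workaround: pad the merged model's embedding matrix back to the
# vocab_size Qwen ships in config.json so shape-sensitive tooling keeps working.
merged_path = "<merged model path>"

model = AutoModelForCausalLM.from_pretrained(merged_path, torch_dtype=torch.bfloat16)
model.resize_token_embeddings(152064)
model.save_pretrained(merged_path)
```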