
Qwen2.5 14B models are (sometimes?) having their token vocabulary truncated down to the 'actual' size?

ann-brown opened this issue 4 months ago · 0 comments

Actual example of a merge that produced this issue:

```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      weight: 0.3
      density: 0.4
merge_method: della
base_model: <base model path>
parameters:
  epsilon: 0.05
  lambda: 1
dtype: bfloat16
tokenizer_source: base
```

Additional relevant information: if I get the tokenizer vocab size with `tokenizer_vocab_size = len(tokenizer)` from any Qwen 2.5 14B model, I get 151665 rather than the 152064 that's in the `config.json`.
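The mismatch above can be sketched as a simple comparison (the helper name is mine; the constants are the numbers reported in this issue):

```python
def is_truncated(tokenizer_len: int, config_vocab_size: int) -> bool:
    """True when the tokenizer has fewer entries than the embedding has rows."""
    return tokenizer_len < config_vocab_size

# Reported values for Qwen2.5-14B-Instruct:
tokenizer_len = 151665      # len(tokenizer)
config_vocab_size = 152064  # vocab_size in config.json
print(is_truncated(tokenizer_len, config_vocab_size))  # True
print(config_vocab_size - tokenizer_len)               # 399 extra embedding rows
```

Those 399 extra rows are padding in the shipped checkpoint; they carry no tokens, which is presumably why a merge that sizes the embedding from the tokenizer ends up at 151665.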

I don't fully understand why this merge method (and none of the others) trims the vocabulary size and embedding layer down, but it's annoying for compatibility, and specifying `tokenizer_source` doesn't seem to address the issue (presumably because the tokenizer doesn't actually have 152064 worth of vocabulary).
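As a compatibility stopgap, one could zero-pad the merged embedding (and `lm_head`) back up to the config's `vocab_size`. A minimal numpy sketch of that idea; the padding strategy is an assumption of mine, not something mergekit does:

```python
import numpy as np

def pad_embedding(weight: np.ndarray, target_rows: int) -> np.ndarray:
    """Zero-pad an embedding matrix along the row (vocab) axis."""
    rows, dim = weight.shape
    if rows >= target_rows:
        return weight
    pad = np.zeros((target_rows - rows, dim), dtype=weight.dtype)
    return np.concatenate([weight, pad], axis=0)

# toy example with a tiny hidden size, padding 151665 rows up to 152064
emb = np.random.rand(151665, 8).astype(np.float32)
padded = pad_embedding(emb, 152064)
print(padded.shape)  # (152064, 8)
```

The padded rows are never reachable through the tokenizer, so zero-initializing them only restores the tensor shapes that downstream tooling expects.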

ann-brown · Sep 27 '24 14:09