Gemma 3 - passthrough merge errors

Open David-AU-github opened this issue 9 months ago • 7 comments

I updated to the latest mergekit and the correct transformers version for Gemma 3. A simple passthrough merge (same model, no other models) now fails with the following errors.

Model: Gemma 3 12B it (regular, not the "pt" / multimodal version)

NOTE: "mergekit7" is my newest install; I have several.

mergekit-yaml --copy-tokenizer --allow-crimes --cuda --out-shard-size 5B --lazy-unpickle --clone-tensors f:/mergefiles/Gemma-3-12B-exp40-3.txt E:/Gemma-3-12B-exp40-3

WARNING:mergekit.merge:Unable to set number of layers for module multi_modal_projector in output config - you may need to manually correct it.
Traceback (most recent call last):
  File "F:\mergekit7\mergekit\mergekit\merge.py", line 300, in _model_out_config
    set_config_value(res, cfg_key, module_layers[module_name])
  File "F:\mergekit7\mergekit\mergekit\common.py", line 37, in set_config_value
    parts = key.split(".")
            ^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
Warmup loader cache: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in run_code
  File "C:\Program Files\Python312\Scripts\mergekit-yaml.exe_main.py", line 7, in
  File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1161, in call
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\david\AppData\Roaming\Python\Python312\site-packages\click\core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit7\mergekit\mergekit\options.py", line 123, in wrapper
    f(*args, **kwargs)
  File "F:\mergekit7\mergekit\mergekit\scripts\run_yaml.py", line 30, in main
    run_merge(
  File "F:\mergekit7\mergekit\mergekit\merge.py", line 70, in run_merge
    ).plan_to_disk(out_path=out_path)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\mergekit7\mergekit\mergekit\plan.py", line 335, in plan_to_disk
    self._plan()
  File "F:\mergekit7\mergekit\mergekit\plan.py", line 376, in _plan
    self.normalize_config()
  File "F:\mergekit7\mergekit\mergekit\plan.py", line 105, in normalize_config
    raise RuntimeError(
RuntimeError: Model has multiple modules, must use modules: config syntax to work with slices

David-AU-github avatar Mar 14 '25 07:03 David-AU-github

With models like this slicing gets a little bit trickier - since the vision tower and the language model have a different number of layers it doesn't make sense to specify slices in such a way. You can use this new syntax:

merge_method: passthrough
modules:
  text_decoder:
    # frankenmerge the text model
    slices:
      - sources:
          - model: google/gemma-3-12b-it
            layer_range: [0, 24]
      - sources:
          - model: google/gemma-3-12b-it
            layer_range: [8, 48]
  vision_tower:
    # keep the vision tower as is
    models:
      - model: google/gemma-3-12b-it
    # or also frankenmerge it?
    # slices:
    #   - sources:
    #       - model: google/gemma-3-12b-it
    #         layer_range: [0, 16]
    #   - sources:
    #       - model: google/gemma-3-12b-it
    #         layer_range: [8, 27]
  multi_modal_projector:
    # no layers in this module just a single set of weights
    models:
      - model: google/gemma-3-12b-it
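
As a rough sanity check on a config like the one above, the depth of the merged text decoder can be worked out by summing the slice lengths. The sketch below is a hypothetical helper, not part of mergekit. It assumes each `layer_range` is half-open (so `[0, 24]` covers layers 0 through 23) and that a passthrough slice has exactly one source.

```python
# Hypothetical helper, not part of mergekit: counts the layers that a list of
# passthrough slices would stack, assuming each layer_range is half-open
# ([start, end), so [0, 24] covers layers 0..23).

def stacked_layer_count(slices):
    total = 0
    for s in slices:
        sources = s["sources"]
        # Assumption: a passthrough slice copies layers from exactly one source.
        if len(sources) != 1:
            raise ValueError("passthrough slice should have exactly one source")
        start, end = sources[0]["layer_range"]
        if not 0 <= start < end:
            raise ValueError(f"bad layer_range: {sources[0]['layer_range']}")
        total += end - start
    return total


# The text_decoder slices from the config above, written as plain Python data.
text_decoder_slices = [
    {"sources": [{"model": "google/gemma-3-12b-it", "layer_range": [0, 24]}]},
    {"sources": [{"model": "google/gemma-3-12b-it", "layer_range": [8, 48]}]},
]

print(stacked_layer_count(text_decoder_slices))  # 24 + 40 = 64
```

Under those assumptions the example text decoder ends up with 64 layers, while the vision tower and projector are copied unchanged.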

cg123 avatar Mar 16 '25 05:03 cg123

It gives the following error:

e-tensors --cuda --trust-remote-code
Traceback (most recent call last):
  File "/home/sicarius/mergekit/env/bin/mergekit-yaml", line 8, in
    sys.exit(main())
             ^^^^^^
  File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1161, in call
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sicarius/mergekit/env/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sicarius/mergekit/mergekit/options.py", line 123, in wrapper
    f(*args, **kwargs)
  File "/home/sicarius/mergekit/mergekit/scripts/run_yaml.py", line 30, in main
    run_merge(
  File "/home/sicarius/mergekit/mergekit/merge.py", line 40, in run_merge
    raise RuntimeError("No output requested")
RuntimeError: No output requested

SicariusSicariiStuff avatar Mar 16 '25 07:03 SicariusSicariiStuff

Sorry about that! Turns out I didn't fully update the config sanity checks and it was not allowing some valid configs. This config works on main now.

cg123 avatar Mar 16 '25 08:03 cg123

thank you !!!

David-AU-github avatar Mar 17 '25 01:03 David-AU-github

Quick question: does the same format work with DARE TIES and other merge types?

David-AU-github avatar Mar 17 '25 06:03 David-AU-github

I tried this with other fine-tunes and it throws:

RuntimeError: Tensor language_model.model.layers.47.post_feedforward_layernorm.weight required but not present in model

Anyone have similar experience?
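
One way to narrow down an error like this is to check which tensor names the fine-tune actually contains. The sketch below is a hypothetical helper, not part of mergekit. It assumes the model is saved as a sharded safetensors checkpoint with a `model.safetensors.index.json` file, whose `weight_map` maps tensor names to shard files. The demo index is made up for illustration. It shows a checkpoint whose text layers are stored without the `language_model.` prefix, which is only one possible reason a name could be missing.

```python
import json
import os
import tempfile


def missing_tensors(index_path, required_names):
    """Return the names in required_names that the checkpoint index does not list."""
    with open(index_path, encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]  # tensor name -> shard file
    return [name for name in required_names if name not in weight_map]


# Demo with a made-up index (NOT taken from any real model): the text layers
# are stored without the "language_model." prefix.
fake_index = {
    "weight_map": {
        "model.layers.47.post_feedforward_layernorm.weight": "model-00005-of-00005.safetensors",
    }
}
required = ["language_model.model.layers.47.post_feedforward_layernorm.weight"]

with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "model.safetensors.index.json")
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(fake_index, f)
    missing = missing_tensors(index_path, required)

print(missing)  # the required name is reported as missing
```

Pointing `missing_tensors` at the real index file of the fine-tune shows whether the tensor from the error is absent entirely or present under a different name.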

hgftrdw45ud67is8o89 avatar Mar 27 '25 12:03 hgftrdw45ud67is8o89

With models like this slicing gets a little bit trickier - since the vision tower and the language model have a different number of layers it doesn't make sense to specify slices in such a way. You can use this new syntax:

I am attempting a similar merge, but with 27b instead of 12b.

merge_method: passthrough
base_model: '/dataset/models/mergekitstuff/Fallen-Gemma3-27B-v1'
modules:
  text_decoder:
    slices:
      - sources:
        - model: '/dataset/models/mergekitstuff/Fallen-Gemma3-27B-v1'
          layer_range: [0,62]
      - sources:
        - model: '/dataset/models/mergekitstuff/X-Ray_Alpha_27B_Base'
          layer_range: [0,62]
  vision_tower:
    models:
      - model: '/dataset/models/mergekitstuff/X-Ray_Alpha_27B_Base'
  multi_modal_projector:
    models:
      - model: '/dataset/models/mergekitstuff/X-Ray_Alpha_27B_Base'

but this fails with:

  File "/dataset/tool/mergekit/mergekit/merge_methods/passthrough.py", line 27, in execute
    raise RuntimeError("Passthrough merge expects exactly one tensor")
RuntimeError: Passthrough merge expects exactly one tensor

I pulled the HF source repos for both models locally, but I don't know what else I need to do to resolve this, and I could not find any documentation on this error. If this is unrelated to this issue, I can open a separate one.

yggdrasil75 avatar Jun 10 '25 16:06 yggdrasil75