
ValueError: operands could not be broadcast together with shapes (12582912,1) (3072,8192)

Open · bhuvneshsaini opened this issue 11 months ago · 1 comment

slices:
  - sources:
      - model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
        layer_range: [0, 28]
      - model: taareshg/Llama-3.2-3B-Instruct-En-Hi-merge-200k
        layer_range: [0, 28]
merge_method: slerp
base_model: unsloth/Llama-3.2-3B-Instruct-bnb-4bit
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16

error:
[2025-01-06 12:05:07] [INFO] Merge configuration saved in /tmp/tmp26k63_vv/merged/config.yaml
[2025-01-06 12:05:07] [INFO] Creating repo bhuvneshsaini/unsloth-merge-3.2-3B-Instruct-bnb-4bit
[2025-01-06 12:05:07] [INFO] Repo created: https://huggingface.co/bhuvneshsaini/unsloth-merge-3.2-3B-Instruct-bnb-4bit
[2025-01-06 12:05:07] [INFO] Running mergekit-yaml config.yaml merge --copy-tokenizer --cuda --low-cpu-memory --allow-crimes --lora-merge-cache /tmp/tmp26k63_vv/.lora_cache
[2025-01-06 12:05:18] [INFO] Warmup loader cache: 0%| | 0/2 [00:00<?, ?it/s]
[2025-01-06 12:05:20] [INFO] Warmup loader cache: 50%|█████ | 1/2 [00:07<00:07, 7.34s/it]
[2025-01-06 12:05:20] [INFO] Warmup loader cache: 100%|██████████| 2/2 [00:10<00:00, 4.62s/it]
[2025-01-06 12:05:20] [INFO] Warmup loader cache: 100%|██████████| 2/2 [00:10<00:00, 5.03s/it]
[2025-01-06 12:05:29] [INFO] Executing graph: 0%| | 0/1272 [00:00<?, ?it/s]
[2025-01-06 12:05:30] [INFO] Executing graph: 0%| | 5/1272 [00:07<30:20, 1.44s/it]
[2025-01-06 12:05:30] [INFO] Executing graph: 1%| | 14/1272 [00:07<11:05, 1.89it/s]
[2025-01-06 12:05:30] [INFO] Traceback (most recent call last):
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/bin/mergekit-yaml", line 8, in <module>
[2025-01-06 12:05:30] [INFO]     sys.exit(main())
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
[2025-01-06 12:05:30] [INFO]     return self.main(*args, **kwargs)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/click/core.py", line 1078, in main
[2025-01-06 12:05:30] [INFO]     rv = self.invoke(ctx)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
[2025-01-06 12:05:30] [INFO]     return ctx.invoke(self.callback, **ctx.params)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/click/core.py", line 783, in invoke
[2025-01-06 12:05:30] [INFO]     return __callback(*args, **kwargs)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/options.py", line 82, in wrapper
[2025-01-06 12:05:30] [INFO]     f(*args, **kwargs)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/scripts/run_yaml.py", line 47, in main
[2025-01-06 12:05:30] [INFO]     run_merge(
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/merge.py", line 96, in run_merge
[2025-01-06 12:05:30] [INFO]     for _task, value in exec.run(quiet=options.quiet):
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/graph.py", line 197, in run
[2025-01-06 12:05:30] [INFO]     res = task.execute(**arguments)
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/merge_methods/slerp.py", line 60, in execute
[2025-01-06 12:05:30] [INFO]     slerp(
[2025-01-06 12:05:30] [INFO]   File "/home/user/.pyenv/versions/3.10.15/lib/python3.10/site-packages/mergekit/merge_methods/slerp.py", line 137, in slerp
[2025-01-06 12:05:30] [INFO]     dot = np.sum(v0 * v1)
[2025-01-06 12:05:30] [INFO] ValueError: operands could not be broadcast together with shapes (12582912,1) (3072,8192)
[2025-01-06 12:05:30] [ERROR] Command exited with code 1
[2025-01-06 12:05:30] [ERROR] Merge failed. Deleting repo as no model is uploaded.

bhuvneshsaini, Jan 06 '25 11:01
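
Editorial note: the two shapes in the error message are consistent with one tensor being stored as packed 4-bit data and the other as a regular 2-D weight matrix. This is an inference from the numbers and the `bnb-4bit` model name, not something the log states. A minimal sketch of the arithmetic follows, using only NumPy's `np.broadcast_shapes`; the helper `can_broadcast` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Shapes copied from the ValueError in the log above.
quantized_shape = (12582912, 1)  # tensor from the bnb-4bit checkpoint (assumed)
full_shape = (3072, 8192)        # tensor from the unquantized checkpoint (assumed)

# Assumption: two 4-bit values are packed into each byte, so the packed
# buffer holds half as many elements as the original matrix.
packed_elements = (full_shape[0] * full_shape[1]) // 2
print(packed_elements == quantized_shape[0])  # True


def can_broadcast(shape_a, shape_b):
    """Return True if NumPy could broadcast the two shapes together."""
    try:
        np.broadcast_shapes(shape_a, shape_b)
        return True
    except ValueError:
        return False


# An elementwise product such as v0 * v1 needs broadcast-compatible shapes.
print(can_broadcast(quantized_shape, full_shape))  # False
```

Because `np.broadcast_shapes` only inspects the shapes, this check allocates no large arrays.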

You should try using meta-llama/Llama-3.2-3B-Instruct instead of unsloth/Llama-3.2-3B-Instruct-bnb-4bit.

ngxson, Jan 08 '25 19:01
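
Editorial note: before starting a long merge, it can help to confirm that both checkpoints expose tensors of the same shape. The sketch below is one possible pre-flight check, not part of mergekit. The function `find_shape_mismatches` and the tensor names are hypothetical, and the shape dictionaries are assumed to have been collected beforehand (for example, from the checkpoint metadata).

```python
def find_shape_mismatches(shapes_a, shapes_b):
    """Return {name: (shape_a, shape_b)} for tensors whose shapes differ.

    Only tensor names present in both mappings are compared.
    """
    mismatches = {}
    for name in sorted(shapes_a.keys() & shapes_b.keys()):
        shape_a, shape_b = tuple(shapes_a[name]), tuple(shapes_b[name])
        if shape_a != shape_b:
            mismatches[name] = (shape_a, shape_b)
    return mismatches


# Illustrative data only: the tensor names are made up, because the log
# does not say which tensor triggered the error.
quantized = {"example.mlp.weight": (12582912, 1), "example.norm.weight": (3072,)}
full = {"example.mlp.weight": (3072, 8192), "example.norm.weight": (3072,)}

print(find_shape_mismatches(quantized, full))
# {'example.mlp.weight': ((12582912, 1), (3072, 8192))}
```

An empty result means every shared tensor has matching shapes, which is what an elementwise merge method such as slerp needs.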