Support for JAISLMHeadModel

Open h9-tect opened this issue 1 year ago • 1 comments

Can you please add this architecture to enable merging models like Jais for Arabic?

h9-tect avatar Jan 10 '24 08:01 h9-tect

Sure! #99 has what I think is a working implementation. I'd appreciate it if you could try it out and let me know whether it works for you.

Thanks for the interest!

cg123 avatar Jan 12 '24 06:01 cg123

I tried it but got this error:

Fetching 12 files: 100% 12/12 [03:28<00:00, 17.34s/it]
  0% 0/291 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 58, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 110, in run_merge
    exec.run(
  File "/content/mergekit/mergekit/graph.py", line 250, in run
    for ref, tensor in tqdm.tqdm(self.generate_tensors(), total=len(self.targets)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/content/mergekit/mergekit/graph.py", line 269, in generate_tensors
    schedule = self._schedule_ops()
  File "/content/mergekit/mergekit/graph.py", line 365, in _schedule_ops
    dependencies, ops = self._build_dependencies()
  File "/content/mergekit/mergekit/graph.py", line 405, in _build_dependencies
    _visit(target)
  File "/content/mergekit/mergekit/graph.py", line 402, in _visit
    _visit(dependency)
  File "/content/mergekit/mergekit/graph.py", line 394, in _visit
    raise RuntimeError(f"No rule to produce {node}")
RuntimeError: No rule to produce core42/jais-13b:model.embed_tokens.weight

h9-tect avatar Jan 13 '24 05:01 h9-tect

Oh, I saw #29. I think they are different architectures.

h9-tect avatar Jan 13 '24 05:01 h9-tect

That does look like you're trying to merge it with a Llama model. What models are you using?

cg123 avatar Jan 13 '24 06:01 cg123

I'm using AceGPT-13B and jais-13b.

h9-tect avatar Jan 13 '24 06:01 h9-tect

Gotcha. Looks like AceGPT is Llama-based. Unfortunately, the Jais architecture isn't compatible with Llama: the two use very different structures, and Jais is more like a much-improved descendant of GPT-2 than a relative of Llama. Only models fine-tuned from Jais as a base will really be compatible in a merge.

Sorry I can't be more helpful here!
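
For anyone who hits the same error, a quick pre-merge check along these lines may help. This is only a sketch, not part of mergekit: `architectures_match` is a hypothetical helper, and the two dicts are illustrative excerpts of what each model's `config.json` is assumed to contain (the `JAISLMHeadModel` name comes from this issue's title; the other values are assumptions to verify against the real files).

```python
def architectures_match(config_a: dict, config_b: dict) -> bool:
    """Return True when two Hugging Face-style config dicts declare the same architecture.

    Hypothetical helper for illustration; mergekit does not expose this function.
    """
    type_a = config_a.get("model_type")
    type_b = config_b.get("model_type")
    # A missing model_type is treated as a mismatch rather than a match.
    if type_a is None or type_a != type_b:
        return False
    archs_a = set(config_a.get("architectures", []))
    archs_b = set(config_b.get("architectures", []))
    return archs_a == archs_b


# Assumed config excerpts; check them against each model's actual config.json.
jais_config = {"model_type": "jais", "architectures": ["JAISLMHeadModel"]}
acegpt_config = {"model_type": "llama", "architectures": ["LlamaForCausalLM"]}

print(architectures_match(jais_config, acegpt_config))      # False
print(architectures_match(jais_config, dict(jais_config)))  # True
```

A matching architecture is necessary but not sufficient: as noted above, the models should also be fine-tuned from the same base for a merge to be meaningful.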

cg123 avatar Jan 13 '24 06:01 cg123

Thanks, you helped me a lot.

h9-tect avatar Jan 13 '24 06:01 h9-tect