mergekit
support for JAISLMHeadModel
Can you please add this architecture to enable merging models like Jais for Arabic?
Sure! #99 has what I think is a working implementation. I'd appreciate it if you could try it out and let me know if it works for you.
Thanks for the interest!
I tried it but got this error:
Fetching 12 files: 100% 12/12 [03:28<00:00, 17.34s/it]
0% 0/291 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/mergekit-yaml", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/content/mergekit/mergekit/options.py", line 58, in wrapper
    f(*args, **kwargs)
  File "/content/mergekit/mergekit/scripts/run_yaml.py", line 47, in main
    run_merge(
  File "/content/mergekit/mergekit/merge.py", line 110, in run_merge
    exec.run(
  File "/content/mergekit/mergekit/graph.py", line 250, in run
    for ref, tensor in tqdm.tqdm(self.generate_tensors(), total=len(self.targets)):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/content/mergekit/mergekit/graph.py", line 269, in generate_tensors
    schedule = self._schedule_ops()
  File "/content/mergekit/mergekit/graph.py", line 365, in _schedule_ops
    dependencies, ops = self._build_dependencies()
  File "/content/mergekit/mergekit/graph.py", line 405, in _build_dependencies
    _visit(target)
  File "/content/mergekit/mergekit/graph.py", line 402, in _visit
    _visit(dependency)
  File "/content/mergekit/mergekit/graph.py", line 394, in _visit
    raise RuntimeError(f"No rule to produce {node}")
RuntimeError: No rule to produce core42/jais-13b:model.embed_tokens.weight
Oh, I just saw #29. I think they are different architectures.
That does look like you're trying to merge it with a Llama model. What models are you using?
I'm using AceGPT-13B and jais-13b
Gotcha. Looks like AceGPT is Llama-based. Unfortunately the Jais architecture isn't compatible with Llama. They use very different structures, and Jais is actually more like a much-improved descendant of GPT-2 than a relative of Llama. That is why the merge asks jais-13b for a Llama-style tensor (model.embed_tokens.weight) that it doesn't have. Only models fine-tuned from Jais as a base will really be compatible in a merge.
Sorry I can't be more helpful here!
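If it helps, here is a rough sketch of a pre-merge sanity check you could run yourself. It is not part of mergekit: the helper names (`same_architecture`, `missing_tensors`) are hypothetical, and the config values and tensor names below are illustrative assumptions about what each model's config.json and checkpoint contain, not values read from the real repositories. Matching architectures is a necessary condition for a merge, not a guarantee that one will work.

```python
# Hypothetical helpers, not mergekit API: compare two Hugging Face-style
# config dicts and list tensor names one checkpoint lacks.

def same_architecture(cfg_a: dict, cfg_b: dict) -> bool:
    """True when both configs declare the same model_type and architectures."""
    return (
        cfg_a.get("model_type") == cfg_b.get("model_type")
        and cfg_a.get("architectures") == cfg_b.get("architectures")
    )


def missing_tensors(expected: list, available: list) -> list:
    """Tensor names the merge expects that the checkpoint does not provide."""
    return sorted(set(expected) - set(available))


# Illustrative values only (assumed, not copied from the model repos).
acegpt_cfg = {"model_type": "llama", "architectures": ["LlamaForCausalLM"]}
jais_cfg = {"model_type": "jais", "architectures": ["JAISLMHeadModel"]}
jais_finetune_cfg = {"model_type": "jais", "architectures": ["JAISLMHeadModel"]}

# Llama-style name taken from the error above; the GPT-2-style name is an
# assumption about how a GPT-2 descendant names its embedding tensor.
llama_style_names = ["model.embed_tokens.weight"]
gpt2_style_names = ["transformer.wte.weight"]

print(same_architecture(acegpt_cfg, jais_cfg))
print(same_architecture(jais_cfg, jais_finetune_cfg))
print(missing_tensors(llama_style_names, gpt2_style_names))
```

With real models you would load each config.json into a dict first (for example with `json.load`) and pass those in place of the hand-written dicts.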
Thanks, you helped me a lot.