
Convert MLX model to PyTorch/Hugging Face

fakerybakery opened this issue 1 year ago · 20 comments

Hi, is it possible to convert a LoRA model trained with MLX back into the Hugging Face format to publish on the Hugging Face Hub, and preferably merge it with the base model? Thank you!

fakerybakery avatar Dec 15 '23 23:12 fakerybakery

This would be a nice feature.

tawnkramer avatar Dec 17 '23 18:12 tawnkramer

Count me as interested in this.

justinh-rahb avatar Dec 18 '23 13:12 justinh-rahb

It would be great to convert the model files and adapter to a GGUF file.

USMCM1A1 avatar Dec 28 '23 19:12 USMCM1A1

Yeah. MLX is super nice, but it is missing the "deploy" part: what do you do after you like your end result and want other people to enjoy it too?

bernaferrari avatar Jan 05 '24 21:01 bernaferrari

Merging is implemented here: https://github.com/mzbac/mlx-lora, but I haven't yet found how to convert to GGUF.

l0d0v1c avatar Jan 08 '24 16:01 l0d0v1c

That's not yet supported. We have some ongoing work for GGUF support, see e.g. https://github.com/ml-explore/mlx/pull/350

awni avatar Jan 08 '24 16:01 awni

Question from an ignorant person, but why is the MLX format different from GGUF? Is there any place I can read about that?

bernaferrari avatar Jan 08 '24 16:01 bernaferrari

MLX has multiple "formats" that we save arrays in. The docs are a bit scattered, but you can find the save/load function docs on, for example, the ops page.

We currently support the standard NumPy format (along with zip and compressed zip) and safetensors. GGUF is in the pipeline.
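For reference, a minimal sketch of saving and loading arrays in the formats mentioned above (file names here are just placeholders):

import mlx.core as mx

a = mx.arange(8).reshape(2, 4)

# Single-array NumPy .npy file
mx.save("array.npy", a)

# Several named arrays in a zip archive (.npz), optionally compressed
mx.savez("arrays.npz", a=a, b=a * 2)
mx.savez_compressed("arrays_compressed.npz", a=a)

# safetensors takes a dict of named arrays
mx.save_safetensors("arrays.safetensors", {"a": a})

# mx.load handles all of these; npz/safetensors come back as a dict
loaded = mx.load("arrays.safetensors")
print(loaded["a"])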

awni avatar Jan 08 '24 16:01 awni

Is there a way to serve MLX over a web socket, like LM Studio?

I'm curious if I could serve my own model via MLX to other apps.

bernaferrari avatar Jan 08 '24 16:01 bernaferrari

Thank you @awni. MLX fine-tuning is very good on Mistral. A pity we can't get a GGUF compatible with llama.cpp, or maybe reverse the quantization to HF format?

l0d0v1c avatar Jan 08 '24 18:01 l0d0v1c

If the GGUF PR is merged, then MLX -> GGUF -> reverse the GGUF convert.py script to create an HF model? The convert.py script in llama.cpp seems quite complicated, but it looks possible.

fakerybakery avatar Jan 08 '24 19:01 fakerybakery

Succeeded by using fuse.py, then renaming weights.00.safetensors to model.safetensors; the convert.py from llama.cpp works fine afterward:

python fuse.py --model mlx_model --save-path ./fuse --adapter-file adapter.npz
python convert.py ./fuse
./quantize ./fuse/ggml-model-f16.gguf ./fuse/modelq5.gguf q5_0
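A small Python sketch of the same rename-then-convert steps, assuming the fused model was written to ./fuse and that llama.cpp's convert.py and quantize binary sit in the working directory (all paths are placeholders):

import subprocess
from pathlib import Path

fused_dir = Path("./fuse")  # output directory from fuse.py (assumed)

# fuse.py wrote the fused weights as weights.00.safetensors at the time;
# llama.cpp's convert.py expects model.safetensors, so rename it first.
weights = fused_dir / "weights.00.safetensors"
if weights.exists():
    weights.rename(fused_dir / "model.safetensors")

# Run llama.cpp's converter, then its quantize tool.
subprocess.run(["python", "convert.py", str(fused_dir)], check=True)
subprocess.run(
    ["./quantize", str(fused_dir / "ggml-model-f16.gguf"),
     str(fused_dir / "modelq5.gguf"), "q5_0"],
    check=True,
)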

l0d0v1c avatar Jan 21 '24 09:01 l0d0v1c

@l0d0v1c: I dropped "./fuse" from the python fuse.py step, reformatted the hyphens, and got that to work. That second part has nothing to do with MLX, correct? I have to get llama.cpp to do the GGUF conversion after renaming the weights.00.safetensors file?

USMCM1A1 avatar Jan 21 '24 15:01 USMCM1A1

Yes exactly

l0d0v1c avatar Jan 21 '24 15:01 l0d0v1c

Can you outline the steps you took in detail? We can see which ones we can improve on our end. For example, we could easily change the naming convention to model.safetensors, which might make one step simpler. We could also provide a dequantize option in fuse.py.
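For context, a rough sketch of what such a dequantize step could look like using mlx.core's quantize/dequantize ops; this is only an illustration of the idea, not the fuse.py implementation, and the group_size/bits values are assumptions:

import mlx.core as mx

# A toy float16 weight matrix standing in for a linear layer's weights.
w = mx.random.normal((256, 256)).astype(mx.float16)

# Quantize the way MLX would for a 4-bit model (group_size/bits assumed).
wq, scales, biases = mx.quantize(w, group_size=64, bits=4)

# Dequantizing recovers a float approximation that could then be exported
# (e.g. to safetensors) for tools that expect unquantized weights.
w_deq = mx.dequantize(wq, scales, biases, group_size=64, bits=4)
print(mx.abs(w - w_deq).max())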

awni avatar Jan 21 '24 15:01 awni

@l0d0v1c I'm struggling with this (I'm a linguist with no computer/data science training). I've cloned the llama.cpp repo. If the fused/renamed model was in /Users/williammarcellino/mlx-examples/lora/lora_fused_model_GrKRoman_1640, how would I format a command to convert to GGUF? Thanks in advance for any help :)

USMCM1A1 avatar Jan 21 '24 15:01 USMCM1A1

@USMCM1A1 You have to clone the llama.cpp repo; then "make" is enough on a Mac. Rename weights.00 to model, then run:

python convert.py thedirectoryofyourmodel

It will produce a file "ggml-model-f16.gguf" in the same directory. Then you can use:

./quantize thedirectoryofyourmodel/ggml-model-f16.gguf Thefinal.gguf q4_0

In my experiments on an MLX fine-tuned model, q8_0 is necessary instead of q4_0.

@awni Changing the naming convention is a good idea. Another idea is to allow converting just the LoRA adapter to GGUF.

l0d0v1c avatar Jan 21 '24 16:01 l0d0v1c

@USMCM1A1 My project is also linguistic (ancient Greek). I'm not a computer scientist either, but I play with buttons.

l0d0v1c avatar Jan 21 '24 16:01 l0d0v1c

@l0d0v1c Awesome, that worked! I have a working gguf_q8 version up and running in LM Studio 😊 Thank you so much.

Also: my fine-tune happens to be on the classical world (Hellenic & Roman).

USMCM1A1 avatar Jan 21 '24 19:01 USMCM1A1

@USMCM1A1 I work on an AI able to deal with the philosophy of Diogenes and Antisthenes. The results are just incredible. Happy you succeeded. I sent you a LinkedIn invitation to share about our... unusual subject.

l0d0v1c avatar Jan 21 '24 19:01 l0d0v1c