
Lora: generating from local fused model fails

Open sandeepimpressico opened this issue 1 year ago • 3 comments

Generation works if I give the original model and the adapter file, but it fails with a local fused model.

1. Generation works when I specify the adapter file:

```
(mlx) % python lora.py --model mistralai/Mistral-7B-v0.1 --adapter adapters.npz --max-tokens 200 --prompt "table: 1-10015132-16 columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team Q: What is terrence ross' nationality A: "
Loading pretrained model
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 99156.12it/s]
Total parameters 7243.436M
Trainable parameters 1.704M
Loading datasets
Generating
table: 1-10015132-16 columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team Q: What is terrence ross' nationality A: 10303002-16 (C) Toronto Raptors guard Terrence Ross is from Kinston, North Carolina. Born in 1990, Ross played college basketball from 2010-12 for Washington State University. He was selected by Toronto in the first round (eighth overall) of the 2012 NBA draft A: 10303002-16 (C) Toronto Raptors guard Terrence Ross is from Kinston, North Carolina. Born in 1990, Ross played college basketball from 2010-12 for Washington State University. He was selected by Toronto in the first round (eighth overall) of the 2012 NBA draft same answers: 10303002-16 (C) Toronto Raptors guard Terrence Ross is from Kinston, North Carolina. Born in
==========
```

2. Create a fused model in the finetunefuse directory:

```
(mlx) % python fuse.py --model mistralai/Mistral-7B-v0.1 --adapter-file adapters.npz --save-path finetunefuse
Loading pretrained model
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 35365.13it/s]
```
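For context, fusing folds the trained low-rank LoRA update into the base weights so the saved model needs no adapter file at generation time. A minimal conceptual sketch (not the actual fuse.py code; the dimensions, rank, and scale here are made up for illustration):

```python
import numpy as np

# Illustrative sizes only; real Mistral-7B layers are much larger.
rng = np.random.default_rng(0)
d, r, scale = 8, 2, 2.0
W = rng.normal(size=(d, d))   # pretrained linear weight
A = rng.normal(size=(r, d))   # trained LoRA down-projection
B = rng.normal(size=(d, r))   # trained LoRA up-projection

# Fusing: fold the low-rank delta into the base weight once.
W_fused = W + scale * (B @ A)

# A forward pass through the fused weight matches base + adapter.
x = rng.normal(size=(d,))
assert np.allclose(W_fused @ x, W @ x + scale * (B @ (A @ x)))
```

The fused checkpoint therefore already contains the adapter's contribution, which matters for step 4 below.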

3. Directory contents:

```
(mlx) % ls -l finetunefuse
total 28319840
-rw-r--r--  1 sandeep  staff          616 Jan 10 11:21 config.json
-rw-r--r--  1 sandeep  staff          414 Jan 10 11:21 special_tokens_map.json
-rw-r--r--  1 sandeep  staff      1795303 Jan 10 11:21 tokenizer.json
-rw-r--r--  1 sandeep  staff       493443 Jan 10 11:21 tokenizer.model
-rw-r--r--  1 sandeep  staff          967 Jan 10 11:21 tokenizer_config.json
-rw-r--r--  1 sandeep  staff  14483498189 Jan 10 11:21 weights.00.safetensors
```

4. Use the same command as in step 1, but with the local fused model path. It generates garbage:

```
(mlx) % python lora.py --model finetunefuse --max-tokens 200 --prompt "table: 1-10015132-16 columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team Q: What is terrence ross' nationality A: "
Loading pretrained model
Total parameters 7243.436M
Trainable parameters 1.704M
Loading datasets
Generating
table: 1-10015132-16 columns: Player, No., Nationality, Position, Years in Toronto, School/Club Team Q: What is terrence ross' nationality A: : Business duration operatioion: 1- Table recommendations industrial operatioin percent […] [...]verb: Functional duration operatioin operations ')   […] […] […] […] […] […] […] […] […] […] […] […] […] […] Rice […] […] Rice […] […] […] […] […] […] […] […] […] […] […] […]rea […]rea […] Rea […] […] Rebecca […] […] […] […] VAR […] […] […] […] nationality appro […]aronit […]arios […]AR […]ARgio […]texttt riority […]rypto […] […] Functional NonNull […]ornoitutue AC duration […]crementer […] Currently […] Currently […]  […] ###### […] […] […]amed Original […]onoik Tre tre […]垂 Sea War same […] Shanghai […]ITH Tam […] ##### ado Average […]aturday […]exported […] ## […] […] […] Conserv […] Conservative […] […] […]GIN ippi
==========
```

sandeepimpressico avatar Jan 10 '24 17:01 sandeepimpressico

Could you share the training command you used?

awni avatar Jan 10 '24 19:01 awni

`python lora.py --model mistralai/Mistral-7B-v0.1 --train`

sandeepimpressico avatar Jan 10 '24 21:01 sandeepimpressico

I think the problem here is that our lora.py script assumes you are passing it an "unfused" model, so it loads the adapters on top of the already-fused weights and hence gives you garbage. You could try passing --lora-layers 0 when you call generate with the already fused model; that should work 🤔
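The failure mode described above can be sketched in a few lines: the fused checkpoint already contains the low-rank delta, so loading the adapters again applies it twice. A toy numpy illustration (the sizes are made up; this is not the lora.py code itself):

```python
import numpy as np

# Illustrative sizes only.
rng = np.random.default_rng(1)
d, r = 8, 2
W = rng.normal(size=(d, d))   # pretrained weight
A = rng.normal(size=(r, d))   # trained LoRA matrices
B = rng.normal(size=(d, r))

# fuse.py already folded the delta into the saved weights.
W_fused = W + B @ A

# lora.py, assuming an unfused model, adds the adapters again,
# so the effective weight carries the delta twice.
double_applied = W_fused + B @ A

assert not np.allclose(W_fused, double_applied)
assert np.allclose(double_applied - W, 2 * (B @ A))
```

In the thread's terms, passing --lora-layers 0 tells lora.py not to wrap any layers in LoRA, so the fused weights are used as-is for generation.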

awni avatar Jan 11 '24 00:01 awni

That works.

sandeepimpressico avatar Jan 11 '24 01:01 sandeepimpressico