Bug Fix: 443 Bytes `adapter_model.bin` files
Aims to fix #38 and #41
Currently, we get extremely small (~443-byte) adapter files at each checkpoint.
This seems to be due to an issue in the PEFT library.
One working solution would be to roll back to an older PEFT version, but that is not possible here since the older version doesn't contain the QLoRA changes.
The following fix works on my setup with a 4080 card as well as in a Colab notebook.
It has been borrowed from alpaca-lora.
I haven't yet tested the output of adapters trained with this fix; there seems to be some debate about this in the linked alpaca-lora PR.
@artidoro, do let us know if this is the right approach, or whether folks should wait for a fix on the PEFT end.
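For reference, one common way to implement this kind of workaround is a `TrainerCallback` that explicitly saves the adapter weights at every checkpoint instead of relying on the Trainer's default `state_dict` dump, which also matches the `checkpoint-10/adapter_model/` layout seen in the verification further down. A minimal sketch only (class and path names are illustrative, not necessarily the exact diff in this PR):

```python
import os
import transformers

class SavePeftModelCallback(transformers.TrainerCallback):
    """Write only the PEFT adapter weights at every checkpoint."""

    def on_save(self, args, state, control, **kwargs):
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        peft_model_path = os.path.join(checkpoint_dir, "adapter_model")
        # kwargs["model"] is the PeftModel being trained; save_pretrained() writes
        # adapter_config.json and a correctly sized adapter_model.bin.
        kwargs["model"].save_pretrained(peft_model_path)
        return control

# Usage: trainer.add_callback(SavePeftModelCallback)
```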
Thank you for your contribution! Your code has been merged into the main branch.
Your code style is very consistent and easy to read, thanks for that!
I did a bit of verification on an `adapter_model.bin` file saved using this fix, and it does seem to contain only the LoRA layers.
>>> import torch
>>> state_dict = torch.load("output_redpajama3B_test_2/checkpoint-10/adapter_model/adapter_model.bin")
>>> state_dict.keys()
dict_keys(['base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.0.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.0.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.1.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.1.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.1.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.1.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.1.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.1.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.1.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.2.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.2.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.2.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.2.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.2.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.2.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.2.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.3.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.3.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.3.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.3.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.3.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.3.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.3.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.3.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.4.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.4.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.4.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.4.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.4.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.4.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.4.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.4.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.5.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.5.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.5.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.5.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.5.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.5.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.5.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.5.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.6.attention.query_key_value.lora_A.weight', 
'base_model.model.gpt_neox.layers.6.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.6.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.6.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.6.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.6.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.6.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.6.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.7.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.7.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.7.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.7.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.7.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.7.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.7.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.7.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.8.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.8.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.8.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.8.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.8.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.8.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.8.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.8.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.9.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.9.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.9.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.9.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.9.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.9.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.9.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.9.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.10.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.10.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.10.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.10.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.10.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.10.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.10.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.10.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.11.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.11.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.11.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.11.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.11.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.11.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.11.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.11.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.12.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.12.attention.query_key_value.lora_B.weight', 
'base_model.model.gpt_neox.layers.12.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.12.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.12.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.12.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.12.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.12.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.13.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.13.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.13.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.13.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.13.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.13.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.13.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.13.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.14.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.14.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.14.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.14.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.14.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.14.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.14.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.14.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.15.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.15.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.15.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.15.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.15.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.15.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.15.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.15.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.16.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.16.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.16.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.16.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.16.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.16.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.16.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.16.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.17.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.17.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.17.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.17.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.17.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.17.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.17.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.17.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.18.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.18.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.18.attention.dense.lora_A.weight', 
'base_model.model.gpt_neox.layers.18.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.18.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.18.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.18.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.18.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.19.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.19.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.19.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.19.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.19.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.19.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.19.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.19.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.20.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.20.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.20.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.20.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.20.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.20.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.20.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.20.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.21.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.21.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.21.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.21.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.21.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.21.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.21.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.21.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.22.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.22.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.22.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.22.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.22.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.22.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.22.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.22.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.23.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.23.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.23.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.23.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.23.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.23.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.23.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.23.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.24.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.24.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.24.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.24.attention.dense.lora_B.weight', 
'base_model.model.gpt_neox.layers.24.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.24.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.24.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.24.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.25.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.25.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.25.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.25.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.25.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.25.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.25.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.25.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.26.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.26.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.26.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.26.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.26.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.26.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.26.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.26.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.27.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.27.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.27.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.27.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.27.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.27.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.27.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.27.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.28.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.28.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.28.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.28.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.28.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.28.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.28.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.28.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.29.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.29.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.29.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.29.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.29.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.29.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.29.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.29.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.30.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.30.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.30.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.30.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.30.mlp.dense_h_to_4h.lora_A.weight', 
'base_model.model.gpt_neox.layers.30.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.30.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.30.mlp.dense_4h_to_h.lora_B.weight', 'base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_A.weight', 'base_model.model.gpt_neox.layers.31.attention.query_key_value.lora_B.weight', 'base_model.model.gpt_neox.layers.31.attention.dense.lora_A.weight', 'base_model.model.gpt_neox.layers.31.attention.dense.lora_B.weight', 'base_model.model.gpt_neox.layers.31.mlp.dense_h_to_4h.lora_A.weight', 'base_model.model.gpt_neox.layers.31.mlp.dense_h_to_4h.lora_B.weight', 'base_model.model.gpt_neox.layers.31.mlp.dense_4h_to_h.lora_A.weight', 'base_model.model.gpt_neox.layers.31.mlp.dense_4h_to_h.lora_B.weight'])
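As an extra sanity check (not shown above), the on-disk size can also be compared against the broken ~443-byte files; assuming the same checkpoint path as in the snippet above:

```python
import os

path = "output_redpajama3B_test_2/checkpoint-10/adapter_model/adapter_model.bin"
size_mb = os.path.getsize(path) / 1e6
# A LoRA adapter over the attention and MLP projections of a ~3B GPT-NeoX model
# should be in the megabytes (tens to hundreds of MB depending on rank),
# not a few hundred bytes.
print(f"adapter_model.bin: {size_mb:.1f} MB")
```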
I also checked out the peft repo, and it seems like `get_peft_model_state_dict` is now called directly inside `save_pretrained`, and hence is not needed in this code: https://github.com/huggingface/peft/blame/3714aa2fff158fdfa637b2b65952580801d890b2/src/peft/peft_model.py#L125
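If that's the case, explicitly filtering the state dict before saving would be redundant on newer PEFT versions, and the saved adapter should simply load back onto the base model. A rough round-trip check, assuming the base was the RedPajama-INCITE 3B model (an inference from the output directory name; substitute whatever base model was actually used):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Base model id is an assumption inferred from the output directory name.
base = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/RedPajama-INCITE-Base-3B-v1",
    torch_dtype=torch.float16,
)
# Attach the saved LoRA adapter from the checkpoint directory.
model = PeftModel.from_pretrained(
    base, "output_redpajama3B_test_2/checkpoint-10/adapter_model"
)
```

If this loads and generation looks reasonable, the saved adapters are probably fine regardless of which code path wrote them.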
Thank you @KKcorps! I also just replicated your fix and it seems to properly store the adapter checkpoints.