How to finetune llama checkpoint using metaseq?
I want to finetune the 7B llama checkpoint using metaseq. It seems the llama checkpoints are consolidated versions of the model, and it's not clear how to finetune a consolidated model directly in metaseq. Is there a conversion utility to convert the consolidated version into a metaseq training-compatible format?
Llama checkpoint dict keys (consolidated):
dict_keys(['tok_embeddings.weight', 'norm.weight', 'output.weight', 'layers.0.attention.wq.weight', 'layers.0.attention.wk.weight', 'layers.0.attention.wv.weight', 'layers.0.attention.wo.weight', ...

OPT checkpoint dict keys (metaseq training-compatible format):
dict_keys(['model', 'args', 'cfg', 'criterion', 'optimizer_history', 'task_state', 'extra_state', 'shard_metadata'])

Zooming into "model":
dict_keys(['flat_param_0', 'decoder.layers.0.flat_param_0', 'decoder.layers.1.flat_param_0', 'decoder.layers.2.flat_param_0', ...
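To illustrate what I have in mind, here is a minimal sketch of the key renaming I imagine a conversion utility would start from. The metaseq-side key names (decoder.embed_tokens.weight, decoder.layers.N.self_attn.q_proj.weight, etc.) are my guesses based on the OPT model code, not a verified mapping, and this does not handle the FSDP flattening ('flat_param_0') or the surrounding 'cfg'/'shard_metadata' structure shown above:

```python
import torch

def rename_llama_keys(consolidated_path, out_path):
    # Load the consolidated LLaMA state dict (keys as listed above).
    state = torch.load(consolidated_path, map_location="cpu")

    # Top-level renames -- the target names are assumptions on my part.
    key_map = {
        "tok_embeddings.weight": "decoder.embed_tokens.weight",
        "norm.weight": "decoder.layer_norm.weight",
        "output.weight": "decoder.output_projection.weight",
    }
    # Per-layer renames, e.g.
    # layers.0.attention.wq.weight -> decoder.layers.0.self_attn.q_proj.weight
    per_layer = {
        "attention.wq.weight": "self_attn.q_proj.weight",
        "attention.wk.weight": "self_attn.k_proj.weight",
        "attention.wv.weight": "self_attn.v_proj.weight",
        "attention.wo.weight": "self_attn.out_proj.weight",
    }

    new_state = {}
    for k, v in state.items():
        if k in key_map:
            new_state[key_map[k]] = v
        elif k.startswith("layers."):
            _, layer_idx, rest = k.split(".", 2)
            if rest in per_layer:
                new_state[f"decoder.layers.{layer_idx}.{per_layer[rest]}"] = v
            else:
                new_state[k] = v  # keep anything I don't know how to map
        else:
            new_state[k] = v

    # Re-flattening into FSDP 'flat_param_0' shards and filling in 'cfg',
    # 'optimizer_history', 'shard_metadata', etc. is the part I don't know how to do.
    torch.save({"model": new_state}, out_path)
```

Is something along these lines the intended path, or is there an existing script in metaseq for this?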
Thanks.