enable_lora() API introduces new weights with a different path than other model weights
Here is an excerpt from a model where LoRA was enabled on query and value layers:
decoder_block_0/pre_attention_norm/scale (2048,) PartitionSpec()
decoder_block_0/attention/query/kernel (8, 2048, 256) PartitionSpec(None, 'model')
query/lora_kernel_a (8, 2048, 4) PartitionSpec(None, None, None)
query/lora_kernel_b (4, 256) PartitionSpec()
decoder_block_0/attention/key/kernel (1, 2048, 256) PartitionSpec(None, 'model')
decoder_block_0/attention/value/kernel (1, 2048, 256) PartitionSpec(None, 'model')
value/lora_kernel_a (1, 2048, 4) PartitionSpec(None, None, None)
value/lora_kernel_b (4, 256) PartitionSpec()
decoder_block_0/attention/attention_output/kernel (8, 256, 2048) PartitionSpec(None, None, 'model')
Notice that the LoRA weights have names that do not start with decoder_block_0/attention,
unlike all other weights in this Transformer model.
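For reference, the weight shapes in the excerpt are consistent with a Gemma 2B backbone with LoRA rank 4. Below is a minimal sketch of how such a listing can be produced; the KerasNLP preset name and the distribution setup are assumptions and not part of the excerpt above.

```python
import keras_nlp

# Assumed preset; any Gemma preset exposes the same weight path structure.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# enable_lora() adds lora_kernel_a / lora_kernel_b variables to the
# query and value projections after the model has already been built.
gemma_lm.backbone.enable_lora(rank=4)

# Print each variable's path and shape. The PartitionSpec column in the
# excerpt comes from the JAX sharding applied via a
# keras.distribution.ModelParallel layout map (not shown here).
for variable in gemma_lm.backbone.weights:
    print(variable.path, variable.shape)
```

With the behavior described above, the pre-existing weights keep their full decoder_block_0/attention/... paths, while the LoRA variables appear only under the shorter query/... and value/... paths.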