
enable_lora() API introduces new weights with a different path than other model weights

Open · martin-gorner opened this issue on Feb 9, 2024 · 0 comments

Here is an excerpt from a model where LoRA was enabled on query and value layers:

decoder_block_0/pre_attention_norm/scale                    (2048,)           PartitionSpec()
decoder_block_0/attention/query/kernel                      (8, 2048, 256)    PartitionSpec(None, 'model')
query/lora_kernel_a                                         (8, 2048, 4)      PartitionSpec(None, None, None)
query/lora_kernel_b                                         (4, 256)          PartitionSpec()
decoder_block_0/attention/key/kernel                        (1, 2048, 256)    PartitionSpec(None, 'model')
decoder_block_0/attention/value/kernel                      (1, 2048, 256)    PartitionSpec(None, 'model')
value/lora_kernel_a                                         (1, 2048, 4)      PartitionSpec(None, None, None)
value/lora_kernel_b                                         (4, 256)          PartitionSpec()
decoder_block_0/attention/attention_output/kernel           (8, 256, 2048)    PartitionSpec(None, None, 'model')

Notice that the LoRA weights have paths that do not start with decoder_block_0/attention/, unlike all the other weights in this Transformer model.
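
A minimal sketch of how this can surface (the AttentionBlock layer below is a hypothetical stand-in, not the actual Gemma/KerasNLP code, and it assumes Keras 3, where each variable exposes a .path attribute): an EinsumDense named query is built inside a parent layer, enable_lora() is then called on it from user code outside that layer, and the variable paths are printed.

```python
import numpy as np
import keras


class AttentionBlock(keras.layers.Layer):
    """Hypothetical stand-in for decoder_block_0/attention in the excerpt."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # 8 heads, 2048 hidden dim, 256 head dim, as in the excerpt above.
        self.query = keras.layers.EinsumDense(
            "btd,ndh->btnh", output_shape=(None, 8, 256), name="query"
        )

    def call(self, x):
        return self.query(x)


block = AttentionBlock(name="attention")
block(np.zeros((1, 4, 2048)))    # builds the kernel inside the parent's name scope
block.query.enable_lora(rank=4)  # adds lora_kernel_a / lora_kernel_b afterwards

for v in block.weights:
    print(v.path, v.shape)
```

On Keras versions exhibiting the reported behavior, the original kernel path is prefixed with the parent layer's name (attention/query/kernel), while the LoRA weights created by enable_lora() only get query/lora_kernel_a and query/lora_kernel_b, consistent with the excerpt above.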

martin-gorner · Feb 09 '24 17:02