LoRA
Is it expected for the training time to not decrease?
I'm trying to replace all my embedding and Linear layers with LoRA layers. Although the GPU memory needed decreases, the training time stays the same, even with fewer trainable weights. Is that expected?
From what I understood, in the GPT-2 experiment you only changed a single Conv1D layer, right? That makes more sense in terms of training speed.
Hi Joao,
You should only see a speedup if you had previously saturated your GPU utilization; if the GPU was not the bottleneck, reducing the number of trainable weights will not shorten each step.
Yes, for GPT-2 we only changed one layer and marked the rest as not trainable.
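The replacement discussed above can be sketched as follows. This is a minimal PyTorch illustration, not code from the repository: the `LoRALinear` wrapper name and the rank/alpha parameters are assumptions, but it shows the key point from the reply, freezing the base weights so only the small low-rank factors remain trainable.

```python
# Minimal LoRA sketch (illustrative, not the repo's implementation):
# wrap an existing nn.Linear, freeze its pretrained weight/bias, and
# train only the low-rank A/B factors.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        # Mark the pretrained parameters as not trainable.
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank update: W x + (alpha / rank) * B (A x)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")
```

Note that this reduces memory (no optimizer state or gradients for the frozen weights), but the forward pass still computes the full `base` matmul plus the LoRA branch, which is consistent with the per-step time not dropping.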