LoRA
Is it expected for the training time to not decrease?
I'm trying to replace all my embedding and Linear layers with LoRA layers. Although the GPU memory needed decreases, the training time stays the same, even with fewer trainable weights. Is that expected?
From what I understood, in the GPT-2 experiment you only changed a single Conv1D layer, right? That makes more sense in terms of training speed.
Hi Joao,
You should only see a speedup if you had previously saturated your GPU utilization; if the GPU was not the bottleneck, reducing the number of trainable weights will not shorten each step.
Yes, for GPT-2 we only changed one layer and marked the rest as not trainable.
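The replacement discussed above can be sketched as follows. This is a minimal PyTorch illustration, not code from the repository: the `LoRALinear` wrapper name and the rank/alpha parameters are assumptions, but it shows the key point from the reply, freezing the base weights so only the small low-rank factors remain trainable.

```python
# Minimal LoRA sketch (illustrative, not the repo's implementation):
# wrap an existing nn.Linear, freeze its pretrained weight/bias, and
# train only the low-rank A/B factors.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        # Mark the pretrained parameters as not trainable.
        for p in self.base.parameters():
            p.requires_grad = False
        # Low-rank update: W x + (alpha / rank) * B (A x)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / total: {total}")
```

Note that this reduces memory (no optimizer state or gradients for the frozen weights), but the forward pass still computes the full `base` matmul plus the LoRA branch, which is consistent with the per-step time not dropping.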