Some questions about the LISA code
I understand that LISA's core code lives in src\lmflow\pipeline\finetuner.py, mainly in the class `DynamicLayerActivationCallback`. I read it side by side with Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.
So where is step 2, "Freeze all layers except the embedding and language modeling head layer"? I can only find `def freeze_all_layers(self)` in `DynamicLayerActivationCallback`, and it does not exclude the embedding or head layers.
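To make my question concrete, here is what I expected step 2 to look like. This is only my own sketch, not LMFlow's code: `TinyLM` is a stand-in model, and `freeze_all_but_embed_and_head` is a hypothetical helper that keeps only the embedding and lm head trainable, where the real `freeze_all_layers` seems to freeze everything.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Minimal stand-in for a causal LM: embedding, one body layer, lm head."""
    def __init__(self, vocab=10, dim=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layer = nn.Linear(dim, dim)     # stands in for the transformer body
        self.lm_head = nn.Linear(dim, vocab)

def freeze_all_but_embed_and_head(model, keep=("embed", "lm_head")):
    """Hypothetical version of step 2: freeze every parameter whose name
    does not contain one of the `keep` substrings."""
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name for k in keep)

model = TinyLM()
freeze_all_but_embed_and_head(model)
# embed and lm_head stay trainable; the body layer is frozen
```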
And I'm curious about the notation k in step 4 of Algorithm 1 in the paper: "Run AdamW for K iterations with $\{\eta_t\}_{t=ik}^{ik+k-1}$". Is this lowercase k the same as K?
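My current reading (which may be wrong) is that the lowercase k is just K, so sampling period i covers the K global steps t = iK through iK+K-1. A quick sketch of that indexing:

```python
def steps_in_period(i, K):
    """Global AdamW step indices t covered by sampling period i,
    assuming t runs from iK to iK+K-1 (my reading of the paper's notation)."""
    return list(range(i * K, i * K + K))

# With K = 5: period 0 covers steps 0..4, period 1 covers steps 5..9, etc.
print(steps_in_period(0, 5))
print(steps_in_period(1, 5))
```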
My English is not great, so please tell me if anything above is unclear. Thanks for answering!
The LISA paper is just confusing, including the datasets, code, etc. And there is no importance sampling.