Some questions about the LISA code
I understand that LISA's core code lives in src\lmflow\pipeline\finetuner.py, mainly in the class `DynamicLayerActivationCallback`. I read it side by side with Algorithm 1 (Layerwise Importance Sampling AdamW, LISA) in the paper.
So where is step 2, "Freeze all layers except the embedding and language modeling head layer"? I can only find `def freeze_all_layers(self)` in `DynamicLayerActivationCallback`, and it does not exclude the embedding or head layers.
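To make my question concrete, here is what I expected step 2 to look like. This is only my own sketch, not LMFlow's code: `TinyLM` is a stand-in model, and `freeze_all_but_embed_and_head` is a hypothetical helper that keeps only the embedding and lm head trainable, where the real `freeze_all_layers` seems to freeze everything.

```python
import torch.nn as nn

class TinyLM(nn.Module):
    """Minimal stand-in for a causal LM: embedding, one body layer, lm head."""
    def __init__(self, vocab=10, dim=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.layer = nn.Linear(dim, dim)     # stands in for the transformer body
        self.lm_head = nn.Linear(dim, vocab)

def freeze_all_but_embed_and_head(model, keep=("embed", "lm_head")):
    """Hypothetical version of step 2: freeze every parameter whose name
    does not contain one of the `keep` substrings."""
    for name, p in model.named_parameters():
        p.requires_grad = any(k in name for k in keep)

model = TinyLM()
freeze_all_but_embed_and_head(model)
# embed and lm_head stay trainable; the body layer is frozen
```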
And I'm curious about the notation k in step 4 of Algorithm 1 in the paper: "Run AdamW for K iterations with $\{\eta_t\}_{t=ik}^{ik+k-1}$". Is this lowercase k the same as K?
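My current reading (which may be wrong) is that the lowercase k is just K, so sampling period i covers the K global steps t = iK through iK+K-1. A quick sketch of that indexing:

```python
def steps_in_period(i, K):
    """Global AdamW step indices t covered by sampling period i,
    assuming t runs from iK to iK+K-1 (my reading of the paper's notation)."""
    return list(range(i * K, i * K + K))

# With K = 5: period 0 covers steps 0..4, period 1 covers steps 5..9, etc.
print(steps_in_period(0, 5))
print(steps_in_period(1, 5))
```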
My English is not great, so please tell me if anything above is unclear. Thanks for answering!
The LISA paper is just confusing, including the datasets, code, etc. And there is no importance sampling.