SPT
Questions about the sensitivity function
Hello, thanks for providing the code. I have some questions about calculating sensitivity, and I would appreciate it if you could clarify them for me.
- What values of `alpha` and `beta` should generally be used?
- In your experience, how many batches should be processed for a reliable estimate of sensitivity?
- In L181, what do the values denote? Are they the total number of tunable parameters to select?
- Could you explain how the sweep is performed, and why the value of 80 is chosen in L189?
- Can you explain the condition in L282 of your code? When I run the code, it only returns results for 1.0, 0.8 and 0.6; for smaller values the condition is apparently not satisfied.
- In L279, can you explain why the param count is calculated in this way? Why is the division by 1e6 performed?
- In L191 and L196, why is `param_num` multiplied by 0.02 and 1e6 respectively?
- When using LoRA, I assume the additional parameters will be merged into the original params after training is done. Is the code for that available?
Thank you in advance.