composable-sft
Memory safe and efficient `unfreeze_k_most_changed_params` calculation implemented.
@AlanAnsell @parovicm Thanks for maintaining this wonderful code base. While using this repository, I found a memory-inefficient operation that can slow down the calculation and even crash the process for large models. I verified that this PR fixes the issue without affecting any other code.
Using `np.partition` is efficient when the sequence of numbers is stored in CPU memory (RAM). However, the model parameters are already on the GPU, so calling `.tolist()` or `.numpy()` on a tensor first copies it from the GPU to CPU memory.
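As a hypothetical illustration (not the repository's actual code), the CPU-side pattern looks roughly like this: the parameter diffs are materialized as a NumPy array in host memory and `np.partition` is used to find the threshold for the k largest entries.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical flattened parameter diffs, already copied to host memory.
# In the real code this copy is what forces the GPU-to-CPU transfer.
diffs = np.abs(rng.standard_normal(10_000))

k = 100
# np.partition: after partitioning, the element at index n-k is the
# k-th largest value, so it serves as the selection threshold.
threshold = np.partition(diffs, diffs.size - k)[diffs.size - k]
mask = diffs >= threshold
```

For a model that fits in RAM this is fine; the problem is only the host-memory copy that precedes it.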
If the model is small enough that CPU memory can absorb this copy, the operation is safe. But when the model grows as large as 33B parameters, it cannot run even with 1.2 TB of RAM. For this reason, this PR proposes an operation that does not copy GPU tensors to the CPU.
This additionally reduces the runtime of `unfreeze_k_most_changed_params`, since moving tensors from GPU to CPU is an I/O-bound step and by far the slowest part of the operation. For 7B models, the `np.partition` approach takes around 4~5 minutes to compute the diffs, while the proposed method operating directly on GPU tensors takes less than a minute.
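A minimal sketch of the GPU-side alternative, assuming the diffs are a `torch.Tensor` (the helper name `topk_threshold` is illustrative, not the PR's actual implementation): `torch.topk` runs on whatever device the tensor lives on, so no device-to-host transfer is needed to find the selection threshold.

```python
import torch

def topk_threshold(diffs: torch.Tensor, k: int) -> torch.Tensor:
    # torch.topk executes on the tensor's own device (GPU if available),
    # so the k largest values are found without copying anything to the CPU.
    # The minimum of those k values is the selection threshold.
    return torch.topk(diffs.flatten(), k, sorted=False).values.min()

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical parameter-diff tensor; in practice this would be
# |tuned_param - original_param| computed per parameter group.
diffs = torch.randn(10_000, device=device).abs()

thr = topk_threshold(diffs, 100)
mask = diffs >= thr
```

The threshold (a 0-dim tensor) can then be compared against each parameter's diffs to build the unfreeze mask, all without leaving the GPU.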