
Memory safe and efficient `unfreeze_k_most_changed_params` calculation implemented.

Open ArtemisDicoTiar opened this issue 2 months ago • 0 comments

@AlanAnsell @parovicm Thanks for maintaining this wonderful code base. While using this repository, I found a memory-inefficient operation that slows the calculation and can even crash the process for large models. I verified that this PR handles the issue without affecting any other code.

Using `np.partition` is efficient when the sequence of numbers is stored in CPU memory (RAM). However, the model parameters are already on the GPU, and calling `.tolist()` or `.numpy()` on a tensor copies it from GPU to CPU memory. If the model is small enough for CPU memory to absorb this copy, the operation is safe, but once the model grows to around 33B parameters it cannot run even with 1.2T of RAM. For this reason, I propose an operation that never moves the GPU tensors to the CPU. This also reduces the running time of `unfreeze_k_most_changed_params`, since transferring tensors from GPU to CPU is I/O-bound and by far the most expensive step. For 7B models, the `np.partition` approach takes around 4-5 minutes to compute the diffs, while the proposed method operating directly on GPU tensors takes less than a minute.
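To illustrate the idea (this is a minimal sketch, not the PR's actual code; the helper name and the shape of the inputs are assumptions), the k-th largest absolute parameter change can be found with `torch.kthvalue` directly on the tensors' own device, avoiding the GPU-to-CPU copy that `.numpy()` plus `np.partition` would require:

```python
import torch

def kth_largest_threshold(diff_tensors, k):
    """Return the k-th largest absolute value across a list of tensors.

    Hypothetical helper: runs entirely on the tensors' device (GPU if
    the tensors live there), so no host-memory copy is needed.
    """
    # Flatten and concatenate on-device; .abs() keeps everything on GPU too.
    flat = torch.cat([t.reshape(-1).abs() for t in diff_tensors])
    # torch.kthvalue returns the k-th *smallest* element, so the k-th
    # largest is the (n - k + 1)-th smallest.
    threshold, _ = torch.kthvalue(flat, flat.numel() - k + 1)
    return threshold

# Usage sketch: parameters whose |diff| >= threshold would be unfrozen.
diffs = [torch.tensor([1.0, -5.0, 3.0]), torch.tensor([[2.0, -4.0]])]
print(kth_largest_threshold(diffs, 2))  # tensor(4.)
```

Compared with `np.partition` on a host copy, this keeps both the data and the selection on the GPU, so the only host transfer is the single scalar threshold.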

ArtemisDicoTiar avatar Dec 08 '24 08:12 ArtemisDicoTiar