llm.c

Async optimizer state and model checkpointing

Open chinthysl opened this issue 1 year ago • 0 comments

Additional feature to checkpoint the optimizer state and model parameters using a non-blocking background thread. The device buffers are copied to a pinned host buffer in one shot, and the background thread then handles the I/O operations.
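The pattern described above can be sketched roughly as follows. This is a minimal illustration in plain C with pthreads, not the code from this change: every name here (`AsyncCheckpoint`, `ckpt_init`, `ckpt_save_async`, `ckpt_wait`, `ckpt_free`) is hypothetical, `malloc` stands in for a pinned allocation such as `cudaMallocHost`, and `memcpy` stands in for the device-to-host `cudaMemcpy`.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper type; not the API used in llm.c.
typedef struct {
    pthread_t thread;
    char path[256];
    float *host_buf;   // stand-in for a pinned host buffer (cudaMallocHost)
    size_t capacity;   // number of floats host_buf can hold
    size_t count;      // number of floats in the current snapshot
    int active;        // 1 while a background write may be in flight
    int status;        // 0 on success, -1 on I/O error (valid after ckpt_wait)
} AsyncCheckpoint;

int ckpt_init(AsyncCheckpoint *c, size_t capacity) {
    memset(c, 0, sizeof(*c));
    c->host_buf = (float *)malloc(capacity * sizeof(float));
    c->capacity = capacity;
    return c->host_buf ? 0 : -1;
}

// Background thread: performs only the file I/O, from the host snapshot.
static void *ckpt_writer(void *arg) {
    AsyncCheckpoint *c = (AsyncCheckpoint *)arg;
    FILE *f = fopen(c->path, "wb");
    if (f == NULL) { c->status = -1; return NULL; }
    size_t written = fwrite(c->host_buf, sizeof(float), c->count, f);
    int close_rc = fclose(f);
    c->status = (written == c->count && close_rc == 0) ? 0 : -1;
    return NULL;
}

// Block until the previous write (if any) has finished; returns its status.
int ckpt_wait(AsyncCheckpoint *c) {
    if (c->active) {
        pthread_join(c->thread, NULL);
        c->active = 0;
    }
    return c->status;
}

// Snapshot the state, then return immediately while the writer thread runs.
int ckpt_save_async(AsyncCheckpoint *c, const char *path,
                    const float *state, size_t count) {
    ckpt_wait(c);  // the host buffer must not be overwritten mid-write
    if (count > c->capacity) return -1;
    // Stand-in for the one-shot device-to-host copy into pinned memory.
    memcpy(c->host_buf, state, count * sizeof(float));
    snprintf(c->path, sizeof(c->path), "%s", path);
    c->count = count;
    c->status = 0;
    if (pthread_create(&c->thread, NULL, ckpt_writer, c) != 0) return -1;
    c->active = 1;
    return 0;
}

void ckpt_free(AsyncCheckpoint *c) {
    ckpt_wait(c);
    free(c->host_buf);
    c->host_buf = NULL;
}
```

In this sketch the training loop would call `ckpt_save_async` at a checkpoint step and keep going; only the copy into the host buffer is on the critical path. Calling `ckpt_wait` before the next save (and before exit) ensures the snapshot buffer is never reused while a write is still in progress.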

On my 8xA100 setup, checkpoint latency dropped from roughly 5.4 sec to 2.3 sec, about a 2x improvement. For larger model sizes, this feature should save a lot of time.

chinthysl, Jun 27 '24 10:06