[Distributed] Support loading from single checkpoint binary
🚀 The feature, motivation and pitch
This aligns distributed's load behavior with the single-device case.
Today, distributed relies on an index file containing a param->bin
mapping to limit the number of bins each process needs to open. However, not all checkpoint styles come with an index file.
To avoid every process opening a large bin and OOMing the CPU, we can use `torch.load(mmap=True)`. Although each process would map its own virtual address space onto the file, the OS would load only one copy of the faulted pages into physical memory; that copy is shared across processes, and each process can then move its portion to the corresponding device memory.
Alternatives
No response
Additional context
cc: @lessw2020 @mikaylagawarecki