DeepSpeed
DeepSpeedZeroOptimizer: refactor bit16 flattening to support more accelerators
Until now, the approach replaced each torch.nn.Parameter's data with new CPU storage in order to offload device memory: all params were flattened on the host and then moved to the device. On some accelerators, however, a torch.nn.Parameter that lives on the device cannot be assigned CPU storage. This PR instead copies the param data into a new CPU tensor and shrinks the device storage. Later, when the flat buffer is moved to the device, param.data becomes a view into the flat buffer.
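
A minimal sketch of the refactored flow, assuming torch's internal flattening helpers; the function name `flatten_params_via_cpu` and its structure are illustrative, not DeepSpeed's actual code:

```python
import torch
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def flatten_params_via_cpu(params, device):
    # 1. Copy each param's data into a fresh CPU tensor instead of
    #    reassigning param.data to CPU storage (which some accelerators
    #    reject for device parameters).
    cpu_copies = [p.data.to("cpu", copy=True) for p in params]

    # 2. Shrink the original device storage to free accelerator memory.
    for p in params:
        p.data = torch.empty(0, dtype=p.dtype, device=p.device)

    # 3. Flatten on the host, then move the single flat buffer to the device.
    flat_buffer = _flatten_dense_tensors(cpu_copies).to(device)

    # 4. Point each param.data at its view into the device flat buffer
    #    (cpu_copies supply the original shapes for unflattening).
    views = _unflatten_dense_tensors(flat_buffer, cpu_copies)
    for p, view in zip(params, views):
        p.data = view

    return flat_buffer
```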