Olatunji Ruwase
> File "/home/xxxxxxx/lib/python3.12/site-packages/deepspeed/accelerator/cuda_accelerator.py", line 293, in pin_memory
>     return tensor.pin_memory()

@rpgmaker, can you try changing the above code to `return tensor.cpu().pin_memory()`?
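For reference, here is what that change amounts to as a standalone snippet (the helper name below is just for illustration and is not part of DeepSpeed):

```
import torch

def pin_memory_via_cpu(tensor):
    # Mirror of the suggested edit: make sure the tensor is on the CPU first,
    # then ask PyTorch for page-locked (pinned) host memory.
    return tensor.cpu().pin_memory()

# Quick self-check on a small tensor.
t = torch.empty(1024)
assert pin_memory_via_cpu(t).is_pinned()
```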
That is very strange. Can you try printing `tensor.dtype`, `tensor.shape` and `tensor.device` before the `pin_memory()` call?
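Something along these lines would capture that information; in practice the prints would go directly above the `tensor.pin_memory()` line in `cuda_accelerator.py`, the free-standing wrapper here is only a sketch:

```
import torch

def pin_memory_with_debug(tensor):
    # Log the properties asked for above, then attempt the original pinning call.
    print(f"tensor.dtype: {tensor.dtype}")
    print(f"tensor.shape: {tensor.shape}")
    print(f"tensor.device: {tensor.device}")
    return tensor.pin_memory()

pin_memory_with_debug(torch.empty(1024))
```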
> tensor.dtype: torch.float32
> tensor.shape: torch.Size([1776255488])
> tensor.device: cpu

@rpgmaker, thanks. Those are expected results and do not explain the error. I don't have a 3090 TI FE to try...
@rpgmaker, that is interesting. Maybe the tensor size is the cause, since we already saw that a size of 1024 worked. Can you test a sweep of tensor sizes starting from 1024,...
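Here is a sketch of such a sweep, assuming a doubling step and an upper bound of about 2G elements (both are placeholders, feel free to adjust):

```
import torch
from deepspeed.accelerator import get_accelerator

# Grow the tensor size from 1024 elements, doubling each step, to find where
# pinning starts to fail.
size = 1024
while size <= 2 * 1024**3:  # ~2G float32 elements, i.e. roughly 8 GB
    try:
        t = torch.empty(size, device='cpu')
        get_accelerator().pin_memory(t)
        print(f"size {size}: pinned OK")
        del t
    except RuntimeError as e:
        print(f"size {size}: failed with {e}")
        break
    size *= 2
```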
> "offload_optimizer": { > "device": "cpu", > "pin_memory": False One workaround is to control this particular pinning by `pin_memory` in the ds_config. So to unblock you, please try changing the...
> > if self.cpu_offload:
> >     weights_partition = get_accelerator().pin_memory(weights_partition)
>
> Using this option actually cause my vscode to crash.

That is odd. What happens if you comment out the...
> Here is the code snippet for the test
>
> ```
> import torch;
> from deepspeed.accelerator import get_accelerator;
>
> start = 1024
> increase = 1
> ...
> ```
> torch.empty(start * increase, device='cpu')
> RuntimeError: [enforce fail at alloc_cpu.cpp:119] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 141011357696 bytes. Error code 12 (Cannot allocate memory)...
> File "/home/xxxxxxx/test_tensor.py", line 7, in > torch.empty(start * increase, device='cpu') Did you remove the `pin_memory()` in your test?
@hongshanli23, this is really great. Do you mind creating a PR in the following location? https://github.com/microsoft/DeepSpeedExamples/tree/master/training/universal_checkpoint Thanks