Results: 14 comments by Rpgmaker

@tjruwase I ran the memory test to see how much I can allocate, and it failed here:
```
tensor.dtype: torch.float32
tensor.shape: torch.Size([268493824])
tensor.device: cpu
Traceback (most recent call last):
  File...
```
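For scale, the tensor in that log is just over 1 GiB. A quick check of the arithmetic, assuming 4 bytes per `torch.float32` element:

```python
# Bytes needed for the tensor in the log above: numel * bytes per element.
numel = 268_493_824      # from torch.Size([268493824])
bytes_per_elem = 4       # torch.float32 uses 4 bytes per element
nbytes = numel * bytes_per_elem

GiB = 1024 ** 3
print(f"{nbytes:,} bytes = {nbytes / GiB:.4f} GiB")
# → 1,073,975,296 bytes = 1.0002 GiB
```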

Here is the code snippet for the test:
```python
import torch
from deepspeed.accelerator import get_accelerator

start = 1024
increase = 1
while True:
    get_accelerator().pin_memory(torch.empty(start * increase, device='cpu'))
    increase += 100
    ...
```
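A minimal sketch of the same probe that catches the failure instead of crashing and reports the last size that succeeded. `probe_max_elements` is a hypothetical helper (not part of torch or DeepSpeed); the allocation is passed in as a callable so the pinned and plain variants can be compared, and it assumes the failure surfaces as `RuntimeError` (as in the traceback) or `MemoryError`.

```python
from typing import Callable, Optional, Tuple


def probe_max_elements(
    alloc: Callable[[int], object],
    start: int = 1024,
    step: int = 100,
    max_increase: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int], Optional[BaseException]]:
    """Mirror the loop above: call alloc(start * increase) for increase = 1, 1 + step, ...

    Returns (last element count that succeeded, element count that failed, exception).
    If max_increase is reached without a failure, the last two entries are None.
    """
    increase = 1
    last_ok: Optional[int] = None
    while max_increase is None or increase <= max_increase:
        numel = start * increase
        try:
            alloc(numel)  # result is discarded, as in the original snippet
        except (RuntimeError, MemoryError) as exc:
            return last_ok, numel, exc
        last_ok = numel
        increase += step
    return last_ok, None, None
```

With PyTorch and DeepSpeed installed, the pinned variant would be `probe_max_elements(lambda n: get_accelerator().pin_memory(torch.empty(n, device='cpu')))` and the plain variant `probe_max_elements(lambda n: torch.empty(n, device='cpu'))`. The failing shape from the log is consistent with a step of 100: 268493824 = 1024 × 262201, i.e. `increase = 1 + 100 × 2622`.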

Switching the code to plain `torch.empty` (without pinning) produces the following result:
```
Traceback (most recent call last):
  File "/home/xxxxxxx/test_tensor.py", line 7, in <module>
    torch.empty(start * increase, device='cpu')
RuntimeError: [enforce fail at alloc_cpu.cpp:119]...
```
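Because the plain `torch.empty` call also fails inside `alloc_cpu.cpp`, the ordinary CPU allocator, rather than the pinning step, appears to be refusing the roughly 1 GiB request. One rough sanity check is to compare the request with the machine's free RAM. This sketch assumes a Linux/glibc host where `os.sysconf` exposes `SC_AVPHYS_PAGES`; the helper names are hypothetical, and free pages exclude reclaimable page cache, so the figure understates what the OS may actually be able to provide.

```python
import os
from typing import Optional

GiB = 1024 ** 3


def free_ram_bytes() -> Optional[int]:
    # Free physical memory via sysconf (Linux/glibc); None if the platform lacks it.
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None


def request_exceeds_free(numel: int, dtype_bytes: int, free_bytes: int) -> bool:
    # True when one allocation of numel elements is larger than the free RAM.
    return numel * dtype_bytes > free_bytes


if __name__ == "__main__":
    free = free_ram_bytes()
    if free is None:
        print("free RAM not available via sysconf on this platform")
    else:
        print(f"free RAM: {free / GiB:.2f} GiB")
        print("float32 request of 268493824 elements exceeds free RAM:",
              request_exceeds_free(268_493_824, 4, free))
```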

@tjruwase With regard to commenting out the code: it started printing the epochs and was training, but then crashed at the end of it. When trying to run the test...