Transformer-SSL

dataloader error

Open niutransWZY opened this issue 3 years ago • 0 comments

When I train with moby_main.py, memory usage on the Linux machine keeps growing until the process crashes. What causes this, and how can I fix it?

The error is:

Traceback (most recent call last):
  File "moby_main.py", line 236, in <module>
    main(config)
  File "moby_main.py", line 121, in main
    train_one_epoch(config, model, data_loader_train, optimizer, epoch, lr_scheduler)
  File "moby_main.py", line 151, in train_one_epoch
    scaled_loss.backward()
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/anaconda3/envs/transformer-ssl/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2605) is killed by signal: Killed.

niutransWZY, Sep 23 '21 07:09