🐛 Bug
Fine-tuning wav2vec 2.0 crashes with "CUDA error: initialization error" at the end-of-epoch validation/checkpoint step during training.
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
TASK=audio_pretraining
CRITERION=ctc
MAX_TOKENS=960000
#about 16.6 sentences
#Optimization
TOTAL_UPDATES=80000
UPDATE_FREQ=4 #for 2 ka?
LR=0.00003
#LR Scheduler
WARMUP_UPDATES=8000
HOLD_STEPS=32000
DECAY_STEPS=40000
FINAL_LR_SCALE=0.05
#Model
ARCH=wav2vec_ctc
MASK_PROB=0.65
MASK_CHANNEL_PROB=0.5
MASK_CHANNEL_LEN=64
MASK_LEN=10
FREEZE_FINETUNE_UPDATES=0
SEED=2337
DATA_DIR=/path/checkpoint_wav2vec/data/clean_100h
SAVE_DIR=/path/checkpoint_wav2vec/finetune_libri100h
RESUME_PATH=/path/checkpoint_wav2vec/wav2vec/checkpoint_last.pt
mkdir -p $SAVE_DIR
python $dist_config train.py $DATA_DIR --save-dir $SAVE_DIR --fp16 \
--post-process letter --valid-subset valid --no-epoch-checkpoints \
--best-checkpoint-metric wer --num-workers 4 \
--max-update ${TOTAL_UPDATES} --sentence-avg \
--task ${TASK} --arch ${ARCH} \
--w2v-path ${RESUME_PATH} \
--labels ltr \
--apply-mask --mask-selection static --mask-other 0 --mask-length $MASK_LEN --mask-prob $MASK_PROB \
--layerdrop 0.1 --mask-channel-selection static --mask-channel-other 0 \
--mask-channel-length $MASK_CHANNEL_LEN --mask-channel-prob $MASK_CHANNEL_PROB \
--zero-infinity --feature-grad-mult 0.0 --freeze-finetune-updates ${FREEZE_FINETUNE_UPDATES} \
--validate-after-updates 10000 --optimizer adam \
--adam-betas '(0.9, 0.98)' --adam-eps 1e-08 --lr $LR \
--lr-scheduler tri_stage --warmup-steps ${WARMUP_UPDATES} --hold-steps ${HOLD_STEPS} \
--skip-invalid-size-inputs-valid-test \
--update-freq $UPDATE_FREQ \
--decay-steps ${DECAY_STEPS} --final-lr-scale $FINAL_LR_SCALE --final-dropout 0.0 \
--dropout 0.0 --activation-dropout 0.1 \
--criterion ${CRITERION} \
--distributed-no-spawn \
--attention-dropout 0.0 --max-tokens ${MAX_TOKENS} \
--seed ${SEED} --log-format json --log-interval 50 --ddp-backend no_c10d | tee -a $SAVE_DIR/log.txt
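For anyone checking that the scheduler flags are being picked up: below is a rough sketch of how I understand the tri_stage schedule (linear warmup, hold, exponential decay), using the values from the script. It is not copied from fairseq. The function name `tri_stage_lr` is made up for illustration, and `INIT_LR_SCALE = 0.01` is an assumption (the script does not pass `--init-lr-scale`, so I am assuming the default). With that assumption it reproduces the `lr` values in the `train_inner` lines of the log, so the learning rate itself looks as intended.

```python
import math

# Values copied from the script in this report.
PEAK_LR = 0.00003        # LR
WARMUP_STEPS = 8000      # WARMUP_UPDATES
HOLD_STEPS = 32000       # HOLD_STEPS
DECAY_STEPS = 40000      # DECAY_STEPS
FINAL_LR_SCALE = 0.05    # FINAL_LR_SCALE
# Assumption: --init-lr-scale is not set in the script, so a default of 0.01 is used.
INIT_LR_SCALE = 0.01


def tri_stage_lr(num_updates: int) -> float:
    """Hypothetical helper: learning rate after `num_updates` optimizer steps."""
    init_lr = INIT_LR_SCALE * PEAK_LR
    final_lr = FINAL_LR_SCALE * PEAK_LR

    # Stage 1: linear warmup from init_lr to PEAK_LR.
    if num_updates < WARMUP_STEPS:
        return init_lr + (PEAK_LR - init_lr) * num_updates / WARMUP_STEPS

    # Stage 2: hold at PEAK_LR.
    step = num_updates - WARMUP_STEPS
    if step < HOLD_STEPS:
        return PEAK_LR

    # Stage 3: exponential decay from PEAK_LR down to final_lr.
    step -= HOLD_STEPS
    if step <= DECAY_STEPS:
        decay_factor = -math.log(FINAL_LR_SCALE) / DECAY_STEPS
        return PEAK_LR * math.exp(-decay_factor * step)

    # After the decay stage the rate stays at final_lr.
    return final_lr


if __name__ == "__main__":
    # Compare against the "lr" field logged at num_updates 50, 100, 150, 200.
    for n in (50, 100, 150, 200):
        print(n, tri_stage_lr(n))
```

The three stage lengths add up to TOTAL_UPDATES (8000 + 32000 + 40000 = 80000), so under this sketch the decay finishes exactly at the last update.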
Environment
- fairseq Version: master
- PyTorch Version: 1.7.1
- OS: Linux
- How you installed fairseq (pip, source): source, with pip install -e .
- Python version: 3.8
- CUDA/cuDNN version: 10.2
- GPU models and configuration: 8*V100
- Any other relevant information:
Additional context
2021-07-01 12:23:37 | INFO | fairseq.utils | CUDA enviroments for all 8 workers
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 0: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 1: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 2: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 3: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 4: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 5: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 6: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | rank 7: capabilities = 7.0 ; total memory = 15.782 GB ; name = Tesla V100-SXM2-16GB
2021-07-01 12:23:37 | INFO | fairseq.utils | CUDA enviroments for all 8 workers
2021-07-01 12:23:37 | INFO | fairseq_cli.train | training on 8 devices (GPUs/TPUs)
2021-07-01 12:23:37 | INFO | fairseq_cli.train | max tokens per device = 1280000 and max sentences per device = None
2021-07-01 12:23:37 | INFO | fairseq.trainer | Preparing to load checkpoint /path/checkpoint_wav2vec/finetune_libri100h/checkpoint_last.pt
2021-07-01 12:23:37 | INFO | fairseq.trainer | No existing checkpoint found /path/checkpoint_wav2vec/finetune_libri100h/checkpoint_last.pt
2021-07-01 12:23:37 | INFO | fairseq.trainer | loading train data for epoch 1
2021-07-01 12:23:37 | INFO | fairseq.data.audio.raw_audio_dataset | loaded 28539, skipped 0 samples
2021-07-01 12:23:38 | INFO | fairseq.optim.adam | using FusedAdam
2021-07-01 12:23:38 | INFO | fairseq.trainer | begin training epoch 1
2021-07-01 12:23:38 | INFO | fairseq_cli.train | Start iterating over samples
2021-07-01 12:23:41 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 64.0
2021-07-01 12:23:42 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
2021-07-01 12:23:42 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 16.0
2021-07-01 12:24:00 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 8.0
2021-07-01 12:24:12 | INFO | train_inner | {"epoch": 1, "update": 0.257, "loss": "2340.18", "ntokens": "25355.1", "nsentences": "134.2", "nll_loss": "12.386", "wps": "42514.1", "ups": "1.68", "wpb": "25355.1", "bsz": "134.2", "num_updates": "50", "lr": "4.85625e-07", "gnorm": "1645.81", "loss_scale": "8", "train_wall": "33", "gb_free": "11.5", "wall": "35"}
2021-07-01 12:24:15 | INFO | fairseq.trainer | NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 4.0
2021-07-01 12:24:42 | INFO | train_inner | {"epoch": 1, "update": 0.5, "loss": "2146.24", "ntokens": "25324.3", "nsentences": "136.92", "nll_loss": "11.604", "wps": "42276.3", "ups": "1.67", "wpb": "25324.3", "bsz": "136.9", "num_updates": "100", "lr": "6.7125e-07", "gnorm": "2119.39", "loss_scale": "4", "train_wall": "30", "gb_free": "11", "wall": "65"}
2021-07-01 12:25:11 | INFO | train_inner | {"epoch": 1, "update": 0.738, "loss": "1917.67", "ntokens": "25309", "nsentences": "135.44", "nll_loss": "10.262", "wps": "43470.7", "ups": "1.72", "wpb": "25309", "bsz": "135.4", "num_updates": "150", "lr": "8.56875e-07", "gnorm": "2846.55", "loss_scale": "4", "train_wall": "29", "gb_free": "10.7", "wall": "94"}
2021-07-01 12:25:40 | INFO | train_inner | {"epoch": 1, "update": 0.976, "loss": "1447.75", "ntokens": "25281.2", "nsentences": "139.42", "nll_loss": "7.984", "wps": "43491.5", "ups": "1.72", "wpb": "25281.2", "bsz": "139.4", "num_updates": "200", "lr": "1.0425e-06", "gnorm": "2930.65", "loss_scale": "4", "train_wall": "29", "gb_free": "10.9", "wall": "123"}
2021-07-01 12:25:43 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 1 @ 205 updates
2021-07-01 12:25:43 | INFO | fairseq.trainer | Saving checkpoint to /path/checkpoint_wav2vec/finetune_libri100h/checkpoint_last.pt
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f1dbeb788b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f1dbedcaef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f1dbeb63b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1e0d55b902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: _PyObject_GC_New + 0x419 (0x552d89 in /usr/bin/python)
frame #7: PyTraceBack_Here + 0x1d1 (0x5566b1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x3de8 (0x57c5a8 in /usr/bin/python)
frame #9: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #10: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #11: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x4ffa96]
frame #18: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #22: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #25: /usr/bin/python() [0x4ffa96]
frame #26: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #27: /usr/bin/python() [0x645e55]
frame #28: /usr/bin/python() [0x65f7f4]
frame #29: + 0x76db (0x7f1e118b56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x3f (0x7f1e11bee88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f51310c08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f5131312ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f51310abb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f517faa3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: _PyObject_GC_New + 0x419 (0x552d89 in /usr/bin/python)
frame #7: /usr/bin/python() [0x5b54bf]
frame #8: PyObject_GetIter + 0x13 (0x507183 in /usr/bin/python)
frame #9: _PyEval_EvalFrameDefault + 0x14fe (0x579cbe in /usr/bin/python)
frame #10: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #11: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #12: /usr/bin/python() [0x600500]
frame #13: _PyObject_CallMethodIdObjArgs + 0xee (0x600dae in /usr/bin/python)
frame #14: PyImport_ImportModuleLevelObject + 0x382 (0x565002 in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x2afd (0x57b2bd in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x600500]
frame #18: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #19: /usr/bin/python() [0x53cb41]
frame #20: /usr/bin/python() [0x5431bc]
frame #21: /usr/bin/python() [0x541d1c]
frame #22: /usr/bin/python() [0x540828]
frame #23: /usr/bin/python() [0x542579]
frame #24: /usr/bin/python() [0x542f79]
frame #25: /usr/bin/python() [0x542fd1]
frame #26: /usr/bin/python() [0x541d1c]
frame #27: /usr/bin/python() [0x543926]
frame #28: /usr/bin/python() [0x64f68b]
frame #29: /usr/bin/python() [0x4fb1ff]
frame #30: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #31: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #33: /usr/bin/python() [0x4ff9e6]
frame #34: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #35: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #36: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #38: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #40: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #41: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #42: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #43: /usr/bin/python() [0x4ffa96]
frame #44: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #45: /usr/bin/python() [0x645e55]
frame #46: /usr/bin/python() [0x65f7f4]
frame #47: + 0x76db (0x7f5183dfd6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #48: clone + 0x3f (0x7f518413688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f1dbeb788b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f1dbedcaef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f1dbeb63b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1e0d55b902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: _PyObject_GC_New + 0x419 (0x552d89 in /usr/bin/python)
frame #7: /usr/bin/python() [0x5af6df]
frame #8: /usr/bin/python() [0x5b1172]
frame #9: _PyEval_EvalFrameDefault + 0x480 (0x578c40 in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #13: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #14: _PyObject_FastCallDict + 0x4a (0x60261a in /usr/bin/python)
frame #15: /usr/bin/python() [0x5b034b]
frame #16: _PyObject_MakeTpCall + 0x28f (0x5fff6f in /usr/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #18: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #20: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #21: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #22: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #23: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #25: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #26: /usr/bin/python() [0x4ffa96]
frame #27: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #29: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #33: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #34: /usr/bin/python() [0x4ffa96]
frame #35: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #36: /usr/bin/python() [0x645e55]
frame #37: /usr/bin/python() [0x65f7f4]
frame #38: + 0x76db (0x7f1e118b56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #39: clone + 0x3f (0x7f1e11bee88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f51310c08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f5131312ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f51310abb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f517faa3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: PyTuple_New + 0xe1 (0x5b44f1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0xfd1 (0x579791 in /usr/bin/python)
frame #9: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #11: /usr/bin/python() [0x5b0529]
frame #12: _PyObject_MakeTpCall + 0x1ed (0x5ffecd in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x5b9e (0x57e35e in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x600500]
frame #18: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #19: /usr/bin/python() [0x53cb41]
frame #20: /usr/bin/python() [0x5431bc]
frame #21: /usr/bin/python() [0x541d1c]
frame #22: /usr/bin/python() [0x540828]
frame #23: /usr/bin/python() [0x542579]
frame #24: /usr/bin/python() [0x542f79]
frame #25: /usr/bin/python() [0x541d1c]
frame #26: /usr/bin/python() [0x543926]
frame #27: /usr/bin/python() [0x64f68b]
frame #28: /usr/bin/python() [0x4fb1ff]
frame #29: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #30: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #32: /usr/bin/python() [0x4ff9e6]
frame #33: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #37: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #39: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #40: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #41: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #42: /usr/bin/python() [0x4ffa96]
frame #43: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #44: /usr/bin/python() [0x645e55]
frame #45: /usr/bin/python() [0x65f7f4]
frame #46: + 0x76db (0x7f5183dfd6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #47: clone + 0x3f (0x7f518413688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f14d243c8b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f14d268eef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f14d2427b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1520e1f902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: _PyObject_MakeTpCall + 0x411 (0x6000f1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #9: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #10: /usr/bin/python() [0x600500]
frame #11: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #12: /usr/bin/python() [0x53cb41]
frame #13: /usr/bin/python() [0x5431bc]
frame #14: /usr/bin/python() [0x542f79]
frame #15: /usr/bin/python() [0x542fd1]
frame #16: /usr/bin/python() [0x541d1c]
frame #17: /usr/bin/python() [0x543926]
frame #18: /usr/bin/python() [0x64f68b]
frame #19: /usr/bin/python() [0x4fb1ff]
frame #20: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #21: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #22: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #23: /usr/bin/python() [0x4ff9e6]
frame #24: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #25: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #26: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #28: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #33: /usr/bin/python() [0x4ffa96]
frame #34: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #35: /usr/bin/python() [0x645e55]
frame #36: /usr/bin/python() [0x65f7f4]
frame #37: + 0x76db (0x7f15251796db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #38: clone + 0x3f (0x7f15254b288f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f14d243c8b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f14d268eef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f14d2427b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1520e1f902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: PyTuple_New + 0xe1 (0x5b44f1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0xfd1 (0x579791 in /usr/bin/python)
frame #9: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #11: /usr/bin/python() [0x5b0529]
frame #12: _PyObject_MakeTpCall + 0x1ed (0x5ffecd in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x5b9e (0x57e35e in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x600500]
frame #18: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #19: /usr/bin/python() [0x53cb41]
frame #20: /usr/bin/python() [0x5431bc]
frame #21: /usr/bin/python() [0x541d1c]
frame #22: /usr/bin/python() [0x540828]
frame #23: /usr/bin/python() [0x542579]
frame #24: /usr/bin/python() [0x542f79]
frame #25: /usr/bin/python() [0x541d1c]
frame #26: /usr/bin/python() [0x543926]
frame #27: /usr/bin/python() [0x64f68b]
frame #28: /usr/bin/python() [0x4fb1ff]
frame #29: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #30: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #32: /usr/bin/python() [0x4ff9e6]
frame #33: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #37: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #39: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #40: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #41: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #42: /usr/bin/python() [0x4ffa96]
frame #43: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #44: /usr/bin/python() [0x645e55]
frame #45: /usr/bin/python() [0x65f7f4]
frame #46: + 0x76db (0x7f15251796db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #47: clone + 0x3f (0x7f15254b288f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f1dbeb788b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f1dbedcaef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f1dbeb63b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1e0d55b902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: _PyObject_MakeTpCall + 0x411 (0x6000f1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #9: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #10: /usr/bin/python() [0x600500]
frame #11: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #12: /usr/bin/python() [0x53cb41]
frame #13: /usr/bin/python() [0x5431bc]
frame #14: /usr/bin/python() [0x543025]
frame #15: /usr/bin/python() [0x541d1c]
frame #16: /usr/bin/python() [0x543926]
frame #17: /usr/bin/python() [0x64f68b]
frame #18: /usr/bin/python() [0x4fb1ff]
frame #19: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #20: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #21: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #22: /usr/bin/python() [0x4ff9e6]
frame #23: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #25: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #27: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #29: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #32: /usr/bin/python() [0x4ffa96]
frame #33: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #34: /usr/bin/python() [0x645e55]
frame #35: /usr/bin/python() [0x65f7f4]
frame #36: + 0x76db (0x7f1e118b56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #37: clone + 0x3f (0x7f1e11bee88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f615ebe08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f615ee32ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f615ebcbb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f61ad5c3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: _PyEval_EvalCodeWithName + 0x115e (0x5774ee in /usr/bin/python)
frame #8: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #9: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: /usr/bin/python() [0x4ffa96]
frame #14: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #18: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #21: /usr/bin/python() [0x4ffa96]
frame #22: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #23: /usr/bin/python() [0x645e55]
frame #24: /usr/bin/python() [0x65f7f4]
frame #25: + 0x76db (0x7f61b191d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #26: clone + 0x3f (0x7f61b1c5688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf073378b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7fdf07589ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fdf07322b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7fdf55d1a902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: _PyObject_GC_New + 0x419 (0x552d89 in /usr/bin/python)
frame #7: PyTraceBack_Here + 0x1d1 (0x5566b1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x3de8 (0x57c5a8 in /usr/bin/python)
frame #9: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #10: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #11: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #13: _PyObject_FastCallDict + 0x4a (0x60261a in /usr/bin/python)
frame #14: /usr/bin/python() [0x5b034b]
frame #15: _PyObject_MakeTpCall + 0x28f (0x5fff6f in /usr/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #17: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #19: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #22: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #25: /usr/bin/python() [0x4ffa96]
frame #26: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #28: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #33: /usr/bin/python() [0x4ffa96]
frame #34: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #35: /usr/bin/python() [0x645e55]
frame #36: /usr/bin/python() [0x65f7f4]
frame #37: + 0x76db (0x7fdf5a0746db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #38: clone + 0x3f (0x7fdf5a3ad88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f615ebe08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f615ee32ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f615ebcbb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f61ad5c3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: PyTuple_New + 0xe1 (0x5b44f1 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0xfd1 (0x579791 in /usr/bin/python)
frame #9: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #11: /usr/bin/python() [0x5b0529]
frame #12: _PyObject_MakeTpCall + 0x1ed (0x5ffecd in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x5b9e (0x57e35e in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x600500]
frame #18: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #19: /usr/bin/python() [0x53cb41]
frame #20: /usr/bin/python() [0x5431bc]
frame #21: /usr/bin/python() [0x541d1c]
frame #22: /usr/bin/python() [0x540828]
frame #23: /usr/bin/python() [0x542579]
frame #24: /usr/bin/python() [0x542f79]
frame #25: /usr/bin/python() [0x541d1c]
frame #26: /usr/bin/python() [0x543926]
frame #27: /usr/bin/python() [0x64f68b]
frame #28: /usr/bin/python() [0x4fb1ff]
frame #29: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #30: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #32: /usr/bin/python() [0x4ff9e6]
frame #33: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #37: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #39: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #40: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #41: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #42: /usr/bin/python() [0x4ffa96]
frame #43: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #44: /usr/bin/python() [0x645e55]
frame #45: /usr/bin/python() [0x65f7f4]
frame #46: + 0x76db (0x7f61b191d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #47: clone + 0x3f (0x7f61b1c5688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf073378b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7fdf07589ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fdf07322b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7fdf55d1a902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: PyType_GenericAlloc + 0x4f5 (0x5b64d5 in /usr/bin/python)
frame #7: THPSize_NewFromSizes(int, long const*) + 0x23 (0x7fdf55c1c773 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: THPSize_New(at::Tensor const&) + 0x161 (0x7fdf55c1caa1 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: + 0x29ae98 (0x7fdf559b7e98 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #10: /usr/bin/python() [0x4fcdc2]
frame #11: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: /usr/bin/python() [0x600500]
frame #14: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #15: /usr/bin/python() [0x53cb41]
frame #16: /usr/bin/python() [0x5431bc]
frame #17: /usr/bin/python() [0x543025]
frame #18: /usr/bin/python() [0x541d1c]
frame #19: /usr/bin/python() [0x543926]
frame #20: /usr/bin/python() [0x64f68b]
frame #21: /usr/bin/python() [0x4fb1ff]
frame #22: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #23: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #25: /usr/bin/python() [0x4ff9e6]
frame #26: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #27: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #28: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: /usr/bin/python() [0x4ffa96]
frame #36: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #37: /usr/bin/python() [0x645e55]
frame #38: /usr/bin/python() [0x65f7f4]
frame #39: + 0x76db (0x7fdf5a0746db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #40: clone + 0x3f (0x7fdf5a3ad88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f51310c08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f5131312ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f51310abb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f517faa3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: PyType_GenericAlloc + 0x4f5 (0x5b64d5 in /usr/bin/python)
frame #7: /usr/bin/python() [0x5fad71]
frame #8: /usr/bin/python() [0x5b2df5]
frame #9: PyObject_Call + 0x5d (0x5ffafd in /usr/bin/python)
frame #10: _PyErr_NormalizeException + 0xc5 (0x56a125 in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x5f52 (0x57e712 in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #14: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #15: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #16: _PyObject_FastCallDict + 0x4a (0x60261a in /usr/bin/python)
frame #17: /usr/bin/python() [0x5b034b]
frame #18: _PyObject_MakeTpCall + 0x28f (0x5fff6f in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #22: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #23: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #25: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #27: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #28: /usr/bin/python() [0x4ffa96]
frame #29: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #30: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #31: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #33: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #35: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #36: /usr/bin/python() [0x4ffa96]
frame #37: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #38: /usr/bin/python() [0x645e55]
frame #39: /usr/bin/python() [0x65f7f4]
frame #40: + 0x76db (0x7f5183dfd6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #41: clone + 0x3f (0x7f518413688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f14d243c8b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f14d268eef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f14d2427b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1520e1f902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: _PyEval_EvalCodeWithName + 0x115e (0x5774ee in /usr/bin/python)
frame #8: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #9: _PyObject_FastCallDict + 0x4a (0x60261a in /usr/bin/python)
frame #10: /usr/bin/python() [0x5b034b]
frame #11: _PyObject_MakeTpCall + 0x28f (0x5fff6f in /usr/bin/python)
frame #12: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #13: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #14: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #15: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #17: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #18: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #21: /usr/bin/python() [0x4ffa96]
frame #22: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #25: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #26: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #28: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #29: /usr/bin/python() [0x4ffa96]
frame #30: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #31: /usr/bin/python() [0x645e55]
frame #32: /usr/bin/python() [0x65f7f4]
frame #33: + 0x76db (0x7f15251796db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #34: clone + 0x3f (0x7f15254b288f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf073378b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7fdf07589ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7fdf07322b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7fdf55d1a902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: PyType_GenericAlloc + 0x4f5 (0x5b64d5 in /usr/bin/python)
frame #7: THPSize_NewFromSizes(int, long const*) + 0x23 (0x7fdf55c1c773 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: THPSize_New(at::Tensor const&) + 0x161 (0x7fdf55c1caa1 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: + 0x29ae98 (0x7fdf559b7e98 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #10: /usr/bin/python() [0x4fcdc2]
frame #11: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: /usr/bin/python() [0x600500]
frame #14: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #15: /usr/bin/python() [0x53cb41]
frame #16: /usr/bin/python() [0x5431bc]
frame #17: /usr/bin/python() [0x543025]
frame #18: /usr/bin/python() [0x541d1c]
frame #19: /usr/bin/python() [0x543926]
frame #20: /usr/bin/python() [0x64f68b]
frame #21: /usr/bin/python() [0x4fb1ff]
frame #22: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #23: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #25: /usr/bin/python() [0x4ff9e6]
frame #26: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #27: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #28: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #29: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: /usr/bin/python() [0x4ffa96]
frame #36: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #37: /usr/bin/python() [0x645e55]
frame #38: /usr/bin/python() [0x65f7f4]
frame #39: + 0x76db (0x7fdf5a0746db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #40: clone + 0x3f (0x7fdf5a3ad88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f51c72b28b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f51c7504ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f51c729db7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f5215c95902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5fd9b6 (0x7f5215c959b6 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: /usr/bin/python() [0x5b3a21]
frame #6: PyDict_Clear + 0xef (0x5cfa9f in /usr/bin/python)
frame #7: /usr/bin/python() [0x43566c]
frame #8: /usr/bin/python() [0x4d7cc6]
frame #9: /usr/bin/python() [0x55331c]
frame #10: PyTuple_New + 0xe1 (0x5b44f1 in /usr/bin/python)
frame #11: + 0x299239 (0x7f5215931239 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #12: /usr/bin/python() [0x4fcdc2]
frame #13: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: /usr/bin/python() [0x600500]
frame #16: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #17: /usr/bin/python() [0x53cb41]
frame #18: /usr/bin/python() [0x5431bc]
frame #19: /usr/bin/python() [0x543025]
frame #20: /usr/bin/python() [0x541d1c]
frame #21: /usr/bin/python() [0x543926]
frame #22: /usr/bin/python() [0x64f68b]
frame #23: /usr/bin/python() [0x4fb1ff]
frame #24: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #25: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #26: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #27: /usr/bin/python() [0x4ff9e6]
frame #28: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #29: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #30: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #31: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #32: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #33: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #36: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #37: /usr/bin/python() [0x4ffa96]
frame #38: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #39: /usr/bin/python() [0x645e55]
frame #40: /usr/bin/python() [0x65f7f4]
frame #41: + 0x76db (0x7f5219fef6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #42: clone + 0x3f (0x7f521a32888f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f1dbeb788b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f1dbedcaef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f1dbeb63b7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f1e0d55b902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: /usr/bin/python() [0x55331c]
frame #7: PyStructSequence_New + 0x5a (0x5c415a in /usr/bin/python)
frame #8: /usr/bin/python() [0x51d33d]
frame #9: /usr/bin/python() [0x632ef0]
frame #10: /usr/bin/python() [0x5d1dc3]
frame #11: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x619 (0x578dd9 in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: /usr/bin/python() [0x600500]
frame #16: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #17: /usr/bin/python() [0x53cb41]
frame #18: /usr/bin/python() [0x5431bc]
frame #19: /usr/bin/python() [0x541d1c]
frame #20: /usr/bin/python() [0x540828]
frame #21: /usr/bin/python() [0x542579]
frame #22: /usr/bin/python() [0x542fd1]
frame #23: /usr/bin/python() [0x542fd1]
frame #24: /usr/bin/python() [0x541d1c]
frame #25: /usr/bin/python() [0x543926]
frame #26: /usr/bin/python() [0x64f68b]
frame #27: /usr/bin/python() [0x4fb1ff]
frame #28: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #31: /usr/bin/python() [0x4ff9e6]
frame #32: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #33: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #34: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #36: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #38: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #40: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #41: /usr/bin/python() [0x4ffa96]
frame #42: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #43: /usr/bin/python() [0x645e55]
frame #44: /usr/bin/python() [0x65f7f4]
frame #45: + 0x76db (0x7f1e118b56db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #46: clone + 0x3f (0x7f1e11bee88f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f51c72b28b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f51c7504ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f51c729db7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f5215c95902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: + 0x5fd9b6 (0x7f5215c959b6 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: /usr/bin/python() [0x5b3a21]
frame #6: PyDict_Clear + 0xef (0x5cfa9f in /usr/bin/python)
frame #7: /usr/bin/python() [0x43566c]
frame #8: /usr/bin/python() [0x4d7cc6]
frame #9: /usr/bin/python() [0x55331c]
frame #10: PyTuple_New + 0xe1 (0x5b44f1 in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0xfd1 (0x579791 in /usr/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #13: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #14: /usr/bin/python() [0x5b0529]
frame #15: _PyObject_MakeTpCall + 0x1ed (0x5ffecd in /usr/bin/python)
frame #16: _PyEval_EvalFrameDefault + 0x5b9e (0x57e35e in /usr/bin/python)
frame #17: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #19: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #20: /usr/bin/python() [0x600500]
frame #21: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #22: /usr/bin/python() [0x53cb41]
frame #23: /usr/bin/python() [0x5431bc]
frame #24: /usr/bin/python() [0x541d1c]
frame #25: /usr/bin/python() [0x540828]
frame #26: /usr/bin/python() [0x542579]
frame #27: /usr/bin/python() [0x542f79]
frame #28: /usr/bin/python() [0x541d1c]
frame #29: /usr/bin/python() [0x543926]
frame #30: /usr/bin/python() [0x64f68b]
frame #31: /usr/bin/python() [0x4fb1ff]
frame #32: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #33: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #34: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #35: /usr/bin/python() [0x4ff9e6]
frame #36: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #37: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #38: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #40: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #41: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #42: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #43: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #44: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #45: /usr/bin/python() [0x4ffa96]
frame #46: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #47: /usr/bin/python() [0x645e55]
frame #48: /usr/bin/python() [0x65f7f4]
frame #49: + 0x76db (0x7f5219fef6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #50: clone + 0x3f (0x7f521a32888f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f615ebe08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f615ee32ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f615ebcbb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f61ad5c3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: PyType_GenericAlloc + 0x4f5 (0x5b64d5 in /usr/bin/python)
frame #7: _PyObject_MakeTpCall + 0x170 (0x5ffe50 in /usr/bin/python)
frame #8: _PyEval_EvalFrameDefault + 0x5553 (0x57dd13 in /usr/bin/python)
frame #9: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x247 (0x602bd7 in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #14: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #16: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #17: /usr/bin/python() [0x4ffa96]
frame #18: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #20: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #21: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #22: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #24: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #25: /usr/bin/python() [0x4ffa96]
frame #26: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #27: /usr/bin/python() [0x645e55]
frame #28: /usr/bin/python() [0x65f7f4]
frame #29: + 0x76db (0x7f61b191d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x3f (0x7f61b1c5688f in /lib/x86_64-linux-gnu/libc.so.6)
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: initialization error
Exception raised from insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:717 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f615ebe08b2 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x1070 (0x7f615ee32ef0 in /home/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: c10::TensorImpl::release_resources() + 0x4d (0x7f615ebcbb7d in /home/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: + 0x5fd902 (0x7f61ad5c3902 in /home/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: /usr/bin/python() [0x5b43fa]
frame #5: /usr/bin/python() [0x4d7cc6]
frame #6: _PyObject_GC_New + 0x419 (0x552d89 in /usr/bin/python)
frame #7: /usr/bin/python() [0x5da528]
frame #8: /usr/bin/python() [0x4fb52d]
frame #9: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #10: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #12: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #13: /usr/bin/python() [0x5b9fcd]
frame #14: _PyEval_EvalFrameDefault + 0x146b (0x579c2b in /usr/bin/python)
frame #15: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #16: /usr/bin/python() [0x600500]
frame #17: PyObject_CallFunctionObjArgs + 0x8e (0x6007ee in /usr/bin/python)
frame #18: /usr/bin/python() [0x53cb41]
frame #19: /usr/bin/python() [0x5431bc]
frame #20: /usr/bin/python() [0x541d1c]
frame #21: /usr/bin/python() [0x540828]
frame #22: /usr/bin/python() [0x542579]
frame #23: /usr/bin/python() [0x542f79]
frame #24: /usr/bin/python() [0x541d1c]
frame #25: /usr/bin/python() [0x543926]
frame #26: /usr/bin/python() [0x64f68b]
frame #27: /usr/bin/python() [0x4fb1ff]
frame #28: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0x25c (0x5765ec in /usr/bin/python)
frame #30: _PyFunction_Vectorcall + 0x442 (0x602dd2 in /usr/bin/python)
frame #31: /usr/bin/python() [0x4ff9e6]
frame #32: _PyEval_EvalFrameDefault + 0x53f0 (0x57dbb0 in /usr/bin/python)
frame #33: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #34: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x1c4a (0x57a40a in /usr/bin/python)
frame #36: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #37: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #38: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0x88d (0x57904d in /usr/bin/python)
frame #40: _PyFunction_Vectorcall + 0x19c (0x602b2c in /usr/bin/python)
frame #41: /usr/bin/python() [0x4ffa96]
frame #42: PyVectorcall_Call + 0x51 (0x5ff3b1 in /usr/bin/python)
frame #43: /usr/bin/python() [0x645e55]
frame #44: /usr/bin/python() [0x65f7f4]
frame #45: + 0x76db (0x7f61b191d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #46: clone + 0x3f (0x7f61b1c5688f in /lib/x86_64-linux-gnu/libc.so.6)
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 751, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 751, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 751, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 751, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 25, in _pin_memory_loop
r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
return _ForkingPickler.loads(res)
File "/home/.local/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
fd = df.detach()
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 508, in Client
answer_challenge(c, authkey)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 756, in answer_challenge
response = connection.recv_bytes(256) # reject large message
File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
2021-07-01 12:25:48 | INFO | fairseq.trainer | Finished saving checkpoint to /path/checkpoint_wav2vec/finetune_libri100h/checkpoint_last.pt
2021-07-01 12:25:48 | INFO | fairseq.checkpoint_utils | Saved checkpoint /path/checkpoint_wav2vec/finetune_libri100h/checkpoint_last.pt (epoch 1 @ 205 updates, score None) (writing took 5.110959745943546 seconds)
2021-07-01 12:25:48 | INFO | fairseq_cli.train | end of epoch 1 (average epoch stats below)
2021-07-01 12:25:48 | INFO | train | {"epoch": 1, "train_loss": "1945.22", "train_ntokens": "25226.1", "train_nsentences": "135.902", "train_nll_loss": "10.48", "train_wps": "41139.2", "train_ups": "1.63", "train_wpb": "25226.1", "train_bsz": "135.9", "train_num_updates": "205", "train_lr": "1.06106e-06", "train_gnorm": "2389.16", "train_loss_scale": "4", "train_train_wall": "123", "train_gb_free": "11.8", "train_wall": "131"}
2021-07-01 12:25:48 | INFO | fairseq.trainer | begin training epoch 2
2021-07-01 12:25:48 | INFO | fairseq_cli.train | Start iterating over samples
Traceback (most recent call last):
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 872, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/usr/lib/python3.8/queue.py", line 178, in get
raise Empty
_queue.Empty
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train.py", line 14, in <module>
cli_main()
File "/path/asr_pretrain/fairseq-master/fairseq_cli/train.py", line 507, in cli_main
distributed_utils.call_main(cfg, main)
File "/path/asr_pretrain/fairseq-master/fairseq/distributed/utils.py", line 354, in call_main
distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
File "/path/asr_pretrain/fairseq-master/fairseq/distributed/utils.py", line 328, in distributed_main
main(cfg, **kwargs)
File "/path/asr_pretrain/fairseq-master/fairseq_cli/train.py", line 180, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "/path/asr_pretrain/fairseq-master/fairseq_cli/train.py", line 287, in train
for i, samples in enumerate(progress):
File "/path/asr_pretrain/fairseq-master/fairseq/logging/progress_bar.py", line 191, in __iter__
for i, obj in enumerate(self.iterable, start=self.n):
File "/path/asr_pretrain/fairseq-master/fairseq/data/iterators.py", line 56, in __next__
x = next(self._itr)
File "/path/asr_pretrain/fairseq-master/fairseq/data/iterators.py", line 509, in _chunk_iterator
for x in itr:
File "/path/asr_pretrain/fairseq-master/fairseq/data/iterators.py", line 56, in __next__
x = next(self._itr)
File "/path/asr_pretrain/fairseq-master/fairseq/data/iterators.py", line 637, in __next__
raise item
File "/path/asr_pretrain/fairseq-master/fairseq/data/iterators.py", line 567, in run
for item in self._source:
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
idx, data = self._get_data()
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1024, in _get_data
success, data = self._try_get_data()
File "/home/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 885, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 5281, 5288, 5295) exited unexpectedly