OpenNMT-py
onmt_train: an illegal memory access was encountered
onmt_train -data demo/data -save_model demo-model -layers 6 -rnn_size 64 -word_vec_size 64 -transformer_ff 256 -heads 8 -encoder_type transformer -decoder_type transformer -position_encoding -train_steps 20000 -max_generator_batches 2 -batch_size 640 -dropout 0.1 -batch_type tokens -normalization tokens -accum_count 2 -optim adam -adam_beta2 0.998 -decay_method noam -warmup_steps 1000 -learning_rate 2 -max_grad_norm 0 -param_init 0 -param_init_glorot -label_smoothing 0.1 -valid_steps 50 -save_checkpoint_steps 500 -world_size 8 -gpu_ranks 0 1 2 3 4 5 6 7
The error occurs when validation begins:
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /pytorch/aten/src/THC/THCReduceAll.cuh:327
what(): CUDA error: an illegal memory access was encountered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:771)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fd98245c536 in /data/common_tool/anaconda3/envs/dnn/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x7ae (0x7fd98269ffbe in /data/common_tool/anaconda3/envs/dnn/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
Environment: PyTorch 1.5, CUDA 10.2
same here