mmsegmentation
MemoryError message during the validation phase of training
I'm training the UperNet-Swin model on a custom binary dataset.
The first training steps run fine, but as soon as the validation step starts I get the following error:
  File "C:\ProgramData\Miniconda3\envs\mmsegment\lib\site-packages\torch\serialization.py", line 379, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\ProgramData\Miniconda3\envs\mmsegment\lib\site-packages\torch\serialization.py", line 604, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\Miniconda3\envs\mmsegment\lib\site-packages\torch\serialization.py", line 380, in save
    return
  File "C:\ProgramData\Miniconda3\envs\mmsegment\lib\site-packages\torch\serialization.py", line 259, in __exit__
    self.file_like.write_end_of_file()
RuntimeError: [enforce fail at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\caffe2\serialize\inline_container.cc:319] . unexpected pos 1725150400 vs 1725150288

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\mmsegmentation\custom_dataset.py", line 130, in
I used the standard setup; the only changes were adjusting the parameters and registering my binary dataset.
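As a sanity check on the RuntimeError above, here is a minimal sketch (not part of my training script) that converts the two reported offsets into something readable and checks whether the current drive has room for a file of that size. Only the two offsets come from the traceback; the helper names `gib` and `has_room_for` and the 2x safety margin are assumptions made for illustration. It does not establish the cause of the MemoryError, it only helps rule out a full disk.

```python
import shutil

# The two byte offsets copied from the RuntimeError message in the traceback.
POS_A = 1725150400
POS_B = 1725150288


def gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


def has_room_for(path: str, needed_bytes: int, safety_factor: float = 2.0) -> bool:
    """Return True if the drive holding `path` has at least
    safety_factor * needed_bytes free. The factor is an arbitrary margin."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes * safety_factor


# Gap between the two positions reported by the zip writer.
gap = abs(POS_A - POS_B)

print(f"file offset at failure: about {gib(POS_A):.2f} GiB")
print(f"gap between reported offsets: {gap} bytes")
print("enough free disk for a file of that size:", has_room_for(".", POS_A))
```

The offsets correspond to a file of roughly 1.61 GiB, and the two positions differ by 112 bytes. Pass the checkpoint output directory instead of `"."` to check the drive the checkpoints are actually written to.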
Environment
sys.platform: win32
Python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:51:29) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.124
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31332 for x64
GCC: n/a
PyTorch: 1.12.1+cu116
PyTorch compiling details: PyTorch built with:
- C++ Version: 199711
- MSVC 192829337
- Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 2019
- LAPACK is enabled (usually provided by MKL)
- CPU capability usage: AVX2
- CUDA Runtime 11.6
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.5.4
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.13.1+cu116
OpenCV: 4.6.0
MMCV: 1.6.1
MMCV Compiler: MSVC 192930140
MMCV CUDA Compiler: 11.6
MMSegmentation: 0.26.0+13d4c39