Pytorch_fine_tuning_Tutorial
ImportError: DLL load failed: The paging file is too small for this operation to complete.
After running the main_fine_tuning.py file, I got this traceback:
Epoch 0/99
LR is set to 0.001
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Users\dk12a7\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Users\dk12a7\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Users\dk12a7\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\dk12a7\Desktop\code classification\Pytorch_fine_tuning_Tutorial\main_fine_tuning.py", line 4, in <module>
    import torch
  File "C:\Users\dk12a7\Anaconda3\lib\site-packages\torch\__init__.py", line 80, in <module>
    from torch._C import *
ImportError: DLL load failed: The paging file is too small for this operation to complete.

Traceback (most recent call last):
  File "main_fine_tuning.py", line 265, in <module>
    num_epochs=100)
  File "main_fine_tuning.py", line 162, in train_model
    for data in dset_loaders[phase]:
  File "C:\Users\dk12a7\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 501, in __iter__
    return _DataLoaderIter(self)
  File "C:\Users\dk12a7\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 289, in __init__
    w.start()
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\dk12a7\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
I tried setting BATCH_SIZE = 1, but the problem still occurs. Do you have any solution for this one?
I ran into the same problem; have you found a solution?
@brianFruit Still stuck on this one.
I've also encountered that problem, and it seems to be a multiprocessing problem. What worked for me was reducing the number of workers in the DataLoader (line 108 in your code); there's a minimal sketch below. Your number is quite high: 25. Workers are subprocesses that load the data, so if you have 25 of them your CPU can rebel :) Try reducing it to 1, and if that works you can try to increase it. If I'm reasoning correctly, it shouldn't exceed the number of logical processors in your CPU (but if you are computing something in parallel, like me right now with another DataLoader, you should decrease it even more).
Hope that helps future generations.
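For reference, a minimal sketch of that change, assuming a torchvision ImageFolder dataset; the path, transform, and batch size are placeholders, not the tutorial's actual values:

import torch
from torchvision import datasets, transforms

def main():
    # Hypothetical dataset; substitute the tutorial's own dataset and transforms.
    dataset = datasets.ImageFolder("data/train", transform=transforms.ToTensor())

    # num_workers is the number of subprocesses loading batches in the background.
    # Start at 1 (or 0 to load in the main process) and raise it only while
    # training stays stable; exceeding your logical CPU core count rarely helps.
    loader = torch.utils.data.DataLoader(dataset, batch_size=4,
                                         shuffle=True, num_workers=1)

    for images, labels in loader:
        pass  # training step goes here

if __name__ == "__main__":
    # Needed on Windows: worker processes re-import this module on spawn,
    # so the entry point must be guarded.
    main()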
Hi there, I hit the same problem with my setups (both on Windows). Originally I had an X99 with an 8-core CPU, 64GB of RAM, and 2x RTX 2080 Ti, and was able to run up to 6 PyTorch RL algorithms with up to 10 multiprocessing workers each (60 workers in total running in parallel; obviously they were taking turns). If I pushed past those numbers, I would get the errors described above. Now I have changed my setup to a 3970X with 32 cores, 64GB of RAM, and the same 2 GPUs, yet I can barely run 3 of the same algos with up to 8 workers each. Any load beyond that generates the same error, and while they run, RAM usage never exceeds 40-50%. Any pointer in the right direction will be highly appreciated. Thanks!
I think I managed to solve it (so far). Steps were:
1)- Windows + Pause key
2)- Advanced system settings
3)- Advanced tab
4)- Performance - Settings button
5)- Advanced tab - Change button
6)- Uncheck the "Automatically... BLA BLA" checkbox
7)- Select the System managed size option box
8)- OK, OK, OK... Restart PC. BOOM
Not sure if it's the best way to solve the problem, but it has worked so far (fingers crossed).
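If you want to see how much commit headroom (physical RAM plus paging file) you actually have while your workers spin up, here is a small sketch using the Win32 GlobalMemoryStatusEx call through ctypes; Windows-only, and the GiB conversion is just for readability:

import ctypes
from ctypes import wintypes

# Mirrors the Win32 MEMORYSTATUSEX structure used by GlobalMemoryStatusEx.
class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", wintypes.DWORD),
        ("dwMemoryLoad", wintypes.DWORD),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),   # commit limit: RAM + paging file
        ("ullAvailPageFile", ctypes.c_uint64),   # commit headroom still available
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]

status = MEMORYSTATUSEX()
status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)  # must be set before the call
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status))

gib = 1024 ** 3
print(f"Physical RAM free: {status.ullAvailPhys / gib:.1f} / {status.ullTotalPhys / gib:.1f} GiB")
print(f"Commit limit free: {status.ullAvailPageFile / gib:.1f} / {status.ullTotalPageFile / gib:.1f} GiB")

The "paging file is too small" error appears when the commit limit is exhausted, so the second line is the number to watch.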
@Javierete This solution is working for me - thanks! I noticed the error returns for me when free space dips below 7-8 GB for the application I'm running.
Hi Woodrow73, if it's of any value, I ended up setting the values manually, to a ridiculous 360GB minimum and 512GB maximum. I also added an extra SSD and allocated all of it to virtual memory. This solved the problem, and now I can run up to 128 processes using PyTorch and CUDA. I did find out that every launch of Python and PyTorch loads a ridiculous amount of memory into RAM, which then, when not used often, gets moved into virtual memory. Anyway, just sharing my learnings.
I ran this on my PC and encountered the issue, even though this seems like it should be minimal in terms of memory usage:
import tensorflow as tf
print(tf.__version__)
I just closed several applications and the problem went away, so it truly seems like a resource issue.
Can someone please assist me with this error? I am kind of new to this, so please help me out. I have attached the complete error message.
I have managed to mitigate (although not completely solve) this issue. I posted a more detailed explanation at the StackOverflow link but basically try this:
Download: https://gist.github.com/cobryan05/7d1fe28dd370e110a372c4d268dcb2e5
Install dependency:
python -m pip install pefile
Run (for the OP's paths) (NOTE: THIS WILL MODIFY YOUR DLLS [although it will back them up]):
python fixNvPe.py --input C:\Users\dk12a7\Anaconda3\lib\site-packages\torch\lib\*.dll
6)- Uncheck the "Automatically... BLA BLA" checkbox
Hello, thanks for the solution, but it doesn't seem to work now. I have an HP Pavilion 15-EC2150AX laptop and the settings specified don't appear on my side. Any sort of help will be highly appreciated.
Thanks
The setting name is "Automatically manage paging file size for all drives", and it is at the top of the "Virtual Memory" dialog that opens after clicking the 'Change' button.
However, instead of making this change, you should first try my fix in the comment immediately before yours, and only apply the paging file size fix if it is still necessary.
For a description of what my fix does, see here: https://stackoverflow.com/a/69489193/213316 For a comparison of my fix against other fixes, see here: https://github.com/ultralytics/yolov3/issues/1643#issuecomment-985652432