pandarallel
OSError: [Errno 28] No space left on device
Hi,
How can I prevent the space error below?
Traceback (most recent call last):
File "/anaconda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/anaconda/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/anaconda/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
return _func(x)
File "/anaconda/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 120, in wrapper
pickle.dump(result, file)
OSError: [Errno 28] No space left on device
Thanks! Boris
Hi,
I guess this issue is unrelated to pandarallel.
It seems your hard drive is simply full.
So you may remove some files from your drive...
It happens only when I am using pandarallel on a relatively big DataFrame.
This config does work: pandarallel.initialize(use_memory_fs=False)
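For context, here is a minimal sketch of that workaround (initialize and parallel_apply are pandarallel's documented API; the toy DataFrame is only for illustration):

import pandas as pd
from pandarallel import pandarallel

# Disable the in-memory file system: workers then exchange data
# through pipes instead of files under /dev/shm.
pandarallel.initialize(use_memory_fs=False)

df = pd.DataFrame({"x": range(1_000_000)})
# parallel_apply mirrors DataFrame.apply, but splits the work across workers.
result = df.parallel_apply(lambda row: row["x"] * 2, axis=1)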
Found some related answers here and here.
How can I set the temp folder to /tmp when using pandarallel?
Same issue with a large DataFrame; it seems to occur because /dev/shm is full.
If working with Docker, the default size for /dev/shm is only 64 MB.
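As an aside, if you control how the container is started, Docker's standard --shm-size flag can also raise that limit, for example (my-image is a placeholder):

docker run --shm-size=1g my-image

That keeps use_memory_fs enabled; the workaround below avoids /dev/shm entirely instead.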
This is the solution:
import os
os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'
pandarallel.initialize(nb_workers=30, progress_bar=True, verbose=2, use_memory_fs=False)
In my case I also had to clean up the /tmp folder.
Thank you very much!
Thanks! I combined both solutions, like this:
import os

# Clean up the /tmp folder first.
if os.path.isdir("/tmp"):
    os.system('rm -R /tmp/*')

os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'
pandarallel.initialize(nb_workers=int(os.cpu_count()) - 1, use_memory_fs=False, progress_bar=True, verbose=2)
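A slightly safer variant (my own suggestion, not from the thread): point JOBLIB_TEMP_FOLDER at a dedicated subdirectory, so the cleanup cannot touch files that other processes keep in /tmp. The directory name pandarallel_scratch is arbitrary:

import os
import shutil
import tempfile

# Hypothetical scratch directory dedicated to these temp files.
scratch = os.path.join(tempfile.gettempdir(), "pandarallel_scratch")
shutil.rmtree(scratch, ignore_errors=True)  # remove leftovers from earlier runs only
os.makedirs(scratch, exist_ok=True)
os.environ['JOBLIB_TEMP_FOLDER'] = scratch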
What is the performance hit from not using the memory fs?
Would be great if MEMORY_FS_ROOT in pandarallel.core could be overridden with an environment variable instead of being hardcoded. This mostly causes problems in Docker / dev containers.
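For concreteness, a sketch of what that change might look like (not merged, just an illustration; it assumes the current hardcoded value is the /dev/shm path mentioned above):

import os

# Fall back to the hardcoded default when the environment variable is unset.
MEMORY_FS_ROOT = os.environ.get("MEMORY_FS_ROOT", "/dev/shm")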
@chris-boson that seems reasonable. Would you be willing to make a pull request to that effect?
But in fact I checked the /tmp folder, and it does not have many files; the total size is less than 10 MB.
In the company environment I do not have permission to remove those files. I tried setting os.environ['JOBLIB_TEMP_FOLDER'] or MEMORY_FS_ROOT to somewhere else with plenty of free space, but no luck either. Did I miss anything?
Are there any recommendations for what DataFrame sizes we should or should not use use_memory_fs for?