
OSError: [Errno 28] No space left on device

Open borisRa opened this issue 4 years ago • 13 comments

Hi,

How can I prevent the space error below?

Traceback (most recent call last):
  File "/anaconda/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/anaconda/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/anaconda/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 64, in global_worker
    return _func(x)
  File "/anaconda/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 120, in wrapper
    pickle.dump(result, file)
OSError: [Errno 28] No space left on device

Thanks ! Boris

borisRa avatar Jan 04 '21 16:01 borisRa

Hi,

I guess this issue is unrelated to pandarallel. It seems your hard drive is simply full, so you may want to remove some files from your drive...

nalepae avatar Jan 04 '21 22:01 nalepae

It happens only when I am using pandarallel on a relatively big data frame. This config does work: pandarallel.initialize(use_memory_fs=False). Found some related answers here and here

How can I set the temp folder to : /tmp when using pandarallel ?

borisRa avatar Jan 05 '21 10:01 borisRa

Same issue with a large dataframe, likely seems to be occurring because /dev/shm is full.

abhineetgupta avatar Feb 16 '21 18:02 abhineetgupta

If working with Docker, the default for /dev/shm is only 64M.

pankaj-kvhld avatar Feb 18 '21 12:02 pankaj-kvhld
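Since pandarallel's memory file system lives under /dev/shm on Linux, one way to diagnose this error is to check how much free space that mount actually has. A minimal stdlib sketch (Linux-only; in Docker, the `--shm-size` flag on `docker run` can raise the 64M default):

```python
import shutil

# Report free space on the shared-memory mount that pandarallel uses
# by default on Linux (/dev/shm). If this is small (e.g. Docker's
# 64M default), Errno 28 during pickle.dump is likely.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {free / 2**20:.0f} MiB free of {total / 2**20:.0f} MiB")
```

If the reported total is only 64 MiB, the container's shared memory is the bottleneck rather than the host disk.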

This is the solution:

import os

os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'
pandarallel.initialize(nb_workers=30, progress_bar=True, verbose=2, use_memory_fs=False)

pbidro avatar Mar 06 '21 03:03 pbidro

In my case I also had to clean up the /tmp folder.

lucaspetry avatar Feb 09 '22 15:02 lucaspetry

This is the solution:

import os

os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'
pandarallel.initialize(nb_workers=30, progress_bar=True, verbose=2, use_memory_fs=False)

Thank you very much !

borisRa avatar Mar 01 '22 07:03 borisRa

In my case I also had to clean up the /tmp folder.

Thanks! Combined both solutions.

Like this:

import os, shutil

# clean up the /tmp folder
if os.path.isdir("/tmp"):
    os.system('rm -R /tmp/*')

os.environ['JOBLIB_TEMP_FOLDER'] = '/tmp'
pandarallel.initialize(nb_workers=int(os.cpu_count()) - 1, use_memory_fs=False, progress_bar=True, verbose=2)

borisRa avatar Mar 01 '22 08:03 borisRa

What is the performance hit from not using the memory fs?

florianlaws avatar May 16 '22 06:05 florianlaws

Would be great if MEMORY_FS_ROOT in pandarallel.core could be overridden with an environment variable instead of being hardcoded. This mostly causes problems in Docker / dev containers.

chris-boson avatar Jan 30 '23 14:01 chris-boson

@chris-boson that seems reasonable. Would you be willing to make a pull request to that effect?

till-m avatar Jan 30 '23 14:01 till-m

But in fact I checked the /tmp folder and it does not have too many files; the total size is less than 10 MB.

In my company environment I do not have permission to remove those files. I also tried setting os.environ['JOBLIB_TEMP_FOLDER'] or MEMORY_FS_ROOT to another location with plenty of free space, but no luck either. Did I miss anything?

Realvincentyuan avatar Jun 02 '23 15:06 Realvincentyuan

Are there any recommendations available for which DataFrame sizes we should or should not use use_memory_fs with?

Koen1999 avatar Jan 18 '24 10:01 Koen1999
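One rough heuristic (an assumption, not official pandarallel guidance): enable use_memory_fs only when the DataFrame's in-memory size, with some margin for serialization overhead, fits in the free space on /dev/shm. A sketch on Linux:

```python
import shutil

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100_000, 4))

# In-memory footprint of the frame, in bytes.
df_bytes = int(df.memory_usage(deep=True).sum())

# Free space on the shared-memory mount (Linux).
free_bytes = shutil.disk_usage("/dev/shm").free

# The 2x margin is an arbitrary safety factor for pickling overhead.
use_memory_fs = df_bytes * 2 < free_bytes
print(f"df: {df_bytes / 2**20:.1f} MiB, "
      f"/dev/shm free: {free_bytes / 2**20:.0f} MiB, "
      f"use_memory_fs={use_memory_fs}")
```

The resulting boolean could then be passed as pandarallel.initialize(use_memory_fs=use_memory_fs).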