
FileNotFoundError while downloading DataComp-1B

Open xfgao opened this issue 1 year ago • 7 comments

Thanks for the great work. I encountered the following issue while downloading the DataComp-1B dataset:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

xfgao avatar Jul 07 '23 03:07 xfgao

Hi @xfgao!

To help identify the issue, could you share the command with which you are running download_upstream.py? Also, is the above the full error message?

Thanks!

GeorgiosSmyrnis avatar Jul 07 '23 07:07 GeorgiosSmyrnis

We ran the following command to download the data:

python download_upstream.py --scale datacomp_1b --data_dir DATA_DIR

We were able to download all the metadata and a number of tar files at first, but after a certain point we keep getting this error message:

Traceback (most recent call last):
 File "<string>", line 1, in <module>
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
   exitcode = _main(fd, parent_sentinel)
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
   self = reduction.pickle.load(from_parent)
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
   self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

xfgao avatar Jul 07 '23 17:07 xfgao

The error looks like an OS-level limit on the number of threads you can open.

Either decrease the number of threads you are using, increase the limit, or use a different machine.

If you provide some info on your environment, that could help.
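One way to gather that info is to print the per-process limits from Python before starting the download. This is a minimal sketch, assuming a Linux host; the helper names `describe_limit` and `raise_soft_limit` are hypothetical, and whether these particular limits are the cause of the traceback above is not confirmed.

```python
import resource  # Unix-only standard library module


def describe_limit(name, limit_id):
    """Return a one-line summary of the soft and hard values of a limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


def raise_soft_limit(limit_id, target):
    """Raise the soft limit towards `target`, never above the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(limit_id)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(limit_id, (new_soft, hard))
    return resource.getrlimit(limit_id)


if __name__ == "__main__":
    # Open file descriptors allowed per process.
    print(describe_limit("RLIMIT_NOFILE", resource.RLIMIT_NOFILE))
    # Processes/threads allowed for the current user (Linux).
    print(describe_limit("RLIMIT_NPROC", resource.RLIMIT_NPROC))
```

The same values are reported in a shell by `ulimit -n` and `ulimit -u`. A soft limit can be raised up to the hard limit without root; raising the hard limit itself needs administrator access.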

rom1504 avatar Jul 08 '23 12:07 rom1504

Thanks for the response. For the data download, I'm using Ubuntu 20.04 on an AWS EC2 g5.12xlarge instance (with 48 CPU cores). After reducing processes_count to 8 and thread_count to 8, I'm still getting the same FileNotFoundError.

xfgao avatar Jul 10 '23 17:07 xfgao

Can you try using a virtual env instead of conda?

rom1504 avatar Jul 10 '23 17:07 rom1504

Is there a requirements.txt file for setting up a virtual env?

xfgao avatar Jul 10 '23 19:07 xfgao

You can try installing the packages listed under pip in the environment.yml. If I am not mistaken, that should give you something close to the desired environment, provided your system Python is the correct version (although this needs to be verified). You should still train with the original environment to avoid other issues, but for just the data download it should be fine.
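A rough sketch of pulling that pip list out into a requirements file is shown below. It assumes the environment.yml follows the usual conda layout, with a `- pip:` entry under `dependencies` followed by more deeply indented `- package` lines. The function name `pip_requirements` and the sample content are hypothetical and are not taken from the repository's actual file.

```python
def pip_requirements(env_text):
    """Collect the entries listed under '- pip:' in a conda environment file.

    Uses a simple indentation-based scan instead of a YAML parser, so it
    only handles the common layout described above.
    """
    requirements = []
    pip_indent = None
    for raw_line in env_text.splitlines():
        line = raw_line.split(" #", 1)[0].rstrip()  # drop trailing comments
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if pip_indent is None:
            if stripped == "- pip:":
                pip_indent = indent  # start of the pip sub-list
            continue
        if indent <= pip_indent:
            break  # left the pip sub-list
        if stripped.startswith("- "):
            requirements.append(stripped[2:].strip().strip("'\""))
    return requirements


# Hypothetical sample content, for illustration only.
SAMPLE = """\
name: example
dependencies:
  - python=3.10
  - pip
  - pip:
      - numpy==1.24.0
      - "requests>=2.0"
"""

if __name__ == "__main__":
    # With a real file: open("environment.yml").read() in place of SAMPLE.
    print("\n".join(pip_requirements(SAMPLE)))
```

Saving the printed lines to requirements.txt and running `pip install -r requirements.txt` inside a fresh virtual env would then install only the pip-managed packages; anything conda installs outside that list is not covered.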

GeorgiosSmyrnis avatar Jul 11 '23 20:07 GeorgiosSmyrnis