
Query: Distributed learning

Status: Open. KomputerMaster64 opened this issue 3 years ago • 6 comments

I am trying to run the improved-diffusion (Improved DDPM) project on Google Colab with a single GPU, and I am not sure how to fix the following error, which is raised by the distributed-training utilities:

Traceback (most recent call last):
  File "image_train.py", line 7, in <module>
    from improved_diffusion import dist_util, logger
  File "/content/gdrive/MyDrive/Colab Notebooks/GitHub Repositories/improved-diffusion/improved_diffusion/dist_util.py", line 10, in <module>
    from mpi4py import MPI
ModuleNotFoundError: No module named 'mpi4py'

KomputerMaster64 · Jul 13 '22
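The traceback shows that the failure happens at import time: improved_diffusion/dist_util.py runs "from mpi4py import MPI" at module level, so image_train.py stops before any training code runs. A minimal sketch for checking from a Colab cell whether mpi4py is visible to the running interpreter, assuming only the Python standard library (the helper name mpi4py_status is hypothetical):

```python
import importlib.util
import sys


def mpi4py_status() -> str:
    """Report whether mpi4py can be found by the running interpreter.

    find_spec only locates the package without importing it, so a broken
    MPI shared library would still only surface on the real import.
    """
    spec = importlib.util.find_spec("mpi4py")
    if spec is None:
        # This is the situation behind "No module named 'mpi4py'".
        return f"mpi4py not found for {sys.executable}"
    return f"mpi4py found at {spec.origin}"


if __name__ == "__main__":
    print(mpi4py_status())
```

If it reports "not found", mpi4py has to be installed into the interpreter printed by sys.executable.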

Just install the mpi4py Python package (pip install mpi4py). You may also need to install libmpich-dev (apt install libmpich-dev) before that, so that MPICH's libraries and development files are available for mpi4py to build against.

taoisu · Jul 23 '22
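When pip builds mpi4py from source, it needs an MPI implementation to compile and link against. A minimal sketch, using only the standard library, that checks for an mpicc compiler wrapper on PATH and for an MPI shared library the system can locate. The library names tried ("mpi", "mpich") are assumptions and may differ between distributions; the helper name mpi_prerequisites is hypothetical:

```python
import ctypes.util
import shutil


def mpi_prerequisites() -> dict:
    """Look for pieces a source build of mpi4py typically relies on.

    Returns a dict mapping each check to what was found (None if missing).
    """
    return {
        # Compiler wrapper commonly used to build the mpi4py C extension.
        "mpicc": shutil.which("mpicc"),
        # Assumed library names for the MPI runtime; adjust for your system.
        "libmpi": ctypes.util.find_library("mpi"),
        "libmpich": ctypes.util.find_library("mpich"),
    }


if __name__ == "__main__":
    for name, found in mpi_prerequisites().items():
        print(f"{name:8s} -> {found or 'NOT FOUND'}")
```

Everything reported as NOT FOUND points to the MPI system packages rather than to mpi4py itself.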

I am still facing the issue and will search more about it. The following is the output:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests 2.23.0 requires urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you have urllib3 1.26.11 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

KomputerMaster64 · Jul 25 '22
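The conflicts pip lists involve requests/urllib3 and datascience/folium, which were already in the environment; they do not by themselves say whether mpi4py was installed. A minimal sketch for checking the installed versions directly, assuming Python 3.8+ for importlib.metadata (the helper name installed_version is hypothetical):

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # mpi4py is the package that matters here; the other two appear in pip's warning.
    for name in ("mpi4py", "requests", "urllib3"):
        print(name, installed_version(name) or "not installed")
```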

@KomputerMaster64 you can try "conda install -c conda-forge mpi4py"; it solved the issue for me.

muhamadusman · Dec 07 '22
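Since the thread compares pip and conda installs, it can help to report exactly which mpi4py build is in use and which MPI library it loads. A minimal sketch, assuming mpi4py's __version__ attribute and MPI.Get_library_version() (the latter needs an MPI-3 library); the helper name mpi4py_report is hypothetical:

```python
import importlib


def mpi4py_report() -> dict:
    """Collect the mpi4py version and, if it loads, the MPI library it uses."""
    report = {"mpi4py": None, "mpi_library": None, "error": None}
    try:
        mpi4py = importlib.import_module("mpi4py")
        report["mpi4py"] = mpi4py.__version__
        # Importing mpi4py.MPI is the step that loads the MPI shared library.
        MPI = importlib.import_module("mpi4py.MPI")
        report["mpi_library"] = MPI.Get_library_version().strip()
    except ImportError as exc:
        # Covers both "No module named 'mpi4py'" and missing shared libraries.
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for key, value in mpi4py_report().items():
        print(f"{key}: {value}")
```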

@muhamadusman May I know your mpi4py version?

I got the following error when training the model:

ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

JunMa11 · Dec 14 '22
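This ImportError comes from the dynamic loader: it could not find a shared library named libmpi.so.12, which the compiled mpi4py extension was presumably built against. A minimal sketch, using only the standard library, that asks the loader whether it can open that library in the current process (the helper name can_load is hypothetical):

```python
import ctypes
import os


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Same lookup failure that surfaces as "cannot open shared object file".
        return False
    return True


if __name__ == "__main__":
    print("libmpi.so.12 loadable:", can_load("libmpi.so.12"))
    # Extra directories the loader searches in addition to the system defaults.
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
```

If this prints False in the same environment that raises the error, the problem most likely lies with the installed MPI library rather than with the training code.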

> I got the following error when training the model:
>
> ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

@JunMa11 Hi, have you solved this problem? I just got the same error.

adahfbch · Dec 15 '23