mpi_learn
Distributed learning with mpi4py
Introduces a check on the Keras image data format flag to set the best channel ordering for the hardware architecture: channels_first for CPU and channels_last for GPU.
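A minimal sketch of what such a check could look like, assuming a TensorFlow-backed Keras; the GPU probe via `tf.config.list_physical_devices` is an illustration, not necessarily the repo's actual logic:

```python
# Sketch (not the repo's actual code): pick the Keras image data format
# according to the hardware the process will run on. The format chosen per
# device follows the note above (channels_first for CPU, channels_last for GPU).
import tensorflow as tf
from tensorflow import keras

def set_data_format_for_hardware():
    if tf.config.list_physical_devices('GPU'):
        keras.backend.set_image_data_format('channels_last')
    else:
        keras.backend.set_image_data_format('channels_first')

set_data_format_for_hardware()
print(keras.backend.image_data_format())
```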
@svalleco has raised a good point that it would be beneficial to have an --n-fold option on MPIDriver.py / MPIGDriver.py. The only change I can see is to use the kfoldmanager...
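A hypothetical sketch of what an --n-fold option could do: split the list of training files into k folds and yield (train, validation) file lists, one pair per fold. None of these names come from mpi_learn; `sklearn.model_selection.KFold` is just one way to produce the splits:

```python
# Hypothetical: each fold would drive one full training run.
from sklearn.model_selection import KFold

def k_fold_file_lists(file_list, n_folds=5):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(file_list):
        yield ([file_list[i] for i in train_idx],
               [file_list[i] for i in val_idx])

files = ['f0.h5', 'f1.h5', 'f2.h5', 'f3.h5', 'f4.h5']
for fold, (train_files, val_files) in enumerate(k_fold_file_lists(files)):
    print(fold, len(train_files), len(val_files))
```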
https://github.com/vlimant/mpi_learn/blob/master/mpi_learn/mpi/manager.py#L29 — in `def get_device(comm, num_masters=1, gpu_limit=-1, gpu_for_master=False):`, `gpu_for_master` is not used anymore, and I think it is hard to know at that point whether the global rank is a "master"...
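For reference, a minimal sketch of one way the rank-to-role mapping could work so that `gpu_for_master` is honored again. Everything here (masters occupying the lowest ranks, round-robin GPU assignment) is an assumption for illustration, not what manager.py actually does; the issue's point is precisely that the real layout is harder to infer:

```python
# Sketch under assumptions, not mpi_learn's actual logic.
from mpi4py import MPI

def get_device(comm, num_masters=1, gpu_limit=-1, gpu_for_master=False):
    rank = comm.Get_rank()
    is_master = rank < num_masters  # assumption: masters are the lowest ranks
    if is_master and not gpu_for_master:
        return 'cpu'
    if gpu_limit == 0:
        return 'cpu'
    # round-robin GPU assignment among the remaining processes (illustrative)
    n_gpus = gpu_limit if gpu_limit > 0 else 1
    return 'gpu%d' % (rank % n_gpus)

print(get_device(MPI.COMM_WORLD))
```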
With `mpirun -tag-output -n 5 python3 MPIDriver.py topclass_torch_arch.torch train_topclass.list test_topclass.list --loss categorical_crossentropy --epochs 10 --torch --features-name Images --labels-name Labels`, the run gets no sensible loss values:

```
[1,0]:loss: -506.394
[1,0]:acc: 0.333
```
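One plausible cause, offered as a guess rather than a confirmed diagnosis: categorical cross-entropy is only non-negative when the predictions form a probability distribution, so applying it to raw, unnormalized network outputs (or to labels that are not one-hot) can drive the value far negative, as this small torch demonstration shows:

```python
import torch

y_true = torch.tensor([[0., 0., 1.]])
logits = torch.tensor([[2.0, -1.0, 30.0]])  # raw outputs, not probabilities

# proper cross-entropy on a softmax distribution: always >= 0
proper = -(y_true * torch.log_softmax(logits, dim=1)).sum()

# "cross-entropy" applied directly to raw outputs: can be hugely negative
naive = -(y_true * logits).sum()

print(proper.item())  # small positive number
print(naive.item())   # -30.0
```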
In the way the master receives the "update" from the workers: for the batch-norm mean weight (the running mean), it would treat the diff as a gradient and do something...
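A sketch of one possible fix, stated as an assumption rather than mpi_learn's code: the master could treat trainable weights and batch-norm running statistics differently, applying weight diffs as gradients through the optimizer step while blending running means/variances directly:

```python
import numpy as np

def apply_worker_update(weights, update, running_stat_idx, lr=0.01):
    """weights/update: lists of numpy arrays; running_stat_idx: set of
    indices holding BN running statistics rather than trainable weights."""
    new_weights = []
    for i, (w, u) in enumerate(zip(weights, update)):
        if i in running_stat_idx:
            # running stats: blend toward the worker's value, no "gradient"
            new_weights.append(0.5 * (w + u))
        else:
            # trainable weights: interpret u as a gradient-like diff
            new_weights.append(w - lr * u)
    return new_weights

w = [np.ones(3), np.zeros(3)]            # [kernel, bn running mean]
upd = [np.full(3, 0.1), np.full(3, 2.0)]
print(apply_worker_update(w, upd, running_stat_idx={1}))
```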