mpi_learn

Distributed learning with mpi4py

5 mpi_learn issues

Introduces a check on the Keras image data format flag to set the best channel ordering for the hardware architecture: channels_first for CPU and channels_last for GPU.
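
A minimal sketch of such a check, following the mapping described above; the GPU-detection helper and the function name `set_data_format_for_device` are illustrative assumptions, not the PR's actual code:

```python
import keras.backend as K
from tensorflow.python.client import device_lib

def set_data_format_for_device():
    """Set Keras's image_data_format flag to match the hardware.

    Follows the ordering described in this PR: channels_first on
    CPU-only nodes, channels_last when a GPU is visible.
    """
    has_gpu = any(d.device_type == 'GPU'
                  for d in device_lib.list_local_devices())
    K.set_image_data_format('channels_last' if has_gpu else 'channels_first')
    return K.image_data_format()
```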

@svalleco has raised a good point that it would be beneficial to have a --n-fold option on MPIDriver.py / MPIGDriver.py. The only change I can see is to use the kfoldmanager...
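
A minimal sketch of what the option could look like, assuming an argparse-based driver and scikit-learn's KFold for the splitting; the `run_training` hook and the direct use of `train_topclass.list` are illustrative, and the kfoldmanager API itself is not shown:

```python
import argparse
from sklearn.model_selection import KFold

parser = argparse.ArgumentParser()
parser.add_argument('--n-fold', type=int, default=1,
                    help='number of cross-validation folds (1 disables k-folding)')
args = parser.parse_args()

# The list file holds one input file path per line, as in the drivers.
train_files = [line.strip() for line in open('train_topclass.list')]

if args.n_fold > 1:
    for fold, (train_idx, val_idx) in enumerate(
            KFold(n_splits=args.n_fold).split(train_files)):
        fold_train = [train_files[i] for i in train_idx]
        fold_val = [train_files[i] for i in val_idx]
        print('fold %d: %d train / %d val files'
              % (fold, len(fold_train), len(fold_val)))
        # run_training(fold_train, fold_val)  # illustrative hook
```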

https://github.com/vlimant/mpi_learn/blob/master/mpi_learn/mpi/manager.py#L29 — in `def get_device(comm, num_masters=1, gpu_limit=-1, gpu_for_master=False):`, gpu_for_master is no longer used, and I think it is hard to know at that point whether the global rank is a "master"...
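
A hedged sketch of the rank-based assignment such a function has to make, illustrating why honoring gpu_for_master is awkward at this point; the ranks-below-num_masters convention and the device-string format are assumptions, not the repo's actual logic:

```python
from mpi4py import MPI

def get_device(comm, num_masters=1, gpu_limit=-1):
    """Pick a device string for this rank.

    Only the global rank is known here; whether this rank ends up as
    a master depends on how the process tree is built later, which is
    why a gpu_for_master flag is hard to honor at this point.
    """
    rank = comm.Get_rank()
    if gpu_limit == 0:
        return 'cpu'
    # Illustrative convention: the first num_masters ranks act as
    # masters and stay on CPU; the remaining ranks share the GPUs.
    if rank < num_masters:
        return 'cpu'
    worker_id = rank - num_masters
    if gpu_limit > 0:
        return 'gpu%d' % (worker_id % gpu_limit)
    return 'gpu%d' % worker_id
```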

With

```
mpirun -tag-output -n 5 python3 MPIDriver.py topclass_torch_arch.torch train_topclass.list test_topclass.list \
    --loss categorical_crossentropy --epochs 10 --torch --features-name Images --labels-name Labels
```

training reports no sensible loss values:

```
[1,0]:loss: -506.394
[1,0]:acc: 0.333
```
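
A negative cross-entropy suggests the loss is being fed something other than what it expects. One plausible cause, offered as an assumption rather than a confirmed diagnosis of this report, is PyTorch's NLL loss receiving raw network outputs instead of log-probabilities, which drives the "loss" negative as soon as the target activations are positive:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])  # raw, unnormalized outputs
target = torch.tensor([0])

# Correct: cross_entropy applies log_softmax internally, so the
# result is always non-negative.
print(F.cross_entropy(logits, target))  # small positive value

# Broken: nll_loss expects log-probabilities; on raw logits it just
# negates the target's activation, here giving -2.0.
print(F.nll_loss(logits, target))
```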

In the way the master receives the "update" from the workers: for the batch-norm mean weight (the running mean), it would treat the diff as a gradient and do something...
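
A hedged sketch of the distinction being raised, assuming a master that applies an SGD-style step to worker deltas; the parameter-name test and the simple averaging fix are illustrative, not mpi_learn's actual update code:

```python
def apply_update(weights, update, learning_rate=0.01):
    """Apply a worker's update to the master's weights.

    Gradient-like entries get an SGD step, but batch-norm running
    statistics are state, not gradients: treating their diff as a
    gradient (as the issue describes) corrupts them, so here they
    are blended toward the worker's value instead.
    """
    new_weights = {}
    for name, w in weights.items():
        delta = update[name]
        if 'running_mean' in name or 'running_var' in name:
            # delta = worker_value - master_value, so w + delta
            # recovers the worker's statistic; average the two.
            new_weights[name] = 0.5 * w + 0.5 * (w + delta)
        else:
            new_weights[name] = w - learning_rate * delta
    return new_weights
```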