mpi_learn
Distributed learning with mpi4py
Introduces a check on the Keras image data format flag to set the best channel ordering for the hardware architecture: channels_first for CPU and channels_last for GPU.
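A minimal sketch of what such a check could look like, assuming a TensorFlow-backed Keras; the GPU probe via `tf.config.list_physical_devices` is an illustration, not necessarily the repo's actual logic:

```python
# Sketch (not the repo's actual code): pick the Keras image data format
# according to the hardware the process will run on. The format chosen per
# device follows the note above (channels_first for CPU, channels_last for GPU).
import tensorflow as tf
from tensorflow import keras

def set_data_format_for_hardware():
    if tf.config.list_physical_devices('GPU'):
        keras.backend.set_image_data_format('channels_last')
    else:
        keras.backend.set_image_data_format('channels_first')

set_data_format_for_hardware()
print(keras.backend.image_data_format())
```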
@svalleco has raised a good point that it would be beneficial to have an --n-fold option on MPIDriver.py / MPIGDriver.py. The only change I can see is to use the kfoldmanager...
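A hypothetical sketch of what an --n-fold option could do: split the list of training files into k folds and yield (train, validation) file lists, one pair per fold. None of these names come from mpi_learn; `sklearn.model_selection.KFold` is just one way to produce the splits:

```python
# Hypothetical: each fold would drive one full training run.
from sklearn.model_selection import KFold

def k_fold_file_lists(file_list, n_folds=5):
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(file_list):
        yield ([file_list[i] for i in train_idx],
               [file_list[i] for i in val_idx])

files = ['f0.h5', 'f1.h5', 'f2.h5', 'f3.h5', 'f4.h5']
for fold, (train_files, val_files) in enumerate(k_fold_file_lists(files)):
    print(fold, len(train_files), len(val_files))
```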
https://github.com/vlimant/mpi_learn/blob/master/mpi_learn/mpi/manager.py#L29 — in `def get_device(comm, num_masters=1, gpu_limit=-1, gpu_for_master=False):`, `gpu_for_master` is not used anymore, and I think it is hard to know at that point whether the global rank is a "master"...
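For reference, a minimal sketch of one way the rank-to-role mapping could work so that `gpu_for_master` is honored again. Everything here (masters occupying the lowest ranks, round-robin GPU assignment) is an assumption for illustration, not what manager.py actually does; the issue's point is precisely that the real layout is harder to infer:

```python
# Sketch under assumptions, not mpi_learn's actual logic.
from mpi4py import MPI

def get_device(comm, num_masters=1, gpu_limit=-1, gpu_for_master=False):
    rank = comm.Get_rank()
    is_master = rank < num_masters  # assumption: masters are the lowest ranks
    if is_master and not gpu_for_master:
        return 'cpu'
    if gpu_limit == 0:
        return 'cpu'
    # round-robin GPU assignment among the remaining processes (illustrative)
    n_gpus = gpu_limit if gpu_limit > 0 else 1
    return 'gpu%d' % (rank % n_gpus)

print(get_device(MPI.COMM_WORLD))
```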
With `mpirun -tag-output -n 5 python3 MPIDriver.py topclass_torch_arch.torch train_topclass.list test_topclass.list --loss categorical_crossentropy --epochs 10 --torch --features-name Images --labels-name Labels`, the run gets no sensible loss values:

```
[1,0]:loss: -506.394
[1,0]:acc: 0.333
```
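One plausible cause, offered as a guess rather than a confirmed diagnosis: categorical cross-entropy is only non-negative when the predictions form a probability distribution, so applying it to raw, unnormalized network outputs (or to labels that are not one-hot) can drive the value far negative, as this small torch demonstration shows:

```python
import torch

y_true = torch.tensor([[0., 0., 1.]])
logits = torch.tensor([[2.0, -1.0, 30.0]])  # raw outputs, not probabilities

# proper cross-entropy on a softmax distribution: always >= 0
proper = -(y_true * torch.log_softmax(logits, dim=1)).sum()

# "cross-entropy" applied directly to raw outputs: can be hugely negative
naive = -(y_true * logits).sum()

print(proper.item())  # small positive number
print(naive.item())   # -30.0
```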
In the way the master receives the "update" from the workers: for the batch-norm mean weight (the running mean), it would treat the diff as a gradient and do something...
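A sketch of one possible fix, stated as an assumption rather than mpi_learn's code: the master could treat trainable weights and batch-norm running statistics differently, applying weight diffs as gradients through the optimizer step while blending running means/variances directly:

```python
import numpy as np

def apply_worker_update(weights, update, running_stat_idx, lr=0.01):
    """weights/update: lists of numpy arrays; running_stat_idx: set of
    indices holding BN running statistics rather than trainable weights."""
    new_weights = []
    for i, (w, u) in enumerate(zip(weights, update)):
        if i in running_stat_idx:
            # running stats: blend toward the worker's value, no "gradient"
            new_weights.append(0.5 * (w + u))
        else:
            # trainable weights: interpret u as a gradient-like diff
            new_weights.append(w - lr * u)
    return new_weights

w = [np.ones(3), np.zeros(3)]            # [kernel, bn running mean]
upd = [np.full(3, 0.1), np.full(3, 2.0)]
print(apply_worker_update(w, upd, running_stat_idx={1}))
```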