
Machine Learning Toolkit for Extreme Scale (MaTEx)

Results 6 matex issues

I tried emailing [email protected] (for general info, not a bug report), but got a delivery failure.

I got the following errors:

2018-07-16 15:27:27.536541: W tensorflow/core/framework/op_kernel.cc:1192] Unknown: Exception: Message truncated, error stack:
MPI_Allreduce(855)..................: MPI_Allreduce(sbuf=MPI_IN_PLACE, rbuf=0x2049aaa00, count=256, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(712)............:
MPIR_Allreduce_intra(357)...........:
MPIC_Sendrecv(186)..................:
MPIDI_CH3U_Request_unpack_uebuf(599): Message truncated; 1536...

I tested the code from _/matex/src/deeplearning/tensorflow/examples/glibc_after_2.19/MNIST/tf_lenet3.py_ with the command `python tf_lenet3.py` and got an error:

```
Traceback (most recent call last):
  File "tf_lenet3.py", line 17, in <module>
    mnist = tf.DataSet("MNIST", normalize=255.0)...
```

I'm confused about the MPI Allreduce operator. The paper says that MaTEx-TensorFlow uses allreduce ops to synchronize each layer across ranks. I think one AllReduce op is to reduce...
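For readers following this question, here is a minimal sketch of what "one allreduce per layer" could look like. It simulates the ranks with plain Python lists instead of calling MPI. The helper names `allreduce_sum` and `sync_layers` are hypothetical and not part of MaTEx, and dividing the sum by the number of ranks (averaging) is an assumption, not something the excerpt above confirms.

```python
# Simulated per-layer allreduce across ranks; no MPI is used here.
# All function names are hypothetical and are not taken from MaTEx.

def allreduce_sum(buffers):
    """Element-wise sum of equally sized buffers; every rank gets a copy."""
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        # The simulation requires every rank to contribute the same count.
        raise ValueError("all ranks must contribute the same element count")
    total = [sum(values) for values in zip(*buffers)]
    return [list(total) for _ in buffers]


def sync_layers(grads_per_rank):
    """Run one allreduce per layer, then average (averaging is an assumption)."""
    num_ranks = len(grads_per_rank)
    num_layers = len(grads_per_rank[0])
    synced = [[] for _ in range(num_ranks)]
    for layer in range(num_layers):
        reduced = allreduce_sum([rank[layer] for rank in grads_per_rank])
        for r in range(num_ranks):
            synced[r].append([v / num_ranks for v in reduced[r]])
    return synced


# Two simulated ranks, each holding gradients for two layers.
grads = [
    [[1.0, 2.0], [3.0]],  # rank 0
    [[3.0, 4.0], [5.0]],  # rank 1
]
result = sync_layers(grads)
print(result[0])  # [[2.0, 3.0], [4.0]]
```

After the call, every simulated rank holds the same per-layer values, which is the sense in which the layers are "synchronized" in this sketch.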

1) We should make pnetcdf linking optional for folks who want to use CSV or other file formats.