Tim Moon

Results: 231 comments by Tim Moon

An hour seems really excessive for LeNet. I suspect something is hanging. It's odd, since it should just run with one MPI rank if you don't pass in extra arguments....

I don't see the debug callback in the log. At the line I gave you, we configure the model with three callbacks to print the model description, metrics, and times....

I'm not too familiar with the build system and the main developer is on vacation for the rest of the week, but I'll give it a shot. Is your Spack...

Your setup looks sensible to me. In my workflow I build the dependencies in a Spack environment and build LBANN with CMake, and I just need to load one modulefile...

Exceptions thrown by `LBANN_ERROR` can be caught, but it's annoying since they still print out error messages (see #1123).

It sounds like we want to make the I/O thread pool a singleton object.

@benson31 posted in #916: > Thread pools are not singletons. That would be a bad move. There can be many thread pools per rank/node/whatever. I/O is the shared resource. IMO,...
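One way to read the quoted point, as a minimal Python sketch rather than LBANN code: several independent thread pools can coexist while the I/O resource they share is the thing that gets guarded. `SharedIO` and `run_two_pools` are hypothetical names made up for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedIO:
    """Hypothetical stand-in for an I/O resource shared by several pools."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def write(self, record):
        # Serialize access: the resource, not the pool, is what is shared.
        with self._lock:
            self.records.append(record)


def run_two_pools(num_items=8):
    io = SharedIO()
    # Two independent pools; neither one is a singleton.
    with ThreadPoolExecutor(max_workers=2) as pool_a, \
         ThreadPoolExecutor(max_workers=2) as pool_b:
        futures = [pool_a.submit(io.write, ("a", i)) for i in range(num_items)]
        futures += [pool_b.submit(io.write, ("b", i)) for i in range(num_items)]
        for future in futures:
            future.result()  # re-raise any worker exception
    return io.records
```

The order of `records` depends on thread scheduling, so callers should only rely on the set of entries, not their sequence.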

Pylint is supposed to be helpful for catching Python bugs, although it apparently has a ton of false positives. For future reference, here is [Google's config file](https://github.com/google/seq2seq/blob/master/pylintrc).

I think this is a good approach. Also, this is probably a good place to litigate whether we should call it `termination_critera` vs. `termination_criterion`. From the discussion in #916: >...

I wonder if this is the right way to ensure that the checkpoint tests proceed deterministically. Couldn't we also do it by avoiding usage of the RNG state entirely (by...
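The change under discussion is not visible in this snippet, so the following is only a generic sketch of why RNG state matters for deterministic checkpoint tests. It uses Python's standard `random` module; `make_checkpoint`, `restore_checkpoint`, and `run` are hypothetical helpers, not LBANN functions.

```python
import random


def make_checkpoint(rng, step):
    # Capture the generator state alongside the step counter.
    return {"step": step, "rng_state": rng.getstate()}


def restore_checkpoint(checkpoint):
    # Build a fresh generator and put it in the saved state.
    rng = random.Random()
    rng.setstate(checkpoint["rng_state"])
    return rng, checkpoint["step"]


def run(num_steps, seed=0, checkpoint_at=None):
    rng = random.Random(seed)
    values = []
    for step in range(num_steps):
        if step == checkpoint_at:
            # Simulate a save/restart cycle in the middle of the run.
            rng, _ = restore_checkpoint(make_checkpoint(rng, step))
        values.append(rng.random())
    return values
```

With the state saved and restored, `run(10)` and `run(10, checkpoint_at=5)` produce the same sequence. A test that never draws from the generator would not need this bookkeeping at all.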