Brian Van Essen
Add a feature so that the checkpoint logic writes a checkpoint only when the monitored metric has improved.
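A minimal sketch of the improved-metric gate (the class name and the "lower is better" convention are assumptions, not LBANN's actual callback API):

```cpp
#include <limits>

// Hypothetical sketch: remember the best value of the monitored metric seen
// so far and report whether the latest value improves on it, so the
// checkpoint callback can skip epochs where nothing improved. Assumes a
// loss-like metric where lower is better.
class checkpoint_if_improved {
public:
  // Returns true exactly when this value beats the best seen so far.
  bool should_checkpoint(double metric_value) {
    if (metric_value < m_best) {
      m_best = metric_value;  // record the new best value
      return true;
    }
    return false;
  }
private:
  double m_best = std::numeric_limits<double>::infinity();
};
```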
The dump-gradients and dump-error-signals callbacks should be updated to use the trainer-safe naming scheme.
https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html
Update the C&R test to include permutations of the optimizers.
Callbacks are currently all treated as stateless from a C&R point of view; this needs to be addressed.
Move the old prototext-based application models into a legacy directory. New applications will be based on the Python front end and use a new directory structure that groups together...
Have the CI also rebuild and install a version of LBANN on LC for users to point at.
The RNG state needs to be made model-specific so that, in tests like lbann2, the order in which the models are initialized does not impact their current or future...
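A minimal sketch of one way to decouple the streams (the function name and seeding scheme are illustrative assumptions): derive each model's RNG from the global seed combined with the model's name rather than from a shared global stream, so the order in which models are constructed cannot perturb any model's random sequence.

```cpp
#include <functional>
#include <random>
#include <string>

// Hypothetical per-model RNG factory: the stream depends only on the global
// seed and the model's name, never on how many draws other models have made.
std::mt19937 make_model_rng(unsigned global_seed,
                            const std::string& model_name) {
  std::seed_seq seq{global_seed,
                    static_cast<unsigned>(
                        std::hash<std::string>{}(model_name))};
  return std::mt19937(seq);
}
```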
It could be useful to add a method like:

```c++
virtual std::unique_ptr<execution_context> get_prototype_execution_context() const = 0;
```

Then in derived classes implement, e.g.:

```c++
std::unique_ptr<execution_context>
sgd_training_algorithm::get_prototype_execution_context() const {
  return make_unique<sgd_execution_context>();
}
```
...
The proto layer graph constructor should not require the trainer, but it currently needs it for the number of parallel readers for the input layers. Once the data reader moves, we can remove...