LBANN-core Conduit node data interface
This PR fixes the initial implementation of the LBANN-core interface (it had been broken at some point since its initial merge) and extends the interface to allow Conduit nodes to be passed from upstream applications.
Under the hood, a lot has changed:
- This PR relies on #1987 (commits included here)
- Loading a checkpointed model now also sets up a trainer, data coordinator, data reader, and data store (previously we tried to avoid all of these)
- Samples from upstream applications can now be Hydrogen matrices or Conduit nodes. Once passed to LBANN, they are all stored in Conduit nodes in the data store
  - If using Hydrogen matrices, the interface only accepts a `std::map<std::string, {El::Matrix | El::DistMatrix}>`, where the string is the `data_field` defined for the model input layer
- A new `conduit_data_reader` class was added. It is very simple and contains only the essentials to make the data coordinator and data store work correctly
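
To make the matrix path above concrete, here is a minimal, self-contained sketch of the idea: samples arrive in a map keyed by `data_field` and are copied into a per-sample store. Everything named here is an assumption for illustration only. `ToyMatrix` stands in for `El::Matrix`, `ToyStore` stands in for the Conduit-backed data store, and `store_samples` is a hypothetical helper, not part of the LBANN-core API. The sketch also assumes column-major storage with one sample per column.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for El::Matrix<float>: dense, column-major.
// This is NOT the Hydrogen type; it only lets the sketch compile on its own.
struct ToyMatrix {
  std::size_t height;
  std::size_t width;
  std::vector<float> data;  // expected size: height * width
};

// Hypothetical stand-in for the Conduit-backed data store:
// one flat buffer per "<sample index>/<data_field>" path.
using ToyStore = std::map<std::string, std::vector<float>>;

// Samples keyed by the data_field name of the model's input layer.
using SampleMap = std::map<std::string, ToyMatrix>;

// Copy every column of every expected field into the store.
// Assumes one sample per column. Throws if an expected field is missing.
inline void store_samples(const SampleMap& samples,
                          const std::vector<std::string>& expected_fields,
                          ToyStore& store)
{
  for (const auto& field : expected_fields) {
    const auto it = samples.find(field);
    if (it == samples.end()) {
      throw std::invalid_argument("missing data_field: " + field);
    }
    const ToyMatrix& m = it->second;
    assert(m.data.size() == m.height * m.width);
    for (std::size_t col = 0; col < m.width; ++col) {
      // Column `col` occupies [col * height, (col + 1) * height) in the buffer.
      const auto first =
        m.data.begin() + static_cast<std::ptrdiff_t>(col * m.height);
      const auto last = first + static_cast<std::ptrdiff_t>(m.height);
      store[std::to_string(col) + "/" + field].assign(first, last);
    }
  }
}
```

Keying the map by `data_field` lets the receiving side check that every field the input layer expects is actually present before anything is copied into the store. The real interface may validate and lay out data differently.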
One last item that is not (yet) included:
- We should move `core_driver` to `ci_test/core_driver/` to use the driver as a test for the LBANN-core API. I have added a simple `run.sh` which could be used to run the tests, but this will need to be fleshed out better, likely by @benson31