
LBANN-core Conduit node data interface

Open mrwyattii opened this issue 3 years ago • 0 comments

This PR fixes the initial implementation of the LBANN-core interface (it had broken at some point since its initial merge) and extends the interface to allow Conduit nodes to be passed from upstream applications.

Under the hood, a lot has changed:

  • This PR relies on #1987 (commits included here)
  • Loading a checkpointed model now also sets up a trainer, data coordinator, data reader, and data store (previously we tried to avoid all of these)
  • Samples from upstream applications can now be Hydrogen matrices or Conduit nodes. Once passed to LBANN, they are all stored as Conduit nodes in the data store
    • If using Hydrogen matrices, the interface only accepts a std::map<std::string, {El::Matrix | El::DistMatrix}>, where the string is the data_field defined for the model input layer
  • A new conduit_data_reader class was added. It is very simple and contains only the essentials to make the data coordinator and data store work correctly
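
The std::map<std::string, {El::Matrix | El::DistMatrix}> notation above is shorthand, not valid C++. One way to read it is as a map keyed by data_field whose values are one of two matrix types, each split into per-sample entries before storage. Below is a minimal sketch of that reading. It uses hypothetical stand-in types (LocalMatrix, DistMatrixStub, SampleStore) and a hypothetical helper (to_sample_store); none of these are the real Hydrogen, Conduit, or LBANN classes, and the one-sample-per-column layout is an assumption. The actual interface in this PR may well use overloads or templates instead of a variant.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for El::Matrix (NOT the real Hydrogen type).
// Assumption: column-major storage, one sample per column.
struct LocalMatrix {
  std::size_t height;      // values per sample
  std::size_t width;       // number of samples
  std::vector<float> data; // size == height * width
};

// Hypothetical stand-in for El::DistMatrix; only the local part is modeled.
struct DistMatrixStub {
  LocalMatrix local;
};

// "{El::Matrix | El::DistMatrix}" modeled as a variant (an assumption).
using FieldMatrix = std::variant<LocalMatrix, DistMatrixStub>;
// Key is the data_field name defined for the model input layer.
using FieldMap = std::map<std::string, FieldMatrix>;

// Hypothetical stand-in for a node-like store:
// sample index -> data_field -> flat values for that sample.
using SampleStore =
    std::map<std::size_t, std::map<std::string, std::vector<float>>>;

// Return the locally held matrix regardless of which alternative is stored.
inline LocalMatrix const& local_view(FieldMatrix const& m) {
  if (auto const* p = std::get_if<LocalMatrix>(&m)) {
    return *p;
  }
  return std::get<DistMatrixStub>(m).local;
}

// Split every field's matrix into per-sample entries keyed by data_field.
inline SampleStore to_sample_store(FieldMap const& fields) {
  SampleStore store;
  for (auto const& [field, mat] : fields) {
    LocalMatrix const& m = local_view(mat);
    for (std::size_t col = 0; col < m.width; ++col) {
      auto first =
          m.data.begin() + static_cast<std::ptrdiff_t>(col * m.height);
      store[col][field].assign(
          first, first + static_cast<std::ptrdiff_t>(m.height));
    }
  }
  return store;
}
```

Whichever input type is used, every sample ends up in the same per-sample, per-field layout, which mirrors the point above that all samples are stored in Conduit nodes once they reach LBANN.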

One last item that is not (yet) included:

  • We should move core_driver to ci_test/core_driver/ so the driver can serve as a test for the LBANN-core API. I have added a simple run.sh that could be used to run the tests, but it will need to be fleshed out further, likely by @benson31

mrwyattii, Jan 25 '22 21:01