LBANN: Spatial Parallelism
I would like to evaluate a model-parallel application using the spatial parallelism support described in the LBANN publication "Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism".
However, I don't see any instructions or code snippets here for running LBANN with spatial parallelism.
Is spatial parallelism currently open source and available in this repo? Are there any instructions or documents for enabling and running spatial parallelism with LBANN?
Documentation remains to be worked on, but nearly all of the code for spatial parallelism is publicly available. The main component of the parallel convolutions lives in a separate library, DiHydrogen, which LBANN uses when it is enabled. DiHydrogen is available at https://github.com/LLNL/DiHydrogen (see the legacy directory).
We don't have documentation specific to spatial parallelism yet. However, once you have successfully built and run LBANN, the additional steps for using spatial parallelism are minor.
Do you have specific models with which you want to try spatial parallelism? If so, the first step would be to run the model on LBANN without spatial parallelism.
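For readers new to the term, here is a minimal sketch of the idea behind spatial parallelism in a convolution. It is plain Python, not LBANN or DiHydrogen code, and the function names (`conv2d_valid`, `spatial_parallel_conv`) are hypothetical. It assumes the image is split along its height, with each "worker" receiving its own rows plus a halo of `kernel_height - 1` extra rows from its neighbor. In a real distributed setting those halo rows would be exchanged between processes; here the workers are simply simulated in a loop.

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with no padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def spatial_parallel_conv(image, kernel, num_parts):
    """Split the output rows across `num_parts` simulated workers.

    Each worker computes on a slab made of the rows it owns plus
    (kh - 1) halo rows taken from the neighboring partition.
    """
    kh = len(kernel)
    out_h = len(image) - kh + 1
    rows_per_part = -(-out_h // num_parts)  # ceiling division
    result = []
    for p in range(num_parts):
        start = p * rows_per_part
        stop = min(start + rows_per_part, out_h)
        if start >= stop:
            continue  # more workers than output rows; nothing to do
        # Local slab: owned rows plus the halo rows below them.
        slab = image[start:stop + kh - 1]
        result.extend(conv2d_valid(slab, kernel))
    return result


# Toy 8x8 input and a 3x3 kernel (values chosen only for illustration).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]

full = conv2d_valid(image, kernel)
split = spatial_parallel_conv(image, kernel, num_parts=2)
print(split == full)
```

The partitioned result matches the single-worker result because every output row only depends on a small window of input rows, which is what makes splitting a sample across workers possible.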