Parallelizing option for foyer

Open daico007 opened this issue 5 years ago • 5 comments

Describe the behavior you would like added to Foyer: Atom typing and parametrizing large systems can be slow, so I propose that we parallelize some of the processes in foyer to speed things up. The two places where I think parallelization can be applied are the atom-typing step (in atomtyper.py) and the parametrization step (in forcefield.py, the parametrize method of the Forcefield class).

Describe the solution you'd like: Add an option for users to parallelize the processes mentioned above.

daico007 avatar Mar 23 '20 16:03 daico007

Within the foyer API, this gets hard because methods like parametrize_system all operate on the same object, so the work might be difficult to parallelize with shared-memory approaches.

For something like run_atomtyping, you could imagine splitting your entire chemical system (pmd.Structure) into individual molecules (each a pmd.Structure) and distributing those structures across threads, processes, or workers. With the residue map, you could try to send identical molecules to the same worker. Alternatively, something like dask could handle distributing the workloads.

Regardless, it could end up looking hairy.
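To make the idea concrete, here is a minimal sketch of the split-and-distribute approach using only the standard library. It does not touch foyer or parmed: molecules are plain tuples of element symbols, and `type_molecule` and `type_system` are hypothetical names invented for illustration, not part of the foyer API. Grouping identical molecules plays the role of the residue map, so each unique molecule is typed only once.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def type_molecule(molecule):
    """Hypothetical stand-in for atom-typing one molecule.

    A molecule here is just a tuple of element symbols and the "atom type"
    is a made-up label; a real version would apply the force field rules.
    """
    return tuple(f"{element}_type" for element in molecule)


def type_system(molecules, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Type every molecule, doing the work once per unique molecule."""
    # Group identical molecules (the role a residue map would play).
    groups = defaultdict(list)
    for index, molecule in enumerate(molecules):
        groups[molecule].append(index)

    # Distribute the unique molecules across the worker pool.
    unique = list(groups)
    with executor_cls(max_workers=max_workers) as pool:
        typed = list(pool.map(type_molecule, unique))

    # Copy each result back to every position that holds that molecule.
    result = [None] * len(molecules)
    for molecule, types in zip(unique, typed):
        for index in groups[molecule]:
            result[index] = types
    return result


water = ("O", "H", "H")
methane = ("C", "H", "H", "H", "H")
system = [water, methane, water, water]
atom_types = type_system(system)
```

Threads are used here only to keep the sketch simple. For CPU-bound typing you would probably pass `concurrent.futures.ProcessPoolExecutor` as `executor_cls`, which requires the worker function and its arguments to be picklable. That is where real pmd.Structure objects could make things hairy.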

ahy3nz avatar Jun 17 '20 22:06 ahy3nz

Apologies if this information is readily available, but have we done any profiling? What are the limiting steps?

rsdefever avatar Jun 18 '20 00:06 rsdefever

In general, it's parmed that causes big systems to type slowly. Finding the atom types is pretty quick, but actually populating the parmed system with the parameters can be slow (sometimes even memory-limited). There's a figure in the 2019 paper (Fig. 2, I think) that demonstrates linear scaling over a decent range of sizes for a chemically simple system. I have some old notebooks that tried to make this systematic for different types of systems (highly bonded, not at all bonded, large XMLs, small XMLs, etc.), but the results were unsurprising.
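For anyone who wants to check where the time goes on their own system, a rough profiling sketch with the standard library's cProfile could look like the following. The three functions are hypothetical stand-ins for the two logical steps (finding atom types, then populating parameters); in practice you would replace the call to `apply_forcefield` with your actual foyer call.

```python
import cProfile
import io
import pstats


def find_atom_types(n_atoms):
    """Hypothetical stand-in for the atom-typing step."""
    return [i % 5 for i in range(n_atoms)]


def populate_parameters(atom_types):
    """Hypothetical stand-in for filling the structure with parameters."""
    return [(a, b) for a in atom_types[:200] for b in atom_types[:200]]


def apply_forcefield(n_atoms):
    """Run both steps back to back, as a single 'apply' call would."""
    atom_types = find_atom_types(n_atoms)
    return populate_parameters(atom_types)


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
apply_forcefield(1000)
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see which of the two steps dominates, which is the question being asked above.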

mattwthompson avatar Jun 18 '20 00:06 mattwthompson

@mattwthompson thanks for the insight. Do you think that's something we can improve on when we replace parmed with our own backend? Also, good to know that we should keep an eye on performance there as we figure that out.

rsdefever avatar Jun 18 '20 01:06 rsdefever

Yes, and splitting out the two logical steps of Forcefield.apply should probably be the first step to being able to refactor for performance.

mattwthompson avatar Jun 18 '20 01:06 mattwthompson