BioSimSpace icon indicating copy to clipboard operation
BioSimSpace copied to clipboard

Use pandas.DataFrame in BSS.Protocol.FreeEnergy

Open xiki-tempula opened this issue 3 years ago • 16 comments

Is your feature request related to a problem? Please describe. I'm interested in using BSS to implement Gromacs absolute binding free energy calculations. So it would be nice if I could set the bonded-lambda, coul-lambda and vdw-lambda at the same time in BSS.Protocol.FreeEnergy._lambda_vals.

However, currently the lambda values are stored as a list, which means we cannot have more than one lambda value.

Describe the solution you'd like I'm interested in changing the BSS.Protocol.FreeEnergy._lambda_vals to a pandas dataframe. So one could have a different columns for bonded-lambda, coul-lambda and vdw-lambda. If nothing is specified, it will just be a dataframe with a single column of fep

I wonder what is your thought on this and the potential impact on other modules.

xiki-tempula avatar May 10 '22 10:05 xiki-tempula

I haven't used SOMD but I guess if we have this as pandas.DataFrame all these

                "discharge_soft" : Perturb all discharging soft atom charge terms (i.e. value->0.0).
                "vanish_soft" : Perturb all vanishing soft atom LJ terms (i.e. value->0.0).
                "flip" : Perturb all hard atom terms as well as bonds/angles.
                "grow_soft" : Perturb all growing soft atom LJ terms (i.e. 0.0->value).
                "charge_soft" : Perturb all charging soft atom LJ terms (i.e. 0.0->value).

Could be a column in the pd.DataFrame

discharge_soft is 'coul': [1.0, ..., 0.0] vanish_soft is 'vdw': [1.0, ..., 0.0] flip is 'bonded': [0.0, ..., 1.0] grow_soft is 'vdw': [0.0, ..., 1.0] charge_soft is 'coul': [0.0, ..., 1.0]

xiki-tempula avatar May 10 '22 10:05 xiki-tempula

Hi there,

At present we don't provide general support for alternative lambda schedules with FEP, but is something that we'd like to support in future.

As you've noted, we do have a SOMD specific option in Protocol.FreeEnergy, i.e. perturbation_type, that partially supports this, although the way in which it is implemented behind the scenes is specific to SOMD. It would be easy enough to add some additional options to be able to set the additional lambda values, e.g. by handling different types for the lam_vals option, e.g. it could be list, a list of lists/tuples, a dictionary, or a data-frame, as you suggest.

I like your suggestion of mapping the existing named perturbation types to the data frame (or whatever) entries, which would provide a quick way for the user to set things up, i.e. by just choosing a number of lambda values and the type of perturbation.

My main concern is how these things are actually implemented within each engine and if there are any other specifics that we need to be aware of, e.g. for RBFE can we just continue to use the same setup procedure for GROMACS if we choose to support these additional options, or do we need to tweak other settings, or even the perturbed topology? Also, how do things work with, say, AMBER? Do the same options translate?

Thanks for your input. It would be good to open a more general discussion on this since it would be nice to make things as flexible as possible going forward.

Cheers.

lohedges avatar May 10 '22 11:05 lohedges

@lohedges So for this DataFrame, I was thinking of having column names as fep, coul, vdw, bonded, (restraint, mass, temperature, if needed) https://manual.gromacs.org/documentation/current/user-guide/mdp-options.html#mdp-fep-lambdas.

for RBFE can we just continue to use the same setup procedure for GROMACS if we choose to support these additional options, or do we need to tweak other settings, or even the perturbed topology?

For Gromacs then this will map directly to the corresponding XXX-lambdas, if the default list is being given, they will just be mapped to fep-lambdas. So for Gromacs this won't be a problem.

Also, how do things work with, say, AMBER? Do the same options translate?

To be honest, I don't know but I know that you could support splitting the coul and vdw in amber. Bonded is treated differently in amber but I guess this is an issue one need to address eventually.

I guess, we could make it that if a list is given, it will just be mapped to column fep, then the current input and output could be preserved.

xiki-tempula avatar May 10 '22 13:05 xiki-tempula

The ways the lambda schedule is split is fairly engine specific. For SOMD you do that by setting up separate pert files. I think you could have a high level config option 'split' or 'concerted' and then map decisions to specific engine specific protocols. That is not offering the ability to implement the same lambda schedule across different engines (which would be meaningless due to other engine specific settings), but offering the ability to decide on a type of simulation methodology to use with a given engine. Provided the default protocols are sensibly chosen by an expert in a given code, and that power users have the ability to override defaults that should be ok.

jmichel80 avatar May 10 '22 13:05 jmichel80

@lohedges I have implemented the new class based on the BSS.Protocol.FreeEnergy() and I'm interested in testing it. I wonder if do you mind help me to find where is the unit test for the BSS.Protocol? I cannot seems to find it in the test. Thank you.

xiki-tempula avatar May 11 '22 14:05 xiki-tempula

There aren't any at present. Since the exsting Protocol classes are so basic they are tested implicitly through the tests in test/Process, i.e. we check that each protocol can be run for all of the supported engines. (The complexity comes in translating the protocol options into configuration file parameters and whatnot.) While simple, the Protocol classes do have robust setters that validate any input passed in by the user.

However, as the logic in the protocols becomes more complex then it will be good to test them in isolation. Please feel free to add any appropriate tests in a new test/Protocol directory within your fork, following the conventions of the other tests, as described here.

Just to check... If you are working for Exscientia, it would be good if you add experimental features and tests in the new sandpit area that I've set up, as described here. It might be worth chatting with @msuruzhon if you are unsure about this.

Cheers.

lohedges avatar May 11 '22 14:05 lohedges

@lohedges Yes, this is the plan. Thank you for the explanation.

xiki-tempula avatar May 11 '22 14:05 xiki-tempula

No problem. If working in the sandpit then tests should go in, e.g. test/Sandpit/Exscientia/Protocol. Just remember to import from the appropriate sandpit within the test scripts, e.g.: import BioSimSpace.Sandpit.Exscientia as BSS. This will keep the tests for the sanpdit and devel isolated, and would allow them to be run together or separately, as desired.

lohedges avatar May 11 '22 15:05 lohedges

@lohedges Thanks. I wonder if the Sandpit is ready to be working on now? I think this concept was mentioned the day before yesterday and I was told to hold off until it is ready.

xiki-tempula avatar May 11 '22 15:05 xiki-tempula

Ah yes, probably hold off until the PR is merged. It should be easy enough for you to copy your working files into the sandpit once it is ready.

lohedges avatar May 11 '22 15:05 lohedges

@lohedges Thanks. Do you mind ping me when it is ready? Thank you.

xiki-tempula avatar May 11 '22 16:05 xiki-tempula

Yes, will do.

lohedges avatar May 11 '22 16:05 lohedges

The sandpit has now been merged. Once @msuruzhon has set up your fork you will be able to work within the sandpit space. (Either from your own development branch, or separate feature branches as appropriate.)

lohedges avatar May 12 '22 08:05 lohedges

@lohedges Thanks.

xiki-tempula avatar May 12 '22 08:05 xiki-tempula

@fjclark I think it might be better to take the discussion here as it is really difficult to follow long discussions in the google docs.

Yes I do agree that concatenating the BSS.Protocol.FreeEnergy is a good approach if one were to do the transformations in stages.

bonded-lambda = 0.0 0.5 1.0 1.0 1.0 1.0 1.0
coul-lambda   = 0.0 0.0 0.0 0.5 1.0 1.0 1.0
vdw-lambda    = 0.0 0.0 0.0 0.0 0.0 0.5 1.0

However, it might be not so straight forward to cope with the case of

bonded-lambda = 0.0 0.1 0.3 0.5 1.0 1.0 1.0
coul-lambda   = 0.0 0.0 0.0 0.5 1.0 1.0 1.0
vdw-lambda    = 0.0 0.0 0.0 0.0 0.0 0.5 1.0

where we wish to have some overlap to have save some windows. But this might be something that is engine specific. We could have three protocol as [0.0, 0.1, 0.3, 0.5, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0] and [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0].

I was thinking of having BSS.Protocol.FreeEnergy._lambda_vals as pandas DataFrame, then all of the lambda information could be stored and the current BSS.Protocol.FreeEnergy will be a special case where the pandas DataFrame only has one column.

xiki-tempula avatar May 12 '22 16:05 xiki-tempula

@xiki-tempula definitely. Thanks for clarifying - I hadn't seen this issue.

fjclark avatar May 13 '22 09:05 fjclark