QCEngine
QCEngine copied to clipboard
Complex, ordered multi-command input for Molpro
Is your feature request related to a problem? Please describe. Molpro input files are very complex, and depend on an imperative programming structure. For example, CCSD can only be performed after a HF command. This leads to very complicated input files, such as the following:
memory,300,m
symmetry,nosym
geometry=geo.xyz
basis=cc-pVDZ
gthresh,ENERGY=1e-10
{df-rhf,;save,2100.2}
{ibba,antibonds=1,THRLOC_IB=1e-12;freezecore;orbital,2100.2}
{locali,boys;print,orbital;fock,2100.2;order,type=fock}
{df-lccsd(t),chgfrac=1.0,canblk=0;core,6}
I want a way to specify this input file inside an AtomicInput
.
Describe the solution you'd like A format for specifying commands, in order. Each command must support args, kwargs, and directives. Directives themselves have args and kwargs. args and kwargs are not ordered. Directives are not ordered. (@sjrl please verify this last point).
One possible format would be:
keywords["commands"]: List[Commands]
class Directive:
name: str
args: Optional[List[Union[str,int,bool,float]]]
kwargs: Optional[Dict[str, Union[str,int,bool,float]]
class Command(Directive):
directives: Set[Directive]
Under this structure, the above input file would become:
method = "df-lccsd(t)"
basis = "cc-pVDZ" # note that this replaces the basis command
molecule = Molecule.from_file("geo.xyz")
keywords = {
"memory": 300,
"symmetry": "nosym",
"gthresh": "energy=1e-10", # gthresh is weird, and this is a lame solution
"commands": [
{"name": "df-rhf", "directives": {{"name": "save", "args": ["2100.2"]}}},
{
"name": "ibba",
"kwargs": {"antibonds": True, "thrloc_ib": 1e-12},
"directives": {
{"name": "freezecore"},
{"name": "orbital", "args": ["2100.2"]},
},
},
{
"name": "locali",
"args": ["boys"],
"directives": {
{"name": "print", "args": ["orbital"]},
{"name": "fock", "args": ["2100.2"]},
{"name": "order", "kwargs": {"type": "fock"}},
},
},
{
"name": "df-lccsd(t)",
"kwargs": {"chgfrac": 1.0, "canblk": 0},
"directives": {{"name": "core", "args": [6]},},
},
],
}
Describe alternatives you've considered
- Template files (current solution).
- dunder representation of nested dictionaries. Will lead to extremely long strings, and does not have the necessary ordered property of commands.
@loriab @sjrl
Directives are not ordered. (@sjrl please verify this last point).
That is correct, directives of a given command are not ordered. As a side tangent directives can be placed on new lines instead of the semicolon separation to partially alleviate the limit on the width of Molpro's input.
I like this idea a lot! I only have a few questions and comments.
First I think we can continue to handle symmetry
and memory
through the existing AtomicInput
(handled through molecule
and build_input()
respectively), which would only leave us with gthresh
and commands
. I like the commands
structure because we can easily check for the presence of that keyword to replace the method writing in build_input()
.
gthresh
is definitely a little trickier, but it is an example of Molpro directive that is specified without the need of a command. So we could potentially have a directives
keyword that lives outside of commands
and we could throw gthresh
into that. How does that sound?
I think directives
is a good solution to gthresh
. So keywords["directives"]: Optional[Set[Directive]]
?
So keywords["directives"]: Optional[Set[Directive]]?
Yup! So the only thing that this structure wouldn't support is the ability to set directives in between commands. In some complicated inputs (I believe @loriab has some examples) Molpro sets directives in between commands.
Here is a quick example for a PNO
calculation in Molpro
{df-hf}
{local,distp=8,maxit_lmp2=1,thrpno_en_lmp2=0.998,thrpno_occ_lmp2=0,thrdist=0,fitmos=0,fitpao=0,thrpno_dist=0.998,keep_diag=0}
{cfit,locfit=0}
{pno-lmp2,distp=0}
I guess in my above example, we could just use the commands
structure and do something like
"commands": [
{"name": "df-hf",},
{
"name": "local",
"kwargs": {"distp": 8, "maxit_lmp2": 1, "thrpno_en_lmp2": 0.998, ...}
},
{
"name": "cfit",
"kwargs": {"locfit": 0}
},
{
"name": "pno-lmp2",
"kwargs": {"distp": 0}
}
]
if you wanted to specify directives in between commands.
We could explicitly say keywords["commands"]: List[Commands, Directives]
just to make things abundantly clear.
Another question is whether method
should do anything other than just provide a convenient summary keyword. For example, we could require that the last command's name is the same as method
.
I would vote for method == commands[-1]["name"]
. This would make the return_result
make at least some sense.
@sjrl Any weird molpro edge cases that will break this? I'm okay with not supporting proc
, force
, weirdly-placed GDIRECT
, to name a few...
Hmm I do wonder, how is this meant to interact with the driver call? If the driver is gradient
does that mean force
should still be added after all commands
are written?
And for requiring method == commands[-1][name]
this would mean we would have methods like df-mp2
or ump2
, which have modifiers in front of the actual method. I believe we were trying to avoid this by having to specify keywords instead. But in the situation that commands
is provided is it alright to relax on this rule?
To clarify one more point. Does the presence of commands
mean that the input builder ignores all other keywords related to the method such as reference = unrestricted
?
Otherwise everything looks good to me! I can't think of any other edge cases and I agree proc
and GDIRECT
can be left alone.
Hmm I do wonder, how is this meant to interact with the driver call? If the driver is
gradient
does that meanforce
should still be added after allcommands
are written?
Yes. Would this also work for hessians?
And for requiring
method == commands[-1][name]
this would mean we would have methods likedf-mp2
orump2
, which have modifiers in front of the actual method. I believe we were trying to avoid this by having to specify keywords instead. But in the situation thatcommands
is provided is it alright to relax on this rule?
After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method
just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.
To clarify one more point. Does the presence of
commands
mean that the input builder ignores all other keywords related to the method such asreference = unrestricted
?
Non-command keywords would get translated to key=value
at the top of the file. AFAIK reference
is a special keyword present only in MolproHarness
's build_input
and would be removed in favor of specifying spin ansatz in the command name.
Otherwise everything looks good to me! I can't think of any other edge cases and I agree
proc
andGDIRECT
can be left alone.
👍
Would this also work for hessians?
I believe so, although I don't have much experience with hessian calculations in Molpro.
After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.
Okay gerat! That all sounds good to me.
Non-command` keywords would get translated to key=value at the top of the file. AFAIK reference is a special keyword present only in MolproHarness's build_input and would be removed in favor of specifying spin ansatz in the command name.
That is correct reference
is something I added back when ump2
was not preferred. I'm happy to have that removed!
Okay, it looks like we've reached consensus on this.
Yeah, sorry, I went to work on these new ideas last week, then got distracted by other projects. I don't have a final proposal, so here are some points.
-
definitely agree that need another construct for ordered commands
-
After looking at various programs (and especially GAMESS), @dgasmith and I have moved toward letting method just be whatever the program wants it to be called. So if Q-Chem calls density-fitted MP2 "RI-MP2", and Molpro calls it "df-rmp2", and entos calls it simply "mp2", then that's okay. It's too much work -- and potentially impossible -- to make the methods program-independent. The only place I would be careful is that I would like to normalize known aliases within a program. For example, "HF" and "RHF" do the same thing in Molpro.
I'll interject that I see this as two issues. I totally agree that it's not qcng/qcsk's role to make the methods program-independent -- that is, mp2 in prog A is same mtd/answer as mp2 in prog B. I do see it as a proper role to defend the data layout of QCSchema that only method info appears in
model.method
and that routing info (e.g., tce) or algorithm info (e.g., ri) be slotted in the grab bag that is keywords (or in an extras prefix/suffix field of model maybe if that's convenient for reconstituting). -
here's another molpro input to consider
GTHRESH,ZERO=1.e-14,ONEINT=1.e-14,TWOINT=1.e-14,ENERGY=1.e-8,ORBITAL=1.e-8,GRID=1.e-8
basis={
set,orbital; default,aug-cc-pVTZ
set,jkfit; default,aug-cc-pVTZ/jkfit
set,dflhf; default,aug-cc-pVTZ/jkfit
set,mp2fit; default,aug-cc-pVTZ/mp2fit
}
gdirect
ga=1101.2; gb=1102.2
ca=2101.2; cb=2102.2
dummy,14,15,16,17,18,19,20,21,22,23,24,25,26
{df-hf,basis=jkfit,locorb=0; start,atdens; save,$ga}
{df-ks,lhf; dftfac,1.0; start,$ga; save,$ca}
eehfa=energy; sapt;monomerA
dummy,1,2,3,4,5,6,7,8,9,10,11,12,13
{df-hf,basis=jkfit,locorb=0; start,atdens; save,$gb}
{df-ks,lhf; dftfac,1.0; start,$gb; save,$cb}
eehfb=energy; sapt;monomerB
{sapt,SAPT_LEVEL=3;intermol,ca=$ca,cb=$cb,icpks=0,fitlevel=3,nlexfac=0.0,cfac=0.0
dfit,basis_coul=jkfit,basis_exch=jkfit,cfit_scf=3}
eedisp=E2disp
- Below is my rough expression of the above in "dunder" format. An obvious weakness is the handling of options associated with repeated commands.
# commands: [
# gthresh,
# gdirect,
# {ga: 1101.2},
# {gb: 1102.2},
# {ca: 2101.2},
# {cb: 2102.2},
# dummy,
# df-hf,
# df-ks,
# {eehfa: energy}
# sapt
GTHRESH
,ZERO=1.e-14 # gthresh__zero: 1.e-14
,ONEINT=1.e-14 # gthresh__oneint: 1.e-14
,TWOINT=1.e-14 # gthresh__twoint: 1.e-14
,ENERGY=1.e-8 # gthresh__energy: 1.e-8
,ORBITAL=1.e-8 # grhresh__orbital: 1.e-8
,GRID=1.e-8 # gthresh__grid: 1.e-8
basis={
set,orbital; default,aug-cc-pVTZ
set,jkfit; default,aug-cc-pVTZ/jkfit
set,dflhf; default,aug-cc-pVTZ/jkfit
set,mp2fit; default,aug-cc-pVTZ/mp2fit
}
gdirect
ga=1101.2; gb=1102.2
ca=2101.2; cb=2102.2
dummy # dummy: [14, 15, 16, ... 26]
,14
,15
,16,17,18,19,20,21,22,23,24,25,26
{df-hf
,basis=jkfit # see below
,locorb=0
; start
,atdens
; save
,$ga}
{df-ks # see below
,lhf
; dftfac
,1.0
; start
,$ga
; save
,$ca}
eehfa=energy; sapt;monomerA
dummy,1,2,3,4,5,6,7,8,9,10,11,12,13
{df-hf
,basis=jkfit # df-hf__basis: jkfit
,locorb=0 # df-hf__locorb: 0
;start
,atdens # df-hf___start: atdens # or [atdens]
;save
,$gb} # df-hf___save: $gb
{df-ks
,lhf # df-ks__
;dftfac
,1.0 # df-ks___dftfac: 1.0 # or [1.0]
;start
,$gb # df-ks___start: $gb
;save
,$cb} # df-ks___save: $cb
eehfb=energy; sapt;monomerB
{sapt
,SAPT_LEVEL=3 # sapt__sapt_level: 3
;intermol
,ca=$ca # sapt___intermol__ca: $ca
,cb=$cb # sapt___intermol__cb: $cb
,icpks=0 # sapt___intermol__icpks: 0
,fitlevel=3 # sapt___intermol__fitlevel: 3
,nlexfac=0.0 # sapt___intermol__nlexfac: 0.0
,cfac=0.0 # sapt___intermol__cfac: 0.0
;dfit # sapt___dfit__basis_coul: jkfit
,basis_coul=jkfit # sapt___dfit__basis_exch: jkfit
,basis_exch=jkfit # sapt___dfit__cfit_scf: 3
,cfit_scf=3}
eedisp=E2disp
- Sorry to be pushing back a bit against the perfectly good repr of complex inputs proposed in this issue. I worry that it violates the below from the infamous #70. By expressing the command and full options set all together, it makes it hard to specify or change a single option using a syntax that works across a variety of programs.
- Keywords should be independent and granular such that they're approximately 1:1 with other programs, not 5:1. That is, no single method option should cover convergence, algorithm, and target roots all wrapped together. Also, separate, yet mutually exclusive, options sets are a last resort (e.g., three independent boolean options for rhf/uhf/rohf).
from above
"commands": [
{"name": "df-hf",},
{
"name": "local",
"kwargs": {"distp": 8, "maxit_lmp2": 1, "thrpno_en_lmp2": 0.998, ...}
},
{
"name": "cfit",
"kwargs": {"locfit": 0}
},
{
"name": "pno-lmp2",
"kwargs": {"distp": 0}
}
]
I presume the user in your plan enters model.method = 'pno-lmp2'
, and the df-hf, cfit, and pno-lmp2 commands+directives get filled in by the write_input
in the harness. If the user wants to tweak maxit_lmp2
(or some other piece they're methodologically allowed to tweak), do they have a whole
{
"name": "local",
"kwargs": {"maxit_lmp2": 2}
},
section? Is the harness prepared to handle that that's not the full directive but a delta?
I think the plan for the above use of the commands
keyword is to completely replace the model.method
when writing the input file. So a kind of replacement for current template implementation for the Molpro Harness. And for the very good point you bring up with issue https://github.com/MolSSI/QCEngine/issues/70 I don't see how we can satisfy this goal and also have templates. But perhaps I am missing something?
The delta example you provide I believe can be made a separate task where the commands
keyword is not provided, and then we rely on write_input
on knowing how to handle the method pno-lmp2
and what keywords it supports.