Problems with the <include> tag (document and implement safe and consistent behavior)
Describe the bug the
<include href="filename.xml">
results in input semantics different from what "including" the href'd file into the input would be and makes the idea of valid and easily described xml input worse.
To Reproduce
Here's how it actually works:
I must be a child of the inputfile's root node, It can be used to reference an XML file, this is then parsed as described below at the point in the QMCMain state graph at that point in the node by node child traversal of the main input root XML node. It can include anything as long as libxml2 can parse the document.
It then runs QMCMain::processPWH on the root node of the included document.
<simulation>
<qmcsystem>
<arbitrary>
This root node can be anything, the parsing of include file doesn't care or even examine this node in anyway. Any particle set, hamiltonian, wavefunction gets pushed to its pool, simulationcell overwrites particlepool's simulationcell and estimators node overwrites the global EstimatorManagerInput. every other child XML node in the included file's root just gets pushed into the QMCMain::qmc_action_ vector. Recall the qmc_action_ vector gets iterated over after the initial parse and if it contains a node that isn't qmc, optimize, loop, cmc, debug it just ignores it silently.
For the pools the defined element gets put in the map, but if the name i.e. key matches one already present it gets silently dropped, it does not replace the ParticleSet or whatever (unintuitive behavior of std::map). For the simulationcell it will get overwritten as will the global estimators will as well.
So it really matters where the include tag is in the main input file.
One thing the include file can't be is I would recognize as valid input i.e. at least if you were trying to get the pools from the included file. You would get the qmc sections.
<simulation>
<project ...>
...
</project>
<qmcsystem>
...
</qmcsystem>
<qmc ...>
</qm>
</simulation>
but the pool elements in qmcsystem would end up in qmc_action_ and be ignored.
Expected behavior Consistent behavior that encourages best practice input format. What system you end up simulating and what sections are run needs to be clear and not the result of behavior wrt tag sequence that drops some included sections, ignores others, and overwrites others still.
Why it matters, the labs are full of use of this tag. Our tutorials encourage users to use a potentially very confusing and broken way to specify input.
System: Any
Additional context This must have seemed like a clever hack delivering convenient functionality for a few lines of code. This is an example of a feature that we should only consider keeping by breaking existing user input.
Fortunately -- at least to my knowledge -- I have never heard of anyone being tripped up by the quirks of the include tag in the entire time I have used QMCPACK. The include tags are extensively used by the converters for pieces of the wavefunction, so they are in practice heavily used. The simulation will be wildly wrong if the user hits the deficiencies you have pointed out. => While not excusable, this is not an urgent problem. Error trapping known bad "routes" would be an easily merged upgrade and protect users.
As an aside, of the "historical problems" that still exist, I consider the lack of entropy in the random number initialization to be a more serious issue. (e.g. #1176 ) All of these "historical problems" they fall in the somewhat important but not urgent category. e.g. Some of the converters still need updating for the batched code (e.g. #5231 ), as well as other quality improvements related to relying more on defaults. Improving our defaults makes updating the converters easier since less needs to be specified.