TraceManager fails when run is not serial
Describe the bug
When traces are active, only a serial run finishes; any parallel run crashes.
vmc.xml puts 1 up and 1 down electron into a box at rs=30 with the uniform wavefunction.
<simulation>
  <project id="qmc"/>
  <qmcsystem>
    <simulationcell>
      <parameter name="lattice" units="bohr">
        61.39960247678931 0.0 0.0
        0.0 61.39960247678931 0.0
        0.0 0.0 61.39960247678931
      </parameter>
      <parameter name="bconds">p p p</parameter>
    </simulationcell>
    <particleset name="e" random="yes">
      <group name="u" size="1">
        <parameter name="charge">-1</parameter>
      </group>
      <group name="d" size="1">
        <parameter name="charge">-1</parameter>
      </group>
    </particleset>
    <wavefunction name="psi0" target="e"/>
    <hamiltonian name="h0" type="generic" target="e">
      <pairpot name="ElecElec" type="coulomb" source="e" target="e"/>
    </hamiltonian>
  </qmcsystem>
  <traces array="yes" write="yes"/>
  <qmc method="vmc" move="pbyp">
    <parameter name="blocks"> 2 </parameter>
    <parameter name="steps"> 40 </parameter>
    <parameter name="timestep"> 50 </parameter>
  </qmc>
</simulation>
To Reproduce
Steps to reproduce the behavior:
- 2023-03-22 develop head 464286d39f207853a76da801301e708c4538b3a7
- export CC=mpicc; export CXX=mpicxx; cmake -D QMC_COMPLEX=1
- mpirun -np 2 qmcpack vmc.xml
Expected behavior
Run terminates without error. qmc.s000.traces.h5 is filled with correct run information.
System:
- Xeon(R) Gold 6234
- gcc/10.4.0 openmpi/4.0.7 intel-oneapi-mkl/2023.0.0 boost/1.80.0 cmake/3.25.1 hdf5/mpi-1.10.9
BTW, I also get a segmentation fault with 1 MPI rank and 2 threads:
export OMP_NUM_THREADS=2; mpirun -np 1 qmcpack vmc.xml
Does it work without tracemanager? (Didn't see this was active in your earlier communicated report) Has it ever worked? Does vmc_batch work without tracemanager?
Hopefully this is a trivial fix but note that depending on the depth of breakage this could be a wontfix since it is in the legacy code. It might be a better use of time to add freshly designed capability to the modern code.
Without the TraceManager, the input runs fine using legacy or batch drivers in pure MPI, pure OpenMP, and mixed runs.
Since the TraceManager works fine in a serial run, I'm hoping this is a simple fix. I just lack the tools to efficiently debug a parallel run.
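For anyone tracing this the same way, here is a minimal sketch of print-statement debugging that stays readable when several threads are active. It is not QMCPACK code: format_debug, debug_print, and run_two_thread_demo are hypothetical helper names, and the rank is hard-coded under the assumption that an MPI build would fill it in from MPI_Comm_rank. Each message is built as one complete line, printed under a lock, and flushed immediately, so the last message before a crash is not lost in a buffer.

```cpp
#include <cstddef>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of QMCPACK): builds one complete line per
// call so that output from different threads cannot interleave mid-line.
std::string format_debug(int rank, int thread, const std::string& where)
{
  std::ostringstream line;
  line << "[rank " << rank << " thread " << thread << "] " << where;
  return line.str();
}

// Prints the tagged line to stderr under a lock; std::endl flushes the
// stream so the message is visible even if the program crashes right after.
void debug_print(int rank, int thread, const std::string& where)
{
  static std::mutex print_mutex;
  const std::string line = format_debug(rank, thread, where);
  std::lock_guard<std::mutex> guard(print_mutex);
  std::cerr << line << std::endl;
}

// Spawns two threads that each report reaching the same code location.
// Returns the number of threads that were launched and joined.
std::size_t run_two_thread_demo()
{
  const int rank = 0; // assumption: an MPI build would query MPI_Comm_rank here
  std::vector<std::thread> workers;
  for (int t = 0; t < 2; ++t)
    workers.emplace_back([rank, t] { debug_print(rank, t, "entering startRun"); });
  for (auto& w : workers)
    w.join();
  return workers.size();
}
```

Calling run_two_thread_demo() from main prints two tagged lines in whichever order the threads reach the lock. The last tag printed before the crash narrows down which thread was executing at that point.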
Using good old print statements, I followed the execution to the call Traces->startRun(nBlocks, traceClones); in VMC::run, which does:
inline void startRun(int blocks, std::vector<TraceManager*>& clones)
{
  if (master_copy)
  {
    initialize_traces();
    check_clones(clones);
    open_file(clones);
  }