
TraceManager fails when run is not serial

Open Paul-St-Young opened this issue 2 years ago • 4 comments

Describe the bug: When traces are active, only a serial run finishes; any non-serial run crashes.

vmc.xml puts 1 up and 1 down electron into a box at rs=30 with the uniform wavefunction.

<simulation>
  <project id="qmc"/>
  <qmcsystem>

    <simulationcell>
      <parameter name="lattice" units="bohr">
        61.39960247678931 0.0 0.0
        0.0 61.39960247678931 0.0
        0.0 0.0 61.39960247678931
      </parameter>
      <parameter name="bconds">p p p</parameter>
    </simulationcell>

    <particleset name="e" random="yes">
      <group name="u" size="1">
        <parameter name="charge">-1</parameter>
      </group>
      <group name="d" size="1">
        <parameter name="charge">-1</parameter>
      </group>
    </particleset>

    <wavefunction name="psi0" target="e"/>

    <hamiltonian name="h0" type="generic" target="e">
      <pairpot name="ElecElec" type="coulomb" source="e" target="e"/>
    </hamiltonian>

  </qmcsystem>
  
  <traces array="yes" write="yes"/>
  
  <qmc method="vmc" move="pbyp">
    <parameter name="blocks"> 2 </parameter>
    <parameter name="steps"> 40 </parameter>
    <parameter name="timestep"> 50 </parameter>
  </qmc>
</simulation>

To Reproduce: Steps to reproduce the behavior:

  1. Check out the develop head as of 2023-03-22: 464286d39f207853a76da801301e708c4538b3a7
  2. Configure a complex build with the MPI compiler wrappers: export CC=mpicc; export CXX=mpicxx; cmake -D QMC_COMPLEX=1
  3. Run on 2 MPI ranks: mpirun -np 2 qmcpack vmc.xml

Expected behavior: The run terminates without error, and qmc.s000.traces.h5 is filled with correct run information.

System:

  • Xeon(R) Gold 6234
  • gcc/10.4.0 openmpi/4.0.7 intel-oneapi-mkl/2023.0.0 boost/1.80.0 cmake/3.25.1 hdf5/mpi-1.10.9

Paul-St-Young, Mar 23 '23 20:03

BTW, I also get a segmentation fault with 1 MPI rank and 2 threads: export OMP_NUM_THREADS=2; mpirun -np 1 qmcpack vmc.xml

Paul-St-Young, Mar 23 '23 20:03

Does it work without the TraceManager? (I didn't see that it was active in the report you communicated earlier.) Has it ever worked? Does vmc_batch work without the TraceManager?

Hopefully this is a trivial fix, but note that, depending on the depth of the breakage, this could be a wontfix since it is in the legacy code. It might be a better use of time to add a freshly designed capability to the modern code.

prckent, Mar 23 '23 21:03

Without the TraceManager, the input runs fine using legacy or batch drivers in pure MPI, pure OpenMP, and mixed runs.

Paul-St-Young, Mar 23 '23 22:03

Since the TraceManager works fine in a serial run, I'm hoping this is a simple fix. I just lack the tools to efficiently debug a parallel run.

Using good old print statements, I followed the execution to the call Traces->startRun(nBlocks, traceClones); in VMC::run, which does:

  inline void startRun(int blocks, std::vector<TraceManager*>& clones)
  {
    if (master_copy)
    {
      initialize_traces();
      check_clones(clones);
      open_file(clones);
    }

Paul-St-Young, Mar 23 '23 22:03