qmcpack Useful walker ids

Proposed changes

Walker ID's in batched version of the drivers should now supply trackable walker ID's

Practically this looks like after a load balance is done: 44: Test command: /usr/local/bin/mpiexec "-n" "4" "--oversubscribe" "/Users/Shared/ornldev/code/qmcpack/build_new/src/QMCDrivers/tests/test_new_drivers_mpi"

from the "MPI WalkerControl population swap walkers" test case

Section ("Load Balance") 44: count_before: { 4, 1, 1, 1, } count_after: { 1, 2, 2, 2, } 44: Walkers Per Rank (Total: 7) 44: rank: 0 walker ids: { 1, } parent ids: { 0, } 44: rank: 1 walker ids: { 2, 6, } parent ids: { 0, 1, } 44: rank: 3 walker ids: { 4, 8, } parent ids: { 0, 1, } 44: rank: 2 walker ids: { 3, 7, } parent ids: { 0, 1, }

Section("Load Balance Multiple Copy Optimization") 44: count_before: { 12, 1, 1, 1, } count_after: { 3, 4, 4, 4, } 44: Walkers Per Rank (Total: 15) 44: rank: 0 walker ids: { 1, 5, 9, } parent ids: { 0, 1, 1, } 44: rank: 1 walker ids: { 2, 6, 10, 14, } parent ids: { 0, 1, 6, 6, } 44: rank: 2 walker ids: { 3, 7, 11, 15, } parent ids: { 0, 1, 7, 7, } 44: rank: 3 walker ids: { 4, 8, 12, 16, } parent ids: { 0, 1, 8, 8, }

What type(s) of changes does this code introduce?

New feature
Testing changes (e.g. new unit/integration/performance tests)
Documentation or build script changes

Does this introduce a breaking change?

No

What systems has this change been tested on?

OSX laptop

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is simply a reminder of what we are going to look for before merging your code.

Yes This PR is up to date with current the current state of 'develop'
Yes Code added or changed in the PR has been clang-formatted
Yes This PR adds tests to cover any new code, or to catch a bug that is being fixed
Yes Documentation has been added (if appropriate)

May 28 '24 21:05 PDoakORNL

Drive-by comment: I think there are only two requirements to satisfy here (i) scheme is performant [e.g. minimal comms] (ii) walkers IDs are unique. If it helps -- and I expect it does -- i think walker IDs can be completely arbitrary numbers. Rationale: Postmortem analysis will be done by automation. Do we have any other actual requirements vs nice to have properties (e.g., sequential, increasing)?

May 28 '24 22:05 prckent

If we use UUID's (sorry I they are only called GUIDs in windows.) We would need no state, but without post processing it would be basically human unreadable.

May 29 '24 15:05 PDoakORNL

Non-drive by comment, now that I have thought about this a bit more: since walkers are created at the mpi rank (vs crowd) level, we can count modulo num_ranks and offset the counters by the rank. The only state needed is the number of walkers created by each rank, and this never has to be communicated. This is very close to what is implemented here, but I don't understand the discussion of the "golden walker". What is meant by that? Seems unnecessary?

Two rank, initial four walker example:

rank 0: walkers 0,2 rank 1: walkers 1,3

rank 1 creates an extra walker:

rank 0: walkers 0,2 rank 1: walkers 1,3,5

rank 0 kills walker 2, and we load balance:

rank 0: walkers 0,5 rank 1: walkers 1,3

rank 0 creates an extra walker:

rank 0: walkers 0,5,4 rank 1: walkers 1,3

May 29 '24 15:05 prckent

Here is my expectation of walker ID. It represents the location(rank, index within the rank) of walker in each step. It is not carried over steps. At initialization or after branching, walker ids are reassigned and the old value is kept as parent walker id. Both current and parent ids should allow us to reconstruct walker traces.

The only state needed is the number of walkers created by each rank, and this never has to be communicated.

I don't get why this is even needed in the current scheme. Since walkers are created and annihilated from step to step, I don't think number of walkers created by each rank really means anything.

If the intention is making the walker id never change after creation. I would like to see how branching and trace reconstruction is managed. I don't mean implementation but some description before adopting this idea.

Regarding golden walkers. Walkers have configurations (electron positions) and contexts (particleset, TWF, Ham). During the initialization before hitting QMC drivers, a full context is built and later used to cloning. That is a called a golden walker. To be accurate, it should be called golden walker context. Only driver cloned contexts are used for calculation and thus the gold walker context is not affected by any drivers being created and destroyed.

  MCPopulation(int num_ranks,
               int this_rank,
               ParticleSet* elecs,
               TrialWaveFunction* trial_wf,
               QMCHamiltonian* hamiltonian_);

Treat elecs, trial_wf, hamiltonian as reference as they are owned by particleset, twf, ham pools. For this reason, the golden walkers are not members of a population.

May 29 '24 17:05 ye-luo

I think we need to discuss the envisioned use cases for this. What myself, Jaron, and others are after is indeed "making the walker ID never change after creation", so that we can e.g. follow the history of presumed "bad" walkers. With a walker's unique ID plus that of its parent (which it would have to store), we could do that. IDs would never be reused so the scripting would be easy. I believe this is what the legacy code attempts/attempted to do as well, not that it been carefully verified in the modern era.

Do you or anyone else have other uses in mind?

May 29 '24 18:05 prckent

To follow the bad walker, you will need to "backtrace". If IDs are fully unique, you will need to search the parent id in the population at every step which is very bad. If following my ID scheme, ID is an index and there is no need of searching.

May 29 '24 18:05 ye-luo

Maybe some crossed language here: once created, walkers have fixed ID independent of timestep. So if a run aborts due to population explosion (most common case), we just follow the low energy walker back to when it was moved to the bad position (assuming we were logging output). This is usually a walker with a single ID. Perhaps your index == what I was describing as a fixed ID? See my example a few comments back.

May 29 '24 20:05 prckent

Maybe some crossed language here: once created, walkers have fixed ID independent of timestep. So if a run aborts due to population explosion (most common case), we just follow the low energy walker back to when it was moved to the bad position (assuming we were logging output). This is usually a walker with a single ID. Perhaps your index == what I was describing as a fixed ID? See my example a few comments back.

In my indexing way, the index is only valid for the current step, the parent index is valid for the previous step. Since it is already index, look up is O(1).

In your fixed ID way, the ID is unique but look up is painful. At step N, walker with id=5 goes bad and its parent id is X, we backtrace to the step N-1, it has 100 walkers, due to unsorted walker ID, need to search all the 100 walkers to find X in order to read info like coordinates/energy/weight. How about now we have 10^6 walkers instead of 100 ... Looking up takes O(N).

May 29 '24 20:05 ye-luo

When using my index(parent index). Step 0. rank 0: walkers 0(0),2(2) rank 1: walkers 1(1),3(3) step 1. rank 0: walkers 0(0),2(2) rank 1: walkers 1(1),3(3),5(1) # both 1 and 5 are branched from walker 1 of step 0. step 2. rank 0: walkers 0(0),2(5) # walker 2 is killed and replaced by branching off walker 5 of step 1. rank 1: walkers 1(3),3(5) # walker 1 of step 1 is killed and load balancing shuffled walkers.

@prckent Please also clarify what is parent walker ID in your scheme and how to handle killing/branching.

May 29 '24 21:05 ye-luo

How about now we have 10^6 walkers instead of 100 ... Looking up takes O(N).

The only envisaged use I have for this facility is offline debugging and maybe occasional online debugging. There is zero cost to everyday simulations beyond moving a few extra bytes and one extra piece of state. See what I put in the requirements above. What are you envisaging using this for?

May 31 '24 18:05 prckent

I'm thinking about a fast way of backtracing a bad walker and accessing infos like coordinates and energy on the evolution path in the analysis phase. This analysis is usually done offline.

Walker id is the only way of tracking walkers. The walker id from creation is not enough to backtrace a walker to step 0 due to branching and thus the parent walker id is also needed. That is why I'm wondering with your labeling scheme how to reconstruct a backtrace.

Particularly, I need this backtrace construction fast and not disk I/O and memory intensive. For a walker at step N with a parent walker id X, i need an efficient way to find X at step N-1 and read out coordinates and energy.

preferably read only partial data from disk and work only with this partial data. Reading all the walkers of all the MPI ranks of the whole history from disk and storing in the memory is bad. Preferably data can be read per step or a step block.
With the partially read data, looking up still needs to be fast. If there are a million record in a step block, a skip forward method should be used to locate X. Linear searching X is bad.

May 31 '24 18:05 ye-luo

Indeed, I intend for both walker and parent ID's to be immutable and carried with a walker upon/following its creation.

Using hash tables in post-processing reduces search times to O(1) following an O(N) construction.

The walker tracing functionality (#5019) naturally organizes the walker data by rank (one file per rank), obviating the need to track it explicitly. As Ye and I have discussed offline, a slight generalization of the current functionality could optionally segment the outputted buffer data by MC block to further reduce the construction cost if data only from particular blocks is desired.

I definitely prefer using a simple strided indexing formula like the one currently in nextWalkerID() over anything more complicated or obscure.

Jun 04 '24 06:06 jtkrogel

Here is the issue of immutable walker id. If walker id is based on the creation MPI rank not the actual rank were the walker locates, looking up the walker requires loading information of all the walkers of a given step. The walker tracing functionality (https://github.com/QMCPACK/qmcpack/pull/5019) is already quite unfriendly to I/O due to its intensive amount of data being written. To construct backtrace, all the trace data from every MPI rank has to be read from disk and stored in memory although this can be improved by blocking as datasets in the trace files. The cost of creating hash map for the walker id and record location for each step is obviously not O(1).

Jun 04 '24 14:06 ye-luo

Examining in more detail how the driver code actually sequences the transfers and copies I realized we can in fact do all the necessary walker spawning so that all walker ID's are rank local (see below for definition of that), parent ids will allow tracking the originating walker whether it be on or off rank. See the new walker control tests for how this works in more detail. Much more of the walker control is now actually unit tested as well. I'd still consider coverage to be pretty incomplete as the tests are easy cases and only single stage, but I do think they nail down the expected behavior more clearly than in the past.

It seems logical to me to have "parentless" walkers have parent id == 0 i.e. the null walker. These are ID's and not array indexes (pointer offsets).

From this it follows that no walkers ID should be 0 unless it is a blank not yet properly initialized. Something which happens in legacy code i.e. locations where walkers are constructed outside of MCPopulation. At least until we refactor WalkerConfiguration, Walker and Particle set there will still be walkers that go through a two stage creation where the invalid for a time.

Jun 04 '24 21:06 PDoakORNL

@PDoakORNL could you clarify "rank id's are local"? What is rank id you were referring? My understanding is that walker id is defined in each step and thus meaningful for that step only. At a given step, each walker on every MPI rank has a unique id. parent id is for the previous step. This allows tracing a walker back to the previous step. Is that what you meant?

There are two ways of indexing walkers at a given step.

contiguously indexing. rank 0 (walker 0, walker 1...) rank 1(walker 0, walker 1...)
more or less the scheme used in the current PR. non contiguously indexing. gid = Np * local_index + RankID. NP is the number of MPI ranks, local_index is the index of a walker on its own MPI rank.

Method 1 eliminates the info about ranks. It is more flexible if rank info is not critical. It can also be more difficult if rank info is needed. Method 2 requires to know the number of walkers on each rank but more friendly to handling data by ranks. I'm open to both schemes defining walker IDs.

I don't quite follow you mention of ""parentless" walkers" and also have difficulty to understand your two stage comments.

Here is my understanding of the code. WalkerConfiguration carries Walkers not ParticleSet. It carries data cross QMC drivers. MCPopulation is a driver concept. It captures a WalkerConfiguration to start a walker population. With a population created, walker id makes sense. Once a QMC driver ends, the population is technically gone and walker id is not meaningful anymore.

so in a driver

create a population based on loaded WalkerConfiguration
assign initial walker id. parallel id is basically undefined.
loop step
   random walking
   branching.
   parent id = walker id. Reassign walker id
end loop step

Jun 04 '24 23:06 ye-luo

I mean that on a rank all walkers id's will be walker_id % (num_ranks) = rank + 1. I'll edit that to make it more clear.

I believe in practice walkers should not be getting moved more than necessary so the ID's should be presevered between steps unless the walker actually changed rank or unless it is a copy from another walker on the rank in which case it gets a new id. In practice I don't think a large % of walker die, are born, or transferred each step.

IDs are never reused and are not based on local index only on number of new walkers born on the rank. There are no gaps if you look at the sequence of particular ranks IDs with integer division by number of ranks.

IDs should be unique for a particular walker and there cannot be based on the local walker index

My understanding of mw_saveWalker ... mw_loadWalker was that it was not supposed to just treat the walkers and walker elements as just slots that could have a completely new identity every step as we serialize in and out of some insane untyped byte buffer.

But that rather a walker than had not been killed would be a continuation of a particular walker trajectory. Its hard to be certain since this whole save load thing moving state between walkers and particle sets is a little... 😭 But I have always intended that a indexes of walker elements will never be mixed. I.e. Walker[1], ParticleSet[1], TWF[1], Hamiltonian[1]

Jun 05 '24 00:06 PDoakORNL

Here is an example of the bubble I was referring to. When starting 3 walkers on rank 0, 1 and 2 walkers on rank 2, 3 for total request 10 walkers. Rank 0 ids: 0, 4, 8 Rank 1 ids: 1, 5, 9 Rank 2 ids: 2, 6 Rank 3 ids: 3, 7 at this point there is no gap. Now rank 3 creates a walker during branching. Rank 3 ids: 3, 7, 11 There is no walker with id 10.

Jun 05 '24 00:06 ye-luo

If the walker id is only intended to generate a unique id, then the scheme in this PR is ok. However, it cannot be used to be more informative indicating MPI rank or index. As long as killed/born/migrating do happen, such scenarios should be taken into account. I don't think how large % of walker affects our criteria for choosing a scheme for id.

Jun 05 '24 00:06 ye-luo

I believe in practice walkers should not be getting moved more than necessary so the ID's should be presevered between steps unless the walker actually changed rank or unless it is a copy from another walker on the rank in which case it gets a new id.

Now @PDoakORNL is saying walker id is mutable. @prckent @jtkrogel @PDoakORNL we need a discussion.

Jun 05 '24 00:06 ye-luo

I should also add that any time I'm think about producing massive amounts of data I consider how much work getting it into a relational database would be. That is after all "what you do" if working with data that won't fit in memory that you want to make structured queries against.

Id use tables like: Walker

WalkerKey (Primary Key) Block Walker Id Parent ID

Step

stepkey (primary key) walker index step block kinetic energy potential energy ...

Conveniently if you limited your database to just one block per instance you could just user walker ID as the key to the first table and step+walkerID as the key second table.

To track a particular walker select all steps for a particular walker ID say (12). Then I'd select all walkers IDs with a parent ID (12). Then query each of those IDs and their steps. You would continue this process until you got no more walkerID's connected to your original ID. Merge all the resulting rows together you could plot a nice graph of the trajectory of some value with respect to the originating walker. If it every reached multiplicity high enough to get copied you would get multiple values per step otherwise you would just recover the trajectory even as it moved between ranks.

Conceptually you can accomplish the same thing with a scripting language just hammering on the output files but I'd rather have WalkerIDs that made processing those files into a DB easy. The issue of primary keys is basically the same as in memory hash.

Jun 05 '24 00:06 PDoakORNL

Walker ID is not mutable. But it is not portable to another rank.

Jun 05 '24 00:06 PDoakORNL

Walker ID is not mutable. I walker that migrates is killed and born again.

Unfortunately killing a physical term not a technical term. Migranting a walker cannot be treated as kill and born.

Jun 05 '24 01:06 ye-luo

From my perspective there are no longer any gaps.

The walker IDs come from a N = num_ranks disjoint sequences from the positive integers with each value x = {0,..., num_ranks-1} determines a sequence of walker ids such that members are those integers such that w_id -> P_INT % num_rankss == x. Each rank uses the set of IDs such the x defining the set has x == rank. on any particular rank All ID's are used sequentially for active walkers. An active walker is one that will on the next step attempt a move. A walker being overwritten must have either been freshly constructed or have been however briefly in an inactive state and receives the next rank ID.

Jun 06 '24 18:06 PDoakORNL

In the above example I was talking about gaps between ranks not within a rank. ids spaced by num_ranks were not considered disjoint.

Jun 06 '24 18:06 ye-luo

In your example the next time rank two creates a walker it will be 10.

Jun 06 '24 18:06 PDoakORNL

In your example the next time rank two creates a walker it will be 10.

At the current step, there is no walker 10. If the simulation ends here, there is no walker 10.

Jun 06 '24 18:06 ye-luo

Perhaps just post the pseudo-code and the ID's generated by 2-3 adjacent ranks?

Jun 07 '24 13:06 jtkrogel

Just looking at this after some time away...

I wasn't expecting this to involve studying all the population control, improving its testing and documentation, but we were going to have to do that sometime. Hopefully it will be easier to get fixed population working correctly in future :-)
Are we happy that with the current status we do actually have useful walker IDs?
Given the changes here I think it would be wise to do a real DMC run and sanity check nothing has been unintentionally changed. Has anyone done this as part of testing?

Jun 24 '24 19:06 prckent

I'm working on another rebase and forced push. This has been sitting for a week and new PR's getting merged. I'm not sure why anymore, but after this rebase I will not be keeping it current unless there is actual interest in merging it. I won't be fixing ID propagation between runs and not just between sections until this PR goes in, it has enough in it, tests what it adds and adds tests and documentation that should have always been there.

Jun 24 '24 19:06 PDoakORNL

The osx failures are also all with nexus and do not appear to have anything to do with this PR.

Jun 24 '24 19:06 PDoakORNL