beast2 icon indicating copy to clipboard operation
beast2 copied to clipboard

Conflict of Optimized Relaxed Clock and Sampled Ancestors

Open Anaphory opened this issue 2 years ago • 12 comments

A sudden IndexError in an analysis had me stumped for some time. I have now figured out why it happens, but not how to fix it.

Symptoms:

For the analysis that permits sampled ancestors, the Relaxed Clock dies with an index error somewhere in the branch-rates-aware narrow exchange (MetaNER) operator. What happens is the following:

  1. The relaxed clock mechanism sets the dimension of the rate parameter to “number of nodes in the tree – 1”, and the number of nodes at that point is correctly “number of tips × 2 – 1”. For Sinotibetan, this is 98 rates and 99 nodes for 50 tips, with the nodes numbered 0 to 98.
  2. The model runs fine for several 100'000s of steps.
  3. The MetaNER tries to propose an exchange, but trying to look up the node D with number 98 it dies with an IndexError.

For a moment I thought I might have started with a sampled ancestor and then a Jump added an internal node which would put the total number of nodes, and therefore branches, and therefore rates needed, up by one without adjusting the dimension of the rates parameter. But that's not the case, sampled ancestors are not degree-2 nodes, but tips with a branch length of 0.

Investigating closer, I was greatly confused, because I know that node D in the MetaNER is a child, and that node 98 is always the root. Except:

https://github.com/CompEvol/beast2/blob/4d95f0095d20a6999d2757299c629bfbbf14b94b/src/beast/evolution/tree/Tree.java#L322-L333

This method messes with that invariant, and makes Sampled Ancestors and Optimized Relaxed Clocks incompatible. Why is it there? Why can Sampled Ancestors not just use the same setRoot() as everyone else? Is there a way to get SA to work with an ORC (I'm willing to modify and test code and supply patches, as you know), or am I stuck with the generic Relaxed Clock with its convergence issues? Should I cross-link this issue to the SA and ORC repositories?

Anaphory avatar Jul 19 '21 09:07 Anaphory

In my local BEAST sources, I have changed setRootOnly to be an alias of setRoot. Now my analysis seems to run with no issues, the trace looks appropriate, too. The difference in run-time is negligible, because the SA operators are not called that often and when they do, often they won't have changed the root, I guess. So I'm stumped why the SA operators got this special handling anyway. Does it have to do with the fact that in a sampled ancestor tree, the first bifurcation can be a descendant of an older sampled ancestor?

Anaphory avatar Jul 19 '21 14:07 Anaphory

I have made a commit in https://github.com/Anaphory/beast2/commit/9f10bc52da2e0fe26acf58199301e5d8539f6922 – I can make a pull request, but as it is, I have committed on top of https://github.com/CompEvol/beast2/pull/953 and some other small changes and somehow that PR is still in limbo.

Anaphory avatar Jul 19 '21 14:07 Anaphory

Running that patched version of BEAST on my data a while longer, I get IndexErrors in

https://github.com/CompEvol/beast2/blob/f6f325e3c7cb06853d34f2869854c1921aef1780/src/beast/math/distributions/MRCAPrior.java#L163

because

https://github.com/CompEvol/beast2/blob/f6f325e3c7cb06853d34f2869854c1921aef1780/src/beast/math/distributions/MRCAPrior.java#L180

ends up with a wrong cached NodeCount somehow.

Anaphory avatar Jul 20 '21 15:07 Anaphory

This in turn seems to happen because the SA-Operators (I found it in SAWilsonBalding, at least) first assign a new node as the root and only then add the remaining nodes as children to that new root:

        if (jP != null) {
            jP.removeChild(j);  // remove <jP, j>
            jP.addChild(iP);   // add <jP, iP>
            jP.makeDirty(Tree.IS_FILTHY);
        } else {
            iP.setParent(null); // completely remove <PiP, iP>
            tree.setRootOnly(iP);
        }
        iP.addChild(j);

I suggest the true solution is to modify the SampledAncestor operators, and I will link this issue from there; but to make it work for all analyses that do not have sampled ancestors and variable number of nodes in the tree by adding nodes during the analysis, it should be enough to change

https://github.com/CompEvol/beast2/blob/4d95f0095d20a6999d2757299c629bfbbf14b94b/src/beast/evolution/tree/Tree.java#L310

to

nodeCount = -1; 

and thus defer the re-calculation fo nodeCount to the next time it is needed. I checked the Tree class, and the only way to sidestep this and run into issues seems to be if that changed line 310 and

https://github.com/CompEvol/beast2/blob/4d95f0095d20a6999d2757299c629bfbbf14b94b/src/beast/evolution/tree/Tree.java#L1000-L1006

and

https://github.com/CompEvol/beast2/blob/4d95f0095d20a6999d2757299c629bfbbf14b94b/src/beast/evolution/tree/Tree.java#L251-L252

get called in that order, so it's already much safer than permitting SA operators to mess with a very general assumption (root is always the last node).

Anaphory avatar Jul 20 '21 15:07 Anaphory

@Anaphory Thanks for digging into this. Both the update of line 310 (i.e. set nodeCount = -1; ) and 9f10bc52da2e0fe26acf58199301e5d8539f6922 are required to fix this, right?

rbouckaert avatar Jul 20 '21 23:07 rbouckaert

They are both required, but they aren't enough yet: I also ran into

Fatal exception: null
java.lang.NegativeArraySizeException
	at beast.evolution.tree.Tree.store(Unknown Source)
	at beast.core.StateNode.startEditing(Unknown Source)
	at beast.evolution.tree.Tree.startEditing(Unknown Source)
	at beast.core.StateNode.getCurrentEditable(Unknown Source)
	at beast.core.Input.get(Unknown Source)
	at beast.evolution.operators.Exchange.proposal(Unknown Source)
	at beast.core.Operator.proposal(Unknown Source)
	at beast.core.MCMC.propagateState(Unknown Source)
	at beast.core.MCMC.doLoop(Unknown Source)
	at beast.core.MCMC.run(Unknown Source)
	at beast.app.BeastMCMC.run(Unknown Source)
	at beast.app.beastapp.BeastMain.<init>(Unknown Source)
	at beast.app.beastapp.BeastMain.main(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at beast.app.beastapp.BeastLauncher.run(Unknown Source)
	at beast.app.beastapp.BeastLauncher.main(Unknown Source)
Fatal exception: null
java.lang.RuntimeException: An error was encounted. Terminating BEAST
	at beast.app.util.ErrorLogHandler.publish(Unknown Source)
	at java.util.logging.Logger.log(Logger.java:738)
	at java.util.logging.Logger.doLog(Logger.java:765)
	at java.util.logging.Logger.log(Logger.java:788)
	at java.util.logging.Logger.severe(Logger.java:1464)
	at beast.app.beastapp.BeastMain.<init>(Unknown Source)
	at beast.app.beastapp.BeastMain.main(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at beast.app.beastapp.BeastLauncher.run(Unknown Source)
	at beast.app.beastapp.BeastLauncher.main(Unknown Source)

which is somewhat understandable: If for some reason the SAWilsonBalding sets the nodeCount to -1 and the tree needs to be stored immediately after the operator is applied, without any random call to getNodeCount() before that. Because that methot is memoized, replacing

https://github.com/CompEvol/beast2/blob/6859490f0baf0468ca1e538ec04a388bbe1d427d/src/beast/evolution/tree/Tree.java#L803-L804 with

     if (m_storedNodes.length != getNodeCount()) {
         final Node[] tmp = new Node[nodeCount]; 

should fix that problem. However, that whole if looks like it comes from the times when sampled ancestors were non-bifurcating nodes, so maybe it would be best to replace it as a whole and just run (committed as https://github.com/Anaphory/beast2/commit/8526c99d6f22d6ad67b17afd31920a1735425d37)

/**
 * StateNode implementation *
 */
@Override
protected void store() {
    storeNodes(0, getNodeCount());
    storedRoot = m_storedNodes[root.getNr()];
}

Anaphory avatar Jul 21 '21 08:07 Anaphory

What if getNodeCount is called at the end of setRoot?

rbouckaert avatar Jul 21 '21 19:07 rbouckaert

The issue with the SAWilsonBalding operator is that it constructs a partial tree, set that as the root, and only then glues in the other part of the tree, so getNodeCount will generate the wrong result. The -1 is for “Don't forget to re-compute this number once the SAWilsonBalding is completely finished”.

Anaphory avatar Jul 21 '21 20:07 Anaphory

So, if the SAWilsonBalding operator would call setRoot with the correct root at the end of the proposal, that would do the trick, right? Perhaps adding some comments to setRootOnly to that effect might be useful. I see how using the -1 flag works, but it does not feel particularly robust.

rbouckaert avatar Jul 22 '21 20:07 rbouckaert

So, if the SAWilsonBalding operator would call setRoot with the correct root at the end of the proposal, that would do the trick, right?

It should. I have put it, and two other things that I found useful, into PR#19 over there.

Perhaps adding some comments to setRootOnly to that effect might be useful.

I would (and that's part of my PR) mark setRootOnly as deprecated and an alias of setRoot.

I see references to the ‘news’ that sampled ancestor nodes are tips with branch length zero, and I suspect that many of these corner cases are due to workarounds from a time when they were not like generic nodes yet? If that's the case, then getting rid of these hacks should be fine.

I see how using the -1 flag works, but it does not feel particularly robust.

Oh, it really is not. I already ran into it being a problem somewhere else: In rare cases, a Tree.store() was called immediately after setRoot (I don't know why there was no likelihood calculation in beetween which would have re-computed the nodeCount, but it happened) and that creates an array with negative length in the old line 800 of that method. And I'm sure there are other issues.

Getting rid of setRootOnly as much as possible and re-calculating nodeCount inside setRoot sounds like the much better solution.

Anaphory avatar Jul 22 '21 21:07 Anaphory

I see how using the -1 flag works, but it does not feel particularly robust.

Oh, it really is not. I already ran into it being a problem somewhere else: In rare cases, a Tree.store() was called immediately after setRoot (I don't know why there was no likelihood calculation in beetween which would have re-computed the nodeCount, but it happened) and that creates an array with negative length in the old line 800 of that method. And I'm sure there are other issues.

Like, as I just noticed, resuming. :disappointed: Which is really important. So I'll undo that change and end with nodeCount = getNodeCount(); while using https://github.com/CompEvol/sampled-ancestors/pull/19 (which fixes it on the SA end) for my analysis.

Anaphory avatar Jul 23 '21 10:07 Anaphory