treetime icon indicating copy to clipboard operation
treetime copied to clipboard

convolve_fft: convert to delta function if distributions too narrow

Open anna-parker opened this issue 3 years ago • 3 comments

Issue

https://github.com/nextstrain/augur/issues/1057 The monkey pox build has branches where the node distributions are much smaller than the branch distributions and the fft fails as the grid is not coarse enough.

Changes

  • In these cases where the node distributions are much smaller than the branch distributions we should treat the node distribution as a delta function at the peak pos $T_n$ of that distribution and instead of using the fft to convolve the two distributions return the result of the branch length interpolator convolved with a delta function (this is for the inverse time case). $$C(t) = \int b(\tau) H(t- \tau) d\tau = C(t) = \int b(\tau) \delta([t - \tau] - T_n) d\tau =b(t - T_n)$$ we can also add the case if the branch length interpolator distribution is too small $$C(t) = \int b(\tau) H(t- \tau) d\tau = C(t) = \int \delta([\tau] - T_n) H(t- \tau) d\tau =H(t - T_n).$$

  • However, after implementing this I realized that the multiply function was continuously hitting the Warning:

                         "if you see this error \n please let us know by filling an issue at: https://github.com/neherlab/treetime/issues"

~~This happens when multiply receives two distributions as input which do not overlap. From closer analysis this seems to be due to the fact that branch length interpolators have x values starting from 0 and adding $T_n$ to the x values would result in the distributions only starting at $T_n$ exactly which is often too imprecise and leads to this error. Thus I add an exponential tail at the start of the convolution output whenever we convolve with a delta function.~~

This was due to the forward time case not being properly handled. For forward time the convolution becomes: $$C(t) = \int b(\tau) H(t+ \tau) d\tau = C(t) = \int b(\tau) \delta([t + \tau] - T_n) d\tau =b(T_n -t )$$

  • ~~I am also not sure if this should be in a different PR: the get_max_posterior_region function was failing because marginal_inverse_cdf was not calculated for all nodes (which is strange as I thought this was fixed) so I have added this in a separate commit.~~ I have added a check if marginal_inverse_cdf throws an error to handle delta functions.

Testing

I tested this on the monkeypox build that failed (output trees provided by Richard over slack), running the command:

augur refine             --tree results/hmpxv1_big/tree_fixed.nwk             --alignment results/hmpxv1_big/masked.fasta             --metadata results/hmpxv1_big/metadata.tsv             --output-tree results/hmpxv1_big/tree.nwk             --timetree             --use-fft             --root MK783032 MK783030             --precision 3             --keep-polytomies             --clock-rate 5.7e-05             --clock-std-dev 2e-5             --use-fft             --output-node-data results/hmpxv1_big/branch_lengths.json             --coalescent opt             --date-inference marginal             --date-confidence             --clock-filter-iqd 0

on augur master branch with the fix/narrow_peak_as_deltabranch installed for TreeTime. I used Cornelius' suggestion in the TreeTime run function and pickledself` for speedy debugging. i.e. I added the statement

 if os.path.exists('self.pickle'):
            with open('self.pickle', 'rb') as handle:
                self = pickle.load(handle)
            # determine how to reconstruct and sample sequences
            seq_kwargs = {"marginal_sequences":sequence_marginal or (self.branch_length_mode=='marginal'),
                        "branch_length_mode": self.branch_length_mode,
                        "sample_from_profile": "root",
                        "prune_short":kwargs.get("prune_short", True),
                        "reconstruct_tip_states":kwargs.get("reconstruct_tip_states", False)}
            time_marginal_method = reduce_time_marginal_argument(time_marginal) ## for backward compatibility
            tt_kwargs = {'clock_rate':fixed_clock_rate,
                        'time_marginal':False if time_marginal_method in ['never', 'only-final', 'confidence-only'] else True}
            tt_kwargs.update(kwargs)
        else:
             ##lines up to self.logger("###TreeTime.run: INITIAL ROUND",0)
              with open('self.pickle', 'wb') as handle:
                self.logger("###TreeTime.run: pickling",0)
                pickle.dump(self, handle, protocol=pickle.HIGHEST_PROTOCOL)

to the run() function in TreeTime.

anna-parker avatar Oct 04 '22 20:10 anna-parker

Does this work? Any extra testing that should be done? Or is this ready to merge :)

corneliusroemer avatar Oct 11 '22 22:10 corneliusroemer

Does this work? Any extra testing that should be done? Or is this ready to merge :)

I tested on the monkeypox builds (old failing build and newest big build)- they appear to work now. But would be great if @rneher could give the go ahead before we merge

anna-parker avatar Oct 12 '22 07:10 anna-parker

I'll give this a test run myself - don't have time to dig into the code but this helps make sure it's at least somewhat tested ;)

corneliusroemer avatar Oct 13 '22 17:10 corneliusroemer