provide feedback on `max_iter` for `extend_haplotypes`
After #2938 goes in, we need to do something so that the user knows whether the algorithm terminated because of running out of things to do or because it ran into max_iter. And, give guidance in the docs about what max_iter should be. I propose throwing a warning if max_iter is reached; we need to do some experiments to determine a good suggested value.
Return a dataclass consisting of the tree sequence and algorithm convergence details? Relying on side channels for info like this is tricky, I'd suggest something like
@dataclasses.dataclass
class ExtendHaplotypesResult:
tree_sequence: TreeSequence
iterations: int
@hfr1tz3 has done some experiments here:
- doing 1e4 samples on 50Mb was taking like 10min per iteration
- nearly all the edges are added in the first iteration - like 99% on first iteration; 1% on second iteration; just a handful (like 3) in remaining iterations
- it always terminates within 5 iterations
- this is confirmed on many more reps of fewer samples
Proposal is to (a) set default to 10; and say in docstring that if speed is a concern, setting to 1 or 2 should get nearly everything; and it is possible though unlikely that it hasn't converged by 10 - re-running will verify.
So: no need for a dataclass.
Can you imagine this changing in future, e.g, using a slightly different algorithm? What if we returned those numbers of edges per iteration in the dataclass? It worry that people won't get these nuances unless the data is readily available in the result
I can imagine it, but I don't think it's terribly likely? If we did want to do that, we'd probably give the method a new name and deprecate the old one? I'm proposing this because the numbers seem very clear that users can ignore this detail 98% of the time, and the note about speeding it up by setting max_iter=1 will get the remaining 2%. So, I'd rather keep this as a simple "ts tranformation" method, like the others, that returns a modified ts.
Ah ok, I forgot this had been released and documented already. Ignore what I said then
Closing for inactivity and labelling "future", please re-open if you plan to work on this.