
Discussion: `NonConvergingErrorHandler` and ALGO ladder

Open Andrew-S-Rosen opened this issue 3 months ago • 4 comments

Currently, when an ALGO = All run fails, Custodian switches to ALGO = Normal and starts playing around with the mixing parameters. Given the robustness of ALGO = All, I personally would be very surprised if this swap results in converged results that ALGO = All does not find. Empirically, I have not found a case yet.

@esoteric-ephemera: I am curious. Do you have much experience or input on this?

https://github.com/materialsproject/custodian/blob/32858c31a8300233ae91a0ad1429a00aa8a333fd/src/custodian/vasp/handlers.py#L1689-L1705

Andrew-S-Rosen, Sep 19 '25 23:09

Yeah, that's a pretty standard approach. Sometimes Kerker mixing throws off the density mixer and you get oscillatory convergence behavior. This happens more often in systems with vacuum, especially atoms, molecules, and clusters, and maybe surfaces to a degree.

esoteric-ephemera, Sep 19 '25 23:09

@esoteric-ephemera: I suppose the bigger question I have is the following: is it worthwhile to change ALGO = All to Normal with modified mixing parameters, or, if ALGO = All fails, should the job simply stop?
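For readers following along, the two options in this question can be sketched as a toy decision function. This is only an illustration of the choice being discussed, not the custodian source: the function name, the `policy` argument, and the mixing values in `PLACEHOLDER_MIXING` are all hypothetical, and the handler's actual settings live in the linked `handlers.py`.

```python
from typing import Optional

# Hypothetical placeholder values; the real handler's mixing settings
# are in the linked handlers.py and are not reproduced here.
PLACEHOLDER_MIXING = {"AMIX": 0.1, "BMIX": 0.01}


def update_after_all_failure(incar: dict, policy: str = "fallback") -> Optional[dict]:
    """Return INCAR changes to try after an ALGO = All run fails to converge.

    Returns None when the policy is to stop the job instead of retrying.
    """
    algo = str(incar.get("ALGO", "Normal")).strip().lower()
    if algo != "all":
        raise ValueError("this sketch only covers runs that failed with ALGO = All")

    if policy == "stop":
        # Option 2 in the question: give up rather than spend more compute.
        return None
    if policy == "fallback":
        # Option 1 in the question: switch to Normal and adjust the mixing.
        return {"ALGO": "Normal", **PLACEHOLDER_MIXING}
    raise ValueError(f"unknown policy: {policy!r}")
```

Calling `update_after_all_failure({"ALGO": "All"}, policy="stop")` returns `None`, while the `"fallback"` policy returns a dict whose `ALGO` entry is `"Normal"`.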

Andrew-S-Rosen, Sep 20 '25 00:09

Keeping ALGO = All but with linear mixing is probably a saner choice; I would be in favor of that.

To your point though: once conjugate gradient fails, you're kinda in the realm of manually adjusting the calculation to get it to run. Whether it's a good solution in high throughput that doesn't just waste compute, we'd have to collect some statistics on custodian jobs that ran and hope that they encountered this error
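A minimal sketch of what "keep ALGO = All, switch to linear mixing" might look like as an INCAR update is below. It assumes that setting a very small `BMIX` (and `BMIX_MAG`) is the usual way to approximate linear mixing in VASP; the specific numbers and the function name are illustrative assumptions, not tested recommendations and not anything custodian currently does.

```python
# Illustrative values only (an assumption, not a tested recommendation):
# a very small BMIX / BMIX_MAG is meant to approximate linear mixing.
ILLUSTRATIVE_LINEAR_MIXING = {
    "AMIX": 0.2,
    "BMIX": 0.0001,
    "AMIX_MAG": 0.8,
    "BMIX_MAG": 0.0001,
}


def keep_all_with_linear_mixing(incar: dict) -> dict:
    """Return a copy of the INCAR settings that stays on ALGO = All
    and overrides only the mixing parameters."""
    updated = dict(incar)  # copy so the caller's dict is not mutated
    updated["ALGO"] = "All"
    updated.update(ILLUSTRATIVE_LINEAR_MIXING)
    return updated
```

The input dict is left unchanged, so the original settings remain available for comparison if the retry also fails.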

esoteric-ephemera, Sep 23 '25 15:09

> To your point though: once conjugate gradient fails, you're kinda in the realm of manually adjusting the calculation to get it to run. Whether it's a good solution in high throughput that doesn't just waste compute, we'd have to collect some statistics on custodian jobs that ran and hope that they encountered this error

Thanks, yeah, that's a good point and largely what I had been thinking. If ALGO = All fails, I think all bets are off with regard to a one-size-fits-all solution working out. But of course, I don't have data on this.

If anyone reading this one day is feeling adventurous, I'd love to see some statistics so we can plan accordingly.
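As a starting point for that kind of bookkeeping, here is a rough sketch of how rescue rates could be tallied. It assumes you have already extracted, for each job that hit this error, a label for the correction that was applied and whether the job converged afterwards; that record format and the function name are hypothetical, and how to pull such records out of real custodian logs is not covered here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rescue_rate(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of jobs that converged after each correction.

    `records` is an iterable of (correction_label, converged_afterwards)
    pairs; this format is a hypothetical one chosen for the sketch.
    """
    attempts: Counter = Counter()
    rescues: Counter = Counter()
    for label, converged in records:
        attempts[label] += 1
        if converged:
            rescues[label] += 1
    # Every label in `attempts` has at least one entry, so no division by zero.
    return {label: rescues[label] / attempts[label] for label in attempts}
```

A rate near zero for the "All to Normal" correction across many jobs would support stopping early, but the counts matter as much as the rate, so it is worth reporting `attempts` alongside it.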

Andrew-S-Rosen, Sep 23 '25 15:09