
Provide estimates for overall remaining training time.

Open justinxzhao opened this issue 2 years ago • 0 comments

A precise time to completion is difficult to estimate, because convergence depends on a mix of non-deterministic factors like early stopping criteria. Still, there is likely a reasonably good "max training time" estimate that can be computed as a function of steps per second, evaluation time, number of evaluations left, and number of training steps left.

More coarsely, since training and evaluation happen synchronously, total training time left could be estimated with:

(time to finish one training-checkpoint-eval cycle) * (number of rounds of evaluation)
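As a rough sketch of both estimates (not existing Ludwig code), the snippet below computes the finer-grained bound from steps per second and evaluation time, and the coarse product of cycle time and evaluation rounds. All function and parameter names here are hypothetical, and it assumes evaluation runs every fixed number of training steps and that early stopping never triggers, so the result is an upper bound.

```python
import math


def max_remaining_seconds(steps_left, steps_per_second, steps_per_eval, seconds_per_eval):
    """Upper bound on remaining time, assuming early stopping never fires.

    All parameter names are illustrative; none come from Ludwig itself.
    """
    if steps_per_second <= 0 or steps_per_eval <= 0:
        raise ValueError("steps_per_second and steps_per_eval must be positive")
    # Assumption: one evaluation runs after every `steps_per_eval` training steps.
    evals_left = math.ceil(steps_left / steps_per_eval)
    train_seconds = steps_left / steps_per_second
    return train_seconds + evals_left * seconds_per_eval


def coarse_remaining_seconds(seconds_per_cycle, eval_rounds_left):
    """Coarse estimate: (one training-checkpoint-eval cycle) * (rounds left)."""
    return seconds_per_cycle * eval_rounds_left


# Example: 1000 steps left at 10 steps/s, evaluating every 250 steps, 30 s per evaluation.
fine = max_remaining_seconds(1000, 10.0, 250, 30.0)
coarse = coarse_remaining_seconds(250 / 10.0 + 30.0, 4)
```

When the remaining steps divide evenly into evaluation rounds, the two estimates agree; otherwise the coarse version slightly overestimates because it charges a full cycle for the final partial round.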

One possible implementation would be to add time state information to LudwigModel and to maintain this time state in callbacks.
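A minimal sketch of what such time state might look like is shown below. The class `TrainingTimeState` and its hook names `on_cycle_start` / `on_cycle_end` are hypothetical and are not part of Ludwig's callback API; the assumption is that some callback would invoke them at the start and end of each training-checkpoint-eval cycle.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TrainingTimeState:
    """Hypothetical time state; not an existing Ludwig class."""

    total_eval_rounds: int
    # Injectable clock so the estimate can be exercised without real waiting.
    clock: Callable[[], float] = time.monotonic
    cycle_durations: List[float] = field(default_factory=list)
    _cycle_start: Optional[float] = field(default=None, init=False)

    def on_cycle_start(self) -> None:
        # Assumed to be called when a training-checkpoint-eval cycle begins.
        self._cycle_start = self.clock()

    def on_cycle_end(self) -> None:
        # Assumed to be called once evaluation for the cycle has finished.
        if self._cycle_start is None:
            return
        self.cycle_durations.append(self.clock() - self._cycle_start)
        self._cycle_start = None

    def estimated_seconds_left(self) -> Optional[float]:
        # No estimate is available until at least one full cycle has been timed.
        if not self.cycle_durations:
            return None
        mean_cycle = sum(self.cycle_durations) / len(self.cycle_durations)
        rounds_left = max(self.total_eval_rounds - len(self.cycle_durations), 0)
        return mean_cycle * rounds_left
```

Averaging over completed cycles is one design choice among several; a moving average over recent cycles may track throughput changes better, and the estimate remains an upper bound whenever early stopping can end training sooner.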

justinxzhao, Jun 22 '22 00:06