ludwig
Get trial status from outside the program
Is your feature request related to a problem? Please describe.
Yes. I would like to know how many trials are pending, running, or finished. How can I get this status from outside the program?
Describe the use case
As a user, once I kick off the job, I want to understand the progress of the program.
Describe the solution you'd like
Ludwig is more like an SDK, so it seems very hard to get the program status from outside. If there's no elegant way to do it, could we update a file so that an external program can read it to get the latest updates?
Describe alternatives you've considered
N/A
Additional context
N/A
Hi @Jeffwan, great question!
The most straightforward way to get hyperopt trial status would be the experiment_state-<datetime>.json file that is created inside the results logdir. This file contains a list of checkpoints, where each checkpoint is a dictionary that contains information about a trial. You can take a look at the status key for each dictionary, and the checkpoints list overall, to query the status of the hyperopt experiment. Would that be sufficient for unblocking your use case?
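For illustration, here is a minimal Python sketch of reading that file from an external process and counting trials per status. The helper names (`latest_state_file`, `count_trial_statuses`) are hypothetical, and the layout is assumed from the description above: a top-level `checkpoints` list whose entries carry a `status` key. Some Ray versions may store each checkpoint as a JSON-encoded string rather than a dictionary, so the sketch handles both.

```python
import glob
import json
import os
from collections import Counter


def latest_state_file(logdir):
    """Return the most recently modified experiment_state-*.json in logdir, or None."""
    paths = glob.glob(os.path.join(logdir, "experiment_state-*.json"))
    return max(paths, key=os.path.getmtime) if paths else None


def count_trial_statuses(state_path):
    """Count trials per status (e.g. PENDING, RUNNING, TERMINATED, ERROR)."""
    with open(state_path) as f:
        state = json.load(f)

    counts = Counter()
    for checkpoint in state.get("checkpoints", []):
        # Assumption: an entry is either a dict or a JSON string encoding one.
        if isinstance(checkpoint, str):
            checkpoint = json.loads(checkpoint)
        counts[checkpoint.get("status", "UNKNOWN")] += 1
    return counts
```

An external service could call `count_trial_statuses(latest_state_file(logdir))` on a timer. The file is only as fresh as Ray Tune's last experiment checkpoint, so the counts may lag behind the live state.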
Another way to do this programmatically would be to use Ray Tune Callbacks, where we update the trial status each time the callback is invoked.
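As a rough sketch of the callback idea, the class below keeps a per-trial status table and rewrites a small JSON summary whenever a trial starts, completes, or errors. `StatusFileCallback` and the output path are hypothetical names. The hook names follow the `ray.tune.Callback` interface, but how such a callback would be passed through Ludwig's hyperopt is not covered in this thread and is left as an assumption. Status is recorded from the hook that fired rather than read from the trial object, since the trial's own status field may not be updated yet when the hook runs.

```python
import json
import os
from collections import Counter

try:
    from ray.tune import Callback
except ImportError:  # allows reading/testing the sketch without Ray installed
    Callback = object


class StatusFileCallback(Callback):
    """Hypothetical callback: writes trial status counts to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.statuses = {}  # trial_id -> last status observed via hooks

    def _record(self, trials, trial, status):
        # Any trial we have not seen start yet is treated as pending.
        for t in trials:
            self.statuses.setdefault(t.trial_id, "PENDING")
        self.statuses[trial.trial_id] = status

        # Write to a temp file and rename, so readers never see a partial file.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(Counter(self.statuses.values()), f)
        os.replace(tmp_path, self.path)

    def on_trial_start(self, iteration, trials, trial, **info):
        self._record(trials, trial, "RUNNING")

    def on_trial_complete(self, iteration, trials, trial, **info):
        self._record(trials, trial, "TERMINATED")

    def on_trial_error(self, iteration, trials, trial, **info):
        self._record(trials, trial, "ERROR")
```

The summary file then holds something like `{"RUNNING": 2, "TERMINATED": 5, "ERROR": 1}`, which a backend service can serve directly without parsing the full experiment state.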
@ShreyaR
I think writing the status to a file is good, but it makes it hard to check the status in real time. Currently, I copy all files to remote storage once the job is done. If I provide a service to check trial status, my backend service would need to find the file and parse it to get the status, which doesn't seem very promising. But I think it's still helpful for understanding the statistics, for example how many trials failed.