openml-python
No runtime information available through list_evaluations()
Hi there, I am trying to analyze the CPU runtimes of my runs, but it seems like they are not available on OpenML.
# get HGB evals
import openml

evals_hgb = openml.evaluations.list_evaluations('runtime',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
evals_hgb
This is not unique to my runs. I also tried to access runtimes directly through run objects, but they are not saved there either. I doubt that there is something that can be done after the fact, but is there a way to ensure that runtimes are captured for future runs?
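For anyone reproducing this, a minimal sketch for checking what evaluation data a run object exposes, assuming the evaluations and fold_evaluations attributes on the run object; the run ID is taken from the run linked later in this thread:

# Minimal sketch: inspect the evaluation data attached to a run object.
# Run ID 10398294 is taken from later in this thread; any run ID works.
import openml

run = openml.runs.get_run(10398294)
print(run.evaluations.keys())       # per-run (aggregated) measures
print(run.fold_evaluations.keys())  # per-fold measures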
@amueller
cc @mfeurer ?
I just checked the most recent run (https://www.openml.org/r/10398294) and it has runtime information in the XML: https://www.openml.org/api/v1/run/10398294
I assume that 'runtime' is the wrong key here. However, it appears that there is yet another issue regarding the information served. Executing the following script:
import openml

evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_testing',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_training',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_testing',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_training',
                                                output_format='dataframe', size=100)
print(evals_hgb.shape)
results in
(0, 0)
(100, 12)
(100, 12)
(100, 12)
(0, 0)
(0, 0)
(0, 0)
even though the XML mentioned above contains all the available keys.
Can you try to get the information via the runs for now?
@mfeurer Thanks for your response. Sorry, I am not sure why I used 'runtime' as a key in my example; in my script I actually tried 'run_cpu_time', which I got from openml.evaluations.list_evaluation_measures().
That being said, I tried your code with 'usercpu_time_millis', which seems to work for some previous OpenML runs, but I am still not getting runtimes for the runs that I created.
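As an aside, a quick way to avoid guessing keys is to ask the server which measure names it knows and filter for the time-related ones; a sketch, with output depending on the server:

# Sketch: discover time-related evaluation measure names instead of guessing.
import openml

measures = openml.evaluations.list_evaluation_measures()
print([m for m in measures if 'time' in m])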
Okay, I checked a bit more and this appears to be an API issue. When calling https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/
and https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/function/usercpu_time_millis
we don't get any results. However, there are clearly results available at https://www.openml.org/api/v1/run/10398294. @janvanrijn @joaquinvanschoren @sahithyaravi1493
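For reference, a sketch for reproducing the discrepancy with plain HTTP requests against the endpoints quoted above (assumes the requests package is installed; the listing endpoints are publicly readable):

# Sketch: compare the empty evaluation listing against the run XML,
# which does contain the runtime measures.
import requests

listing = requests.get(
    'https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/'
    'function/usercpu_time_millis')
run_xml = requests.get('https://www.openml.org/api/v1/run/10398294')

print(listing.status_code, len(listing.content))   # listing returns no evaluations
print(run_xml.status_code, len(run_xml.content))   # run XML includes the time measures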
@janvanrijn says the reason it's not there is that the times are currently stored on a per-fold basis, and he wasn't sure whether having the mean makes sense. I think it does, and we just agreed that adding the mean here would be good.
In the meantime, you can get the information via:
import openml
openml.runs.get_run(10398294).fold_evaluations['usercpu_time_millis_training'][0]
That gets you the per-fold time (the same exists for wall-clock time).
Also, it looks like the R interface uploads this: https://www.openml.org/t/31
This raises the question whether this should be done in the language bindings or the backend. If we do it in the backend we could fill it in for all our runs.
Actually, thinking about it now, it might be easier to do this on the openml-python side if R is already doing it. We can always get the old data from the folds and compute it ourselves.
There's currently no support for evaluations that are not per fold, so this requires a couple of lines of code but shouldn't be hard.
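For concreteness, a sketch of those couple of lines, assuming the fold_evaluations layout of measure -> repeat -> fold -> value and the run ID from above:

# Sketch: aggregate per-fold CPU time into a mean, client-side.
# fold_evaluations maps measure -> repeat -> fold -> value.
import openml

run = openml.runs.get_run(10398294)
per_fold = run.fold_evaluations['usercpu_time_millis_training']
values = [v for folds in per_fold.values() for v in folds.values()]
print('mean training CPU time: %.1f ms over %d folds'
      % (sum(values) / len(values), len(values)))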