
No runtime information available through list_evaluations()

hp2500 opened this issue on Aug 25, 2019 · 9 comments

Hi there, I am trying to analyze the CPU runtimes of my runs, but it seems like they are not available on OpenML.

import openml

# get HGB evals
evals_hgb = openml.evaluations.list_evaluations('runtime',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
evals_hgb


This is not unique to my runs. I also tried to access the runtimes directly through the run objects, but they are not saved there either. I doubt there is anything that can be done after the fact, but is there a way to ensure that runtimes are captured for future runs?

@amueller

hp2500 avatar Aug 25 '19 18:08 hp2500

cc @mfeurer ?

amueller avatar Aug 26 '19 15:08 amueller

I just checked the most recent run (https://www.openml.org/r/10398294) and it has runtime information in the XML: https://www.openml.org/api/v1/run/10398294

I assume that 'runtime' is the wrong key here. However, it appears that there is yet another issue regarding the information served. Executing the following script:

import openml

evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
print(evals_hgb.shape)

evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_testing', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_training', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_testing', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_training', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)

results in

(0, 0)
(100, 12)
(100, 12)
(100, 12)
(0, 0)
(0, 0)
(0, 0)

even though the XML mentioned above contains all of these keys.
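
As a quick client-side check (a minimal sketch; it assumes the run linked above is still accessible and that fold_evaluations holds the per-fold measures), you can list which measures the run itself actually stores:

import openml

# List the per-fold measures stored on the run itself, even though
# list_evaluations() returns nothing for the timing measures queried above.
run = openml.runs.get_run(10398294)
print(sorted(run.fold_evaluations.keys()))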

mfeurer avatar Sep 04 '19 08:09 mfeurer

Can you try to get the information via the runs for now?

mfeurer avatar Sep 04 '19 08:09 mfeurer

@mfeurer Thanks for your response. Sorry, I am not sure why I used 'runtime' as the key in my example; in my actual script I tried 'run_cpu_time', which I got from openml.evaluations.list_evaluation_measures().

That being said, I tried your code with 'usercpu_time_millis', which seems to work for some earlier OpenML runs, but I am still not getting runtimes for the runs that I created.

hp2500 avatar Sep 04 '19 14:09 hp2500

Okay, I checked a bit more and this appears to be an API issue. When calling https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/ and https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/function/usercpu_time_millis we don't get any results. However, there are clearly results available at https://www.openml.org/api/v1/run/10398294. @janvanrijn @joaquinvanschoren @sahithyaravi1493

mfeurer avatar Sep 13 '19 12:09 mfeurer

@janvanrijn says the reason it's not there is that the times are currently stored on a per-fold basis, and he wasn't sure whether reporting the mean makes sense. I think it does, and we just agreed that adding the mean here would be good.

amueller avatar Oct 14 '19 11:10 amueller

In the meantime, you can get the information via

import openml
openml.runs.get_run(10398294).fold_evaluations['usercpu_time_millis_training'][0]

That gets you the per-fold times (the same exists for wall-clock time).
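
If you want a single number per run (like the mean discussed above), here is a minimal sketch that averages the per-fold values yourself (assuming fold_evaluations is keyed measure -> repeat -> fold):

import openml

run = openml.runs.get_run(10398294)

# fold_evaluations maps measure name -> repeat index -> fold index -> value
folds = run.fold_evaluations['usercpu_time_millis_training']
times = [value for repeat in folds.values() for value in repeat.values()]
print(sum(times) / len(times))  # mean training CPU time in milliseconds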

amueller avatar Oct 16 '19 07:10 amueller

Also, it looks like the R interface uploads this: https://www.openml.org/t/31

This raises the question whether this should be done in the language bindings or the backend. If we do it in the backend we could fill it in for all our runs.

amueller avatar Oct 16 '19 09:10 amueller

Actually, thinking about it now, it might be easier to do this on the openml-python side if R is already doing it. We can always get the old data from the folds and compute it ourselves.

There's currently no support for evaluations that are not per fold, so this requires a couple of lines of code but shouldn't be hard.
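
A rough sketch of what that aggregation could look like (purely illustrative, not the actual implementation; add_mean_timing_evaluations is a hypothetical helper, while evaluations and fold_evaluations are existing OpenMLRun attributes):

def add_mean_timing_evaluations(run):
    """Fill run-level timing evaluations with the mean of the per-fold values."""
    timing_keys = [
        'usercpu_time_millis_training', 'usercpu_time_millis_testing',
        'wall_clock_time_millis_training', 'wall_clock_time_millis_testing',
    ]
    for key in timing_keys:
        folds = run.fold_evaluations.get(key, {})
        values = [v for repeat in folds.values() for v in repeat.values()]
        if values:
            run.evaluations[key] = sum(values) / len(values)
    return run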

amueller avatar Oct 16 '19 11:10 amueller