
No runtime information available through list_evaluations()

hp2500 opened this issue on Aug 25, 2019 · 9 comments

Hi there, I am trying to analyze the CPU runtimes of my runs, but it seems like they are not available on OpenML.

import openml

# get HGB evals
evals_hgb = openml.evaluations.list_evaluations('runtime',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
evals_hgb


This is not unique to my runs. I also tried to access the runtimes directly through the run objects, but they are not saved there either. I doubt there is anything that can be done after the fact, but is there a way to ensure that runtimes are captured for future runs?

@amueller

hp2500 avatar Aug 25 '19 18:08 hp2500

cc @mfeurer ?

amueller avatar Aug 26 '19 15:08 amueller

I just checked the most recent run (https://www.openml.org/r/10398294) and it has runtime information in the XML: https://www.openml.org/api/v1/run/10398294

I assume that 'runtime' is the wrong key here. However, it appears that there is yet another issue regarding the information served. Executing the following script:

import openml

evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis',
                                                uploader=[8323],
                                                flow=[12736],
                                                output_format='dataframe')
print(evals_hgb.shape)

evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_testing', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('usercpu_time_millis_training', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_testing', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)
evals_hgb = openml.evaluations.list_evaluations('wall_clock_time_millis_training', output_format='dataframe',
                                                size=100)
print(evals_hgb.shape)

results in

(0, 0)
(100, 12)
(100, 12)
(100, 12)
(0, 0)
(0, 0)
(0, 0)

even though the XML mentioned above contains all of these keys.
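
As a quick client-side check (a minimal sketch; it assumes the run linked above is still accessible and that fold_evaluations holds the per-fold measures), you can list which measures the run itself actually stores:

import openml

# List the per-fold measures stored on the run itself, even though
# list_evaluations() returns nothing for the timing measures queried above.
run = openml.runs.get_run(10398294)
print(sorted(run.fold_evaluations.keys()))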

mfeurer avatar Sep 04 '19 08:09 mfeurer

Can you try to get the information via the runs for now?

mfeurer avatar Sep 04 '19 08:09 mfeurer

@mfeurer Thanks for your response. Sorry, I am not sure why I used 'runtime' as the key in my example; in my actual script I tried 'run_cpu_time', which I got from openml.evaluations.list_evaluation_measures().

That being said, I tried your code with 'usercpu_time_millis', which seems to work for some earlier OpenML runs, but I am still not getting runtimes for the runs that I created.

hp2500 avatar Sep 04 '19 14:09 hp2500

Okay, I checked a bit more and this appears to be an API issue. When calling https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/ and https://openml.org/api/v1/xml/evaluation/list/task/7592/flow/16374/function/usercpu_time_millis we don't get any results. However, there are clearly results available at https://www.openml.org/api/v1/run/10398294. @janvanrijn @joaquinvanschoren @sahithyaravi1493

mfeurer avatar Sep 13 '19 12:09 mfeurer

@janvanrijn says the reason it's not there is that the times are currently stored on a per-fold basis, and he wasn't sure whether reporting the mean makes sense. I think it does, and we just agreed that adding the mean here would be good.

amueller avatar Oct 14 '19 11:10 amueller

In the meantime, you can get the information via

import openml
openml.runs.get_run(10398294).fold_evaluations['usercpu_time_millis_training'][0]

That gets you the per-fold times (the same exists for wall-clock time).
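
If you want a single number per run (like the mean discussed above), here is a minimal sketch that averages the per-fold values yourself (assuming fold_evaluations is keyed measure -> repeat -> fold):

import openml

run = openml.runs.get_run(10398294)

# fold_evaluations maps measure name -> repeat index -> fold index -> value
folds = run.fold_evaluations['usercpu_time_millis_training']
times = [value for repeat in folds.values() for value in repeat.values()]
print(sum(times) / len(times))  # mean training CPU time in milliseconds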

amueller avatar Oct 16 '19 07:10 amueller

Also, it looks like the R interface uploads this: https://www.openml.org/t/31

This raises the question whether this should be done in the language bindings or the backend. If we do it in the backend we could fill it in for all our runs.

amueller avatar Oct 16 '19 09:10 amueller

Actually, thinking about it now, it might be easier to do this on the openml-python side if R is already doing it. We can always get the old data from the folds and compute it ourselves.

There's currently no support for evaluations that are not per fold, so this requires a couple of lines of code but shouldn't be hard.
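
A rough sketch of what that aggregation could look like (purely illustrative, not the actual implementation; add_mean_timing_evaluations is a hypothetical helper, while evaluations and fold_evaluations are existing OpenMLRun attributes):

def add_mean_timing_evaluations(run):
    """Fill run-level timing evaluations with the mean of the per-fold values."""
    timing_keys = [
        'usercpu_time_millis_training', 'usercpu_time_millis_testing',
        'wall_clock_time_millis_training', 'wall_clock_time_millis_testing',
    ]
    for key in timing_keys:
        folds = run.fold_evaluations.get(key, {})
        values = [v for repeat in folds.values() for v in repeat.values()]
        if values:
            run.evaluations[key] = sum(values) / len(values)
    return run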

amueller avatar Oct 16 '19 11:10 amueller