
Incorrect estimation procedure ids

Open LaurensKrudde opened this issue 3 years ago • 0 comments

Description

The estimation_procedure_id returned by get_task does not always match the estimation procedure displayed by list_tasks. I came across this when reproducing tasks from existing datasets on new datasets.

Steps/Code to Reproduce

For example, take some tasks from the dataset 'credit-approval' (id 29):

import openml

task_df = openml.tasks.list_tasks(data_id=29, output_format='dataframe').iloc[:5]
print(task_df[['tid', 'estimation_procedure']])

print(openml.tasks.get_task(29).estimation_procedure_id)
print(openml.tasks.get_task(259).estimation_procedure_id)
print(openml.tasks.get_task(1793).estimation_procedure_id)
print(openml.tasks.get_task(88).estimation_procedure_id)
print(openml.tasks.get_task(1728).estimation_procedure_id)

gives:

       tid             estimation_procedure
29      29          10-fold Crossvalidation
88      88  10 times 10-fold Learning Curve
259    259                  33% Holdout set
1728  1728           10-fold Learning Curve
1793  1793   5 times 2-fold Crossvalidation

1
1
1
13
13

Expected Results

The first three (tasks 29, 259 and 1793) should have estimation_procedure_id 1, 6 and 2. The last two (tasks 88 and 1728) should have estimation_procedure_id 3 and 13.

Actual Results

The first three all have id 1, while the last two both have id 13.
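To make the mismatch explicit, here is a minimal offline sketch that uses only the values reported above. It does not call the OpenML server. The name-to-id mapping is my expectation from this report, not something read from the API, and find_mismatches is a hypothetical helper, not part of openml-python.

```python
# Values copied from this report; nothing here talks to the OpenML server.
# The expected id per procedure name is the reporter's expectation (an assumption).
EXPECTED_ID_BY_NAME = {
    "10-fold Crossvalidation": 1,
    "5 times 2-fold Crossvalidation": 2,
    "10 times 10-fold Learning Curve": 3,
    "33% Holdout set": 6,
    "10-fold Learning Curve": 13,
}

# task id -> (name shown by list_tasks, id returned by get_task)
REPORTED = {
    29: ("10-fold Crossvalidation", 1),
    88: ("10 times 10-fold Learning Curve", 13),
    259: ("33% Holdout set", 1),
    1728: ("10-fold Learning Curve", 13),
    1793: ("5 times 2-fold Crossvalidation", 1),
}


def find_mismatches(reported, expected_by_name):
    """Return {task_id: (expected_id, actual_id)} for tasks whose ids disagree."""
    return {
        tid: (expected_by_name[name], actual)
        for tid, (name, actual) in reported.items()
        if expected_by_name[name] != actual
    }


mismatches = find_mismatches(REPORTED, EXPECTED_ID_BY_NAME)
for tid, (expected, actual) in sorted(mismatches.items()):
    print(f"task {tid}: expected id {expected}, got {actual}")
```

With the numbers from this report, tasks 88, 259 and 1793 are flagged, while tasks 29 and 1728 agree with the displayed procedure.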

Versions

Windows-10-10.0.19043-SP0
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
NumPy 1.22.0
SciPy 1.8.0
Scikit-Learn 1.0.2
OpenML 0.12.2

LaurensKrudde • Jul 09 '22 21:07