openml-python
Incorrect estimation procedure ids
Description
The estimation_procedure_id does not always seem to correspond with the displayed estimation procedure. I came across this when reproducing tasks from existing datasets to new datasets.
Steps/Code to Reproduce
For example, take some tasks from the dataset 'credit-approval' (id 29):
import openml
task_df = openml.tasks.list_tasks(data_id=29, output_format='dataframe').iloc[:5]
print(task_df[['tid', 'estimation_procedure']])
print(openml.tasks.get_task(29).estimation_procedure_id)
print(openml.tasks.get_task(259).estimation_procedure_id)
print(openml.tasks.get_task(1793).estimation_procedure_id)
print(openml.tasks.get_task(88).estimation_procedure_id)
print(openml.tasks.get_task(1728).estimation_procedure_id)
gives:
tid estimation_procedure
29 29 10-fold Crossvalidation
88 88 10 times 10-fold Learning Curve
259 259 33% Holdout set
1728 1728 10-fold Learning Curve
1793 1793 5 times 2-fold Crossvalidation
1
1
1
13
13
Expected Results
The first three printed ids (tasks 29, 259 and 1793) should be 1, 6 and 2. The last two (tasks 88 and 1728) should be 3 and 13.
Actual Results
In fact, the first three all have id 1, while the last two both have id 13.
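To make the comparison explicit, here is a minimal offline sketch that checks the printed ids against the ids listed under Expected Results. The name-to-id mapping is only what this report expects, not something verified against the server, and `find_mismatches` is a hypothetical helper written for illustration.

```python
# Expected ids per procedure name, as claimed in this report
# (assumption: not verified against the OpenML server).
EXPECTED_ID = {
    "10-fold Crossvalidation": 1,
    "33% Holdout set": 6,
    "5 times 2-fold Crossvalidation": 2,
    "10 times 10-fold Learning Curve": 3,
    "10-fold Learning Curve": 13,
}

# (tid, listed procedure name, estimation_procedure_id from get_task),
# copied from the output shown in this report.
OBSERVED = [
    (29, "10-fold Crossvalidation", 1),
    (259, "33% Holdout set", 1),
    (1793, "5 times 2-fold Crossvalidation", 1),
    (88, "10 times 10-fold Learning Curve", 13),
    (1728, "10-fold Learning Curve", 13),
]


def find_mismatches(observed, expected):
    """Return (tid, name, expected_id, actual_id) for each task that disagrees."""
    return [
        (tid, name, expected[name], actual)
        for tid, name, actual in observed
        if expected[name] != actual
    ]


mismatches = find_mismatches(OBSERVED, EXPECTED_ID)
for tid, name, want, got in mismatches:
    print(f"task {tid}: {name!r} expected id {want}, got {got}")
```

Under these assumptions, tasks 259, 1793 and 88 are flagged, while tasks 29 and 1728 agree with the expected ids.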
Versions
Windows-10-10.0.19043-SP0
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
NumPy 1.22.0
SciPy 1.8.0
Scikit-Learn 1.0.2
OpenML 0.12.2