
ValueError: could not convert string to float: 'f'

learsi1911 opened this issue 4 years ago · 4 comments

Description

Hello! When I try to run the "Running and sharing benchmarks" exercise, I get the same error ("ValueError: could not convert string to float: 'f'") in both Jupyter notebook and Colab. Does the error mean there is a problem with the API key?

The code is the following (sorry if this is a very basic OpenML question):

Steps/Code to Reproduce

import openml
import sklearn.metrics
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.tree

openml.config.apikey = 'API_KEY'  # set the OpenML API key
benchmark_suite = openml.study.get_suite('OpenML-CC18')  # obtain the benchmark suite

# build a scikit-learn classifier
# (Imputer only exists in scikit-learn < 0.22; later versions use sklearn.impute.SimpleImputer)
clf = sklearn.pipeline.make_pipeline(sklearn.preprocessing.Imputer(),
                                     sklearn.tree.DecisionTreeClassifier())

for task_id in benchmark_suite.tasks:  # iterate over all tasks
    task = openml.tasks.get_task(task_id)  # download the OpenML task
    run = openml.runs.run_model_on_task(clf, task)  # run the classifier on the task
    score = run.get_metric_score(sklearn.metrics.accuracy_score)  # compute per-fold accuracy scores
    print('Data set: %s; Accuracy: %0.2f' % (task.get_dataset().name, score.mean()))
    run.publish()  # publish the experiment on OpenML (optional, requires internet and an API key)
    print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))

Expected Results

Actual Results

ValueError: could not convert string to float: 'f'

Versions

learsi1911 · Jun 14 '21 13:06

No, this means that there are categorical features that cannot be handled by the current pipeline. Is this an example we use in the documentations? If yes, we need to update that to work with categorical attributes.
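For illustration, a minimal sketch of such a categorical-aware pipeline, assuming a recent openml-python (where get_data can return pandas DataFrames) and a scikit-learn version with ColumnTransformer and SimpleImputer; the task id is picked arbitrarily and the column routing is a sketch, not the documented fix:

import openml
import sklearn.compose
import sklearn.impute
import sklearn.pipeline
import sklearn.preprocessing
import sklearn.tree

task = openml.tasks.get_task(3)  # task id chosen only for illustration
dataset = task.get_dataset()
# the third return value flags, per feature, whether it is categorical
_, _, categorical_indicator, _ = dataset.get_data(
    target=task.target_name, dataset_format='dataframe')

categorical = [i for i, is_cat in enumerate(categorical_indicator) if is_cat]
numerical = [i for i, is_cat in enumerate(categorical_indicator) if not is_cat]

# one-hot encode the categorical columns, impute the numerical ones
preprocessor = sklearn.compose.ColumnTransformer([
    ('cat', sklearn.preprocessing.OneHotEncoder(handle_unknown='ignore'), categorical),
    ('num', sklearn.impute.SimpleImputer(), numerical),
])
clf = sklearn.pipeline.make_pipeline(preprocessor,
                                     sklearn.tree.DecisionTreeClassifier())
run = openml.runs.run_model_on_task(clf, task)  # strings no longer reach the tree

Since run_model_on_task fits a fresh clone of the pipeline on each train split, the encoder and imputer are fit per fold rather than on the full data.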

The example in this paper (https://jmlr.org/papers/v22/19-920.html) should work here.

mfeurer · Jun 15 '21 06:06

That's right, it's an example from the documentation; here is the link: https://openml.github.io/docs/benchmark/


learsi1911 · Jun 15 '21 09:06

Alright, can you confirm that the other snippet works for you?

Thanks a lot for the link, so this page (https://github.com/openml/docs/blob/master/docs/benchmark.md) needs to be updated. Would you like to create a PR for doing so?

mfeurer · Jun 15 '21 10:06

Yes, sure. So that means that in the pipeline I need to use one method for categorical features and another for numerical features, right? I don't know if there is any way to restrict the benchmark to tasks with numerical features only.
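As a hedged sketch of one way to check the latter, the categorical_indicator returned by get_data can be used to keep only tasks without categorical features (same openml-python assumptions as above; suite name as in the snippet at the top):

import openml

benchmark_suite = openml.study.get_suite('OpenML-CC18')

numeric_only_tasks = []
for task_id in benchmark_suite.tasks:
    task = openml.tasks.get_task(task_id)
    dataset = task.get_dataset()
    # the third return value flags, per feature, whether it is categorical
    _, _, categorical_indicator, _ = dataset.get_data(
        target=task.target_name, dataset_format='dataframe')
    if not any(categorical_indicator):
        numeric_only_tasks.append(task_id)

print('%d tasks with numerical features only' % len(numeric_only_tasks))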


learsi1911 · Jun 15 '21 11:06