Error in updating dataset description

Open ihsaan-ullah opened this issue 2 years ago • 4 comments
Description

When a dataset description is updated, it gives the following error:

Screenshot 2022-12-26 at 12 46 00 PM

Steps/Code to Reproduce

# Import OpenML
import openml 

# Configure API Key
openml.config.apikey = 'API_KEY'

# Description file : loading from .md file
micro_markdown = "PLK.md"

# Dataset ID
dataset_id = 44238

# Download dataset without data
openml.datasets.get_dataset(dataset_id, False)


# Read MD file
f_micro = open(micro_markdown, "r")
micro_md = f_micro.readlines()
f_micro.close()

# update description on OpenML
openml.datasets.edit_dataset(dataset_id, description=micro_md)

ihsaan-ullah avatar Dec 26 '22 07:12 ihsaan-ullah

This is because the description field expects a string, but the provided value is a list of strings. Try using the read method instead:

- micro_md = f_micro.readlines()
+ micro_md = f_micro.read()

This results in the text being uploaded. You can see a preview here: https://test.openml.org/d/20. If you are not satisfied with that result and want to experiment, you can also use the test server:

import openml
openml.config.start_using_configuration_for_example()
openml.datasets.edit_dataset(20, description=...)
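
To make the difference between the two calls concrete, here is a minimal sketch that runs offline. The file content is a made-up stand-in for the real markdown file, and the final edit_dataset call is left commented out because it needs an API key and network access.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the real PLK.md file
    md_path = Path(tmp) / "PLK.md"
    md_path.write_text("# PLK\n\nA short description.\n", encoding="utf-8")

    # readlines() returns a list with one string per line
    with open(md_path, "r", encoding="utf-8") as f:
        as_lines = f.readlines()

    # read() returns the whole file as a single string
    with open(md_path, "r", encoding="utf-8") as f:
        as_text = f.read()

print(type(as_lines).__name__, type(as_text).__name__)

# Joining the lines gives the same text as read()
assert "".join(as_lines) == as_text

# The description argument should be the single string:
# openml.datasets.edit_dataset(dataset_id, description=as_text)
```

Using a with block also closes the file automatically, so the explicit close() call is no longer needed.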

PGijsbers avatar Jan 02 '23 13:01 PGijsbers

Thank you for the quick fix. It now leads to another issue:

Screenshot 2023-01-14 at 3 06 23 PM

Between these two screenshots, the content of the ".md" file is displayed.

Screenshot 2023-01-14 at 3 06 52 PM

It looks like there is an encoding issue, but I am not sure about it.

The description is updated for one dataset, "PLK", but the update does not work for the rest. https://www.openml.org/search?type=data&sort=runs&id=44238&status=active https://www.openml.org/search?type=data&sort=runs&id=44282&status=active https://www.openml.org/search?type=data&sort=runs&id=44317&status=active

ihsaan-ullah avatar Jan 14 '23 10:01 ihsaan-ullah

Between these two screenshots, the content of the ".md" file is displayed.

Not sure what you mean by that. Can you provide the problematic markdown file(s)? Either by e-mail as files or here in code block format.

PGijsbers avatar Jan 16 '23 10:01 PGijsbers

Related to https://github.com/openml/OpenML/issues/911. In my opinion, we should take the following steps:

  1. change the allowed characters that are legal in a data set description
  2. improve the error message (and update local verification)
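
As a rough illustration of what the second step could look like on the client side, here is a sketch of a local check that fails early with a readable message. The function name check_description is hypothetical and not part of openml-python. The set of characters the server actually accepts is not stated in this thread, so the ASCII-only rule below is only a placeholder assumption.

```python
def check_description(description):
    """Hypothetical local check; not part of openml-python."""
    # A list from readlines() is the mistake seen earlier in this thread
    if not isinstance(description, str):
        raise TypeError(
            f"description must be a str, got {type(description).__name__}; "
            "if it came from readlines(), use read() instead"
        )

    # Placeholder rule (assumption): flag anything outside ASCII
    bad = [(i, ch) for i, ch in enumerate(description) if ord(ch) > 127]
    if bad:
        shown = ", ".join(f"{ch!r} at index {i}" for i, ch in bad[:5])
        raise ValueError(
            f"description contains {len(bad)} non-ASCII character(s): {shown}"
        )
    return description


# Usage: validate before calling openml.datasets.edit_dataset
print(check_description("A plain description."))
```

Reporting the offending characters and their positions would make it much easier to find the problem in a long markdown file than a generic server error.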

PGijsbers avatar Feb 03 '23 11:02 PGijsbers