openml-python
suggestion for dataset upload: check for name conflicts
Description
When submitting a new dataset with the same name as an existing dataset, the OpenML REST API automatically stores it as a new version of that dataset. It would be nice to confirm this with the user first. The process would be:
- Python API checks if a dataset with that name exists
- If so, it asks the user whether she wants to store the dataset as a new version or rename the dataset
This is also how other platforms handle it (e.g., Google Drive).
Is anything like this possible? Maybe fail by default and add an option 'allow_versioning'? Or something less intrusive?
Having a check makes sense. On the one hand, I think it makes more sense to have server-side support for this, i.e., the server itself has an `allow_versioning` parameter in the REST call, and `openml-python` then uses that. This keeps the behavior consistent between different OpenML connectors. It would also allow the server to reject uploads for other reasons (e.g., checks based on meta-features or description similarity) without each connector having to implement that logic.
On the other hand, with the current upload flow this might require sending the whole dataset before the check is executed, which seems like a terrible user experience (potentially having to upload it twice). Maybe we should revisit a server-side check in V2?
> Is anything like this possible?
Yes, you can get datasets by name in `openml-python` (or, more generally, the `list_datasets` function allows you to check for this). Adding the check should take no more than a few lines of code.
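A minimal sketch of what such a client-side check might look like, using the "fail by default" option suggested above. Everything here is illustrative rather than existing `openml-python` API: `check_name_conflict`, `DatasetNameConflict`, and the `lookup_names` hook are hypothetical names, and `allow_versioning` is only the flag proposed in this thread. The lookup is injected so the logic does not depend on a particular library call; in practice it could presumably wrap `list_datasets` with a name filter.

```python
from typing import Callable, Iterable


class DatasetNameConflict(Exception):
    """Hypothetical error: the upload would silently create a new version."""


def check_name_conflict(
    name: str,
    lookup_names: Callable[[str], Iterable[str]],
    allow_versioning: bool = False,
) -> bool:
    """Return True if a dataset called `name` already exists.

    `lookup_names` is a hypothetical hook returning the names of existing
    datasets that match `name` (assumption: it could wrap a call to
    `list_datasets` filtered by name). Unless `allow_versioning` is set,
    a conflict raises instead of letting the upload proceed.
    """
    # Exact comparison, in case the lookup returns partial matches.
    conflict = any(existing == name for existing in lookup_names(name))
    if conflict and not allow_versioning:
        raise DatasetNameConflict(
            f"A dataset named {name!r} already exists; pass "
            "allow_versioning=True to upload it as a new version, "
            "or rename the dataset."
        )
    return conflict
```

The upload routine would call this before sending any data, so the user finds out about the conflict without uploading the dataset twice. An interactive prompt could replace the exception if that turns out to be less intrusive.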