clearml
ClearML-Data: Could not load dataset state
Describe the bug
I keep running into an issue where I set up a dataset and ClearML ends up unable to read the dataset state. I created a new dataset using the basic CLI command and started adding data. Suddenly the CLI would only give me the following error:
Error: Could not load Dataset id=2d38ac74b2ad4a4495207e01b9dc277a state
This has now happened 3 times. I can no longer delete the datasets and start over, as the delete command gives me the same error. When I try to delete the dataset through the web UI, I get this error:
Error 406 : Project has associated non-empty datasets (please delete all the dataset versions or use force=true): id=cdfa028c319a4dbcb593382c4e1de335
When trying to remove the dataset with the Python API using force=True, as suggested, I get this:
2023-10-02 17:16:03,983 - clearml.Task - ERROR - Action failed <400/101: tasks.get_by_id/v1.0 (Invalid task id: id=cdfa028c319a4dbcb593382c4e1de335, company=9be5804ead8d45beac4ba3b9a3936117)> (task=cdfa028c319a4dbcb593382c4e1de335)
2023-10-02 17:16:03,983 - clearml.Task - ERROR - Failed reloading task cdfa028c319a4dbcb593382c4e1de335
2023-10-02 17:16:04,362 - clearml.Task - ERROR - Action failed <400/101: tasks.get_by_id/v1.0 (Invalid task id: id=cdfa028c319a4dbcb593382c4e1de335, company=9be5804ead8d45beac4ba3b9a3936117)> (task=cdfa028c319a4dbcb593382c4e1de335)
2023-10-02 17:16:04,362 - clearml.Task - ERROR - Failed reloading task cdfa028c319a4dbcb593382c4e1de335
2023-10-02 17:16:04,362 - clearml - WARNING - Could not get dataset with ID cdfa028c319a4dbcb593382c4e1de335: Task ID "cdfa028c319a4dbcb593382c4e1de335" could not be found
Can anyone tell me how to get out of this state?
To reproduce
I could try again and give the exact commands I use, but since this has now happened to me 3 times, I'm not sure they matter all that much...
Perhaps this helps: this is the stack trace I get when calling Dataset.get() with the ID of one of the affected datasets:
In [3]: dataset = Dataset.get(dataset_id='2d38ac74b2ad4a4495207e01b9dc277a')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 dataset = Dataset.get(dataset_id='2d38ac74b2ad4a4495207e01b9dc277a')
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1778, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, include_archived, auto_create, writable_copy, dataset_version, alias, overridable, shallow_search, **kwargs)
1774 instance = Dataset.create(
1775 dataset_name=dataset_name, dataset_project=dataset_project, dataset_tags=dataset_tags
1776 )
1777 return finish_dataset_get(instance, instance._id)
-> 1778 instance = get_instance(dataset_id)
1779 # Now we have the requested dataset, but if we want a mutable copy instead, we create a new dataset with the
1780 # current one as its parent. So one can add files to it and finalize as a new version.
1781 if writable_copy:
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1690, in Dataset.get.<locals>.get_instance(dataset_id_)
1682 local_state_file = StorageManager.get_local_copy(
1683 remote_url=task.artifacts[cls.__state_entry_name].url,
1684 cache_context=cls.__cache_context,
(...)
1687 force_download=force_download,
1688 )
1689 if not local_state_file:
-> 1690 raise ValueError("Could not load Dataset id={} state".format(task.id))
1691 else:
1692 # we could not find the serialized state, start empty
1693 local_state_file = {}
ValueError: Could not load Dataset id=2d38ac74b2ad4a4495207e01b9dc277a state
In [4]: dataset = Dataset.get(dataset_id='1cdc8407d0494adf822d282f7ad45739')
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 dataset = Dataset.get(dataset_id='1cdc8407d0494adf822d282f7ad45739')
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1778, in Dataset.get(cls, dataset_id, dataset_project, dataset_name, dataset_tags, only_completed, only_published, include_archived, auto_create, writable_copy, dataset_version, alias, overridable, shallow_search, **kwargs)
1774 instance = Dataset.create(
1775 dataset_name=dataset_name, dataset_project=dataset_project, dataset_tags=dataset_tags
1776 )
1777 return finish_dataset_get(instance, instance._id)
-> 1778 instance = get_instance(dataset_id)
1779 # Now we have the requested dataset, but if we want a mutable copy instead, we create a new dataset with the
1780 # current one as its parent. So one can add files to it and finalize as a new version.
1781 if writable_copy:
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:1694, in Dataset.get.<locals>.get_instance(dataset_id_)
1691 else:
1692 # we could not find the serialized state, start empty
1693 local_state_file = {}
-> 1694 instance_ = cls._deserialize(local_state_file, task)
1695 # remove the artifact, just in case
1696 if force_download and local_state_file:
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/site-packages/clearml/datasets/dataset.py:2619, in Dataset._deserialize(cls, stored_state, task)
2617 stored_state_file = Path(stored_state).as_posix()
2618 with open(stored_state_file, 'rt') as f:
-> 2619 stored_state = json.load(f)
2621 instance = cls(_private=cls.__private_magic, task=task)
2622 # assert instance._id == stored_state['id'] # They should match
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/__init__.py:293, in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
274 def load(fp, *, cls=None, object_hook=None, parse_float=None,
275 parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
276 """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
277 a JSON document) to a Python object.
278
(...)
291 kwarg; otherwise ``JSONDecoder`` is used.
292 """
--> 293 return loads(fp.read(),
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
296 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/__init__.py:357, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 del kw['encoding']
354 if (cls is None and object_hook is None and
355 parse_int is None and parse_float is None and
356 parse_constant is None and object_pairs_hook is None and not kw):
--> 357 return _default_decoder.decode(s)
358 if cls is None:
359 cls = JSONDecoder
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
332 def decode(self, s, _w=WHITESPACE.match):
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
File ~/apps/anaconda3/envs/yolov5-newest/lib/python3.8/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
It really looks to me like the JSON file was not written correctly (it seems to be empty).
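The "Expecting value: line 1 column 1 (char 0)" message in the traceback supports that theory: it is exactly what the standard library's json.load raises on empty input. A quick stdlib-only check:

```python
import io
import json

# json.load on empty input raises the same JSONDecodeError
# as in the traceback above
try:
    json.load(io.StringIO(""))
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 1 (char 0)
```

So an empty (zero-byte) state file on disk is consistent with the second traceback, while a missing file is consistent with the first ("Could not load Dataset ... state").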
Expected behaviour
I'd expect the state to be found and the commands not to give me this error. At the very least, it would be nice to be able to remove these datasets and start over.
Environment
- Server: app.clear.ml
- ClearML version: 1.13.1
- Python version: 3.8.13
- OS: Debian GNU/Linux 11 (bullseye)
Hi @alex-sage ! Before deleting a dataset, you need to delete/archive all dataset versions under it. Note that, in the internal implementation, datasets are projects and the versions are tasks. So when force-deleting a dataset through the API, you should delete the project:
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Datasets are projects internally, so force-delete the backing project
client.projects.delete(project="1cdc8407d0494adf822d282f7ad45739", force=True)
I am not sure why the dataset you created didn't upload/write the state file properly; it could be a network/server error. If you have a consistent way to reproduce the issue, please let us know!
Hi @eugen-ajechiloae-clearml!
Thank you for your help; I was finally able to delete the invalid datasets.
I found one way to reproduce the problem, but it only seems to happen when using our network storage as the target.
$ clearml-data create --project example-project --name example-dataset2 --storage /home/data/datasets/clearml/
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
ClearML results page: https://app.clear.ml/projects/e25905f30e964e53aefb5e2da15bcf8d/experiments/ba7ce395a6714802b52ca9ba2cd36e0a/output/log
ClearML dataset page: https://app.clear.ml/datasets/simple/e25905f30e964e53aefb5e2da15bcf8d/experiments/ba7ce395a6714802b52ca9ba2cd36e0a
New dataset created id=ba7ce395a6714802b52ca9ba2cd36e0a
$ clearml-data add --wildcard */*
clearml-data - Dataset Management & Versioning CLI
Adding files/folder/links to dataset id ba7ce395a6714802b52ca9ba2cd36e0a
0 files added
(yolov5-newest)
sage@w15:/home/data/datasets/ball_tracking/baseball/evaluation
$ clearml-data add --files .
clearml-data - Dataset Management & Versioning CLI
Adding files/folder/links to dataset id ba7ce395a6714802b52ca9ba2cd36e0a
Error: Could not load Dataset id=ba7ce395a6714802b52ca9ba2cd36e0a state
The second add command uses an invalid wildcard on purpose, so that 0 files are added. This seems to prevent the state JSON file from being written. It looks like a bug, since I doubt our network fails at exactly this moment 3 times in a row :wink: I checked our local storage location (which is perfectly accessible) and the state file for that dataset is indeed not there.
Edit: I just noticed that the same thing also happens if I add files and abort the hash calculation halfway through (by pressing CTRL-C once). It prints "User aborted" but does not seem to write back the state file.
This is still a bug affecting locally stored datasets.
If anything goes wrong during the dataset creation process, or it is interrupted manually (perhaps the user typed a wrong command and pressed CTRL-C), the state.json file is missing afterwards, and the dataset can neither be opened nor deleted via the clearml-data API.
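For what it's worth, this failure mode (an interrupt leaving a missing or truncated state file) is commonly avoided with a write-to-temp-then-rename pattern. This is not ClearML's actual implementation, just a generic stdlib-only sketch of the idea:

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    # Write to a temp file in the same directory, then atomically
    # replace the target, so an interrupted run never leaves a
    # truncated or empty JSON file at the final path.
    target_dir = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With this pattern, readers either see the previous complete state or the new complete state, never a partial write, even if the process is killed mid-serialization.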
Hi @alex-sage ! We have acknowledged the issue.
In the meantime, you should be able to delete the broken datasets with Task.delete (https://clear.ml/docs/latest/docs/references/sdk/task#delete), using the dataset's ID, since dataset versions are tasks themselves.
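An untested sketch of that workaround, using one of the dataset IDs from this thread (it requires a configured clearml environment, and assumes Task.get_task and Task.delete behave as documented):

```python
from clearml import Task

# Dataset versions are tasks internally, so fetch the backing task
# by the dataset ID and delete it directly, bypassing Dataset.get()
# (which fails here because the state artifact cannot be loaded).
task = Task.get_task(task_id="2d38ac74b2ad4a4495207e01b9dc277a")
task.delete(raise_on_error=False)
```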