deeplake
[FEATURE] Auto QAing uploaded datasets
🚨🚨 Feature Request
If your feature will improve HUB
This feature is intended to verify datasets that have been converted to Hub. It inspects all the data and checks that each sample has the expected shape and a valid label.
Description of the possible solution
I propose a function such as ds.verify_upload(), which is intended to:
- check the shapes of images and their respective labels
- visualize randomly selected images and their labels
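
To make the proposal concrete, here is a minimal sketch of the checks listed above. It is not an existing Hub API: `verify_upload`, `UploadReport`, and every parameter name are hypothetical, and the sketch works on plain shape tuples and integer labels rather than on a real dataset object. Instead of plotting, it only returns the randomly chosen indices that a visualization step could display.

```python
import random
from dataclasses import dataclass, field


@dataclass
class UploadReport:
    """Hypothetical result object: problems found plus indices to preview."""
    num_samples: int
    problems: list = field(default_factory=list)
    preview_indices: list = field(default_factory=list)

    @property
    def ok(self):
        return not self.problems


def verify_upload(image_shapes, labels, num_classes,
                  expected_ndim=3, preview=3, seed=0):
    """Check image shapes and labels, then pick random samples to inspect.

    image_shapes: sequence of shape tuples, e.g. (height, width, channels)
    labels:       sequence of integer class ids, one per image
    """
    report = UploadReport(num_samples=len(image_shapes))

    # Every image needs exactly one label.
    if len(image_shapes) != len(labels):
        report.problems.append(
            f"length mismatch: {len(image_shapes)} images vs {len(labels)} labels"
        )

    for i, (shape, label) in enumerate(zip(image_shapes, labels)):
        # An image should have the expected rank and no empty dimension.
        if len(shape) != expected_ndim or any(d <= 0 for d in shape):
            report.problems.append(f"sample {i}: unexpected image shape {tuple(shape)}")
        # A label should be an integer class id in [0, num_classes).
        if not (isinstance(label, int) and 0 <= label < num_classes):
            report.problems.append(f"sample {i}: label {label!r} outside [0, {num_classes})")

    # Choose a few random samples for a human to look at.
    n = min(len(image_shapes), len(labels))
    rng = random.Random(seed)
    report.preview_indices = sorted(rng.sample(range(n), min(preview, n)))
    return report
```

A caller would treat `report.ok` as the pass/fail signal and feed `report.preview_indices` to whatever plotting tool is available.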
It would be helpful if you can add additional feedback to this issue.
@davidbuniat any suggestions? @Sai thanks a lot for bringing this up!
@zshashz also adding you in case you have any ideas.
@SaiNikhileshReddy the issue is good, but we need to understand what we mean by verifying uploads. It can mean several things.
- [ ] (ultimate) Does the raw data correspond to the data stored inside Hub?
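
One possible reading of this check, sketched under strong assumptions: if samples are stored byte-for-byte, comparing a hash of each raw sample with a hash of the stored sample would flag corrupted, missing, or extra entries. The helper names `digest` and `find_mismatches` are hypothetical, and reading the bytes back from Hub is left out. If the upload re-encodes or compresses samples, the bytes will differ even when the content is fine, so the comparison would have to be done on decoded arrays instead.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one sample's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(raw_samples, stored_samples):
    """Return indices where the stored bytes differ from the raw bytes.

    Indices present on only one side (missing or extra samples)
    are reported as mismatches too.
    """
    mismatched = [
        i
        for i, (raw, stored) in enumerate(zip(raw_samples, stored_samples))
        if digest(raw) != digest(stored)
    ]
    shorter, longer = sorted((len(raw_samples), len(stored_samples)))
    mismatched.extend(range(shorter, longer))
    return mismatched
```

An empty result would mean every stored sample matches its source under this byte-level definition.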
Other questions
- [ ] Does the structure of the dataset (tensors, shapes and types) semantically make sense?
- [ ] Does the data make sense when inspected visually on a local machine? See https://github.com/activeloopai/Hub/issues/1387
- [ ] Do the labels correspond to the data? (more advanced checks)
- [ ] Is there a data shift?
I know some of the questions above are non-trivial to answer and might not have good solutions, but answering them would help determine the scope of this issue.