[data request] OpenImages v7

Name of dataset: OpenImages v7
URL of dataset: https://g.co/dataset/open-images
License of dataset: the annotations are licensed by Google Inc. under the CC BY 4.0 license; the images are listed as having a CC BY 2.0 license.

Short description of dataset and use case(s): larger than ImageNet, with 61M image-level labels, 16M bounding boxes, 3M visual relationships, 2.7M instance segmentation masks, 600k localized narratives (synchronized audio and text captions with mouse traces), and 66M point labels.

Folks who would also like to see this dataset in tensorflow/datasets, please give this issue a thumbs-up so the developers know which requests to prioritize.

And if you'd like to contribute the dataset (thank you!), see our guide to adding a dataset.

Aug 14 '19 rodrigob

aman2930 is looking into this.

Aug 21 '19 pierrot0

Could you please assign it to me?

Aug 21 '19 aman2930

Any update on this?

Sep 17 '19 rodrigob

For info, we are now at open_images_v6 (same image labels, boxes, masks, and images as v5, but with new types of annotations and a larger number of relationship annotations).

Apr 17 '20 rodrigob

Nice, we would love to have this!

For info, we (the TFDS team) maintain the core API and help with issues, but we let the community (both internal and external) implement the datasets they want (we have 130+ dataset requests).

Don't hesitate to help us with this. Or, if anyone else is interested in working on this, don't hesitate to send a PR. By starting from open_images_v4, it should be relatively straightforward to add an OpenImagesV6: https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/object_detection/open_images.py We're here to help if anyone encounters issues with this.

Apr 17 '20 Conchylicultor

"relatively straightforward"

Not so much, since new data types and data conventions are needed (instance segmentation, localized captions, audio).

@jponttuset FYI.

Apr 17 '20 rodrigob

@Conchylicultor I want to work on it. Should we keep both v4 and v6?

Apr 17 '20 Eshan-Agarwal

Note also that there was a potential bug in the v4 TFDS import (in the quantization of the image-level machine scores), so v5/v6 should be implemented with care (probably removing the quantization altogether). Please add me to the reviewers pool.
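To make the quantization concern concrete, here is a minimal sketch of how storing machine scores at a coarse resolution can lose information. The exact scheme used by the v4 import is not described in this thread, so this assumes a hypothetical truncation to tenths; `quantize` and `dequantize` are illustrative helpers, not TFDS functions.

```python
# Hypothetical illustration only: NOT the actual TFDS v4 code.
# Assumption: a machine score in [0, 1] is stored as int(score * 10).


def quantize(score: float) -> int:
    """Map a score in [0, 1] to an integer bucket 0..10 by truncation (assumed scheme)."""
    return int(score * 10)


def dequantize(bucket: int) -> float:
    """Recover an approximate score from its bucket."""
    return bucket / 10


scores = [0.71, 0.75, 0.79]
buckets = [quantize(s) for s in scores]  # all three distinct scores land in bucket 7
restored = [dequantize(b) for b in buckets]  # every score comes back as 0.7

# A downstream confidence threshold behaves differently on the raw and restored scores.
threshold = 0.75
kept_raw = [s for s in scores if s >= threshold]  # keeps 0.75 and 0.79
kept_restored = [r for r in restored if r >= threshold]  # keeps nothing

print(buckets, restored, kept_raw, kept_restored)
```

Under this assumed scheme, distinct scores collapse into one bucket and any threshold that falls between bucket edges silently changes which labels survive. Keeping the original float scores avoids both effects.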

Apr 17 '20 rodrigob

@Eshan-Agarwal, yes, we should keep both v4 and v6. However, I feel this one may be a little too ambitious for you, especially if you don't have enough compute power.

Apr 17 '20 Conchylicultor

Yes, the open_images dataset is huge, but I will try.

Apr 18 '20 Eshan-Agarwal

For info, I am currently working on this issue.

Jul 12 '21 rodrigob

Any updates on this? 🤗 This would be super useful to have

Jan 20 '22 BlackHC

For context, a not-yet-released implementation exists. It was used to generate the new Open Images visualizers. I will spend the next couple of weeks cleaning up the code and preparing the public release.

Oct 26 '22 rodrigob

Any updates on this? I guess it would save a beginner a lot of work.

May 03 '23 joaoguilhermeS

Would invite so much more use and experimentation!

Sep 11 '23 whoschek