CLIP
What exactly are the sources of WebImageText dataset?
The paper gives only a vague description of the WIT dataset:

> ... we constructed a new dataset of 400 million (image, text) pairs collected from a variety of publicly available sources on the Internet.
Can you enumerate the specific sources and the methods used to acquire the images?
How are the texts engineered? The paper discusses prompt engineering for the evaluation task datasets, but not for the 400M training dataset.
Any pointers to more information would be appreciated.