
paperdoll dataset download issues

Status: Open. snownus opened this issue 8 years ago • 5 comments

Recently, I tried to download the original images of the paperdoll dataset. However, many of the downloads fail. Could you share the original dataset images with me?

snownus, Nov 21 '16

@snownus We cannot distribute copyrighted materials, as indicated in README.md. I would suggest you build a web scraper to obtain newer images from the same website.

kyamagu, Nov 21 '16

@kyamagu Thanks for your reply.

snownus, Nov 22 '16

@kyamagu Could you please advise how I can download "paperdoll_dataset.mat"?

salehmiri, May 27 '17

Hello @kyamagu,

Over the last few hours I have been working with ModaNet and the linked paperdoll (chictopia) data. It seems that a relevant part (~10%) of the dataset isn't available, because the links are outdated or broken. I understand the licensing trouble and why the dataset can't be distributed as binary data. Nevertheless, it could make sense to update and version the dataset periodically (for instance, by date) to make your research more open and reproducible. I will follow your suggestion above and develop a scraper for the broken links. However, if chictopia has changed the images, the original dataset won't be recoverable. What do you think?

Best, Darius

nok, Nov 26 '18
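The ~10% estimate of broken links mentioned above can be checked with a small script before writing a full scraper. The following is a minimal Python sketch, not part of paperdoll or ModaNet: it assumes you already have a plain list of image URLs extracted from the dataset (that extraction step is not shown), and the helper names `fetch_status`, `partition_links` and `broken_fraction` are hypothetical. Any non-2xx response or network error is counted as a broken link.

```python
"""Hypothetical link checker for a list of dataset image URLs."""
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Sequence, Tuple


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 on a network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for an image that was removed
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def partition_links(urls: Iterable[str],
                    fetch: Callable[[str], int] = fetch_status
                    ) -> Tuple[List[str], List[str]]:
    """Split URLs into (available, broken); only 2xx counts as available."""
    available: List[str] = []
    broken: List[str] = []
    for url in urls:
        status = fetch(url)
        (available if 200 <= status < 300 else broken).append(url)
    return available, broken


def broken_fraction(urls: Sequence[str],
                    fetch: Callable[[str], int] = fetch_status) -> float:
    """Fraction of URLs that are broken (0.0 for an empty list)."""
    if not urls:
        return 0.0
    _, broken = partition_links(urls, fetch)
    return len(broken) / len(urls)


if __name__ == "__main__":
    # Offline demo with a fake fetcher instead of real HTTP requests.
    fake_statuses = {"http://example.com/1.jpg": 200,
                     "http://example.com/2.jpg": 404}
    ok, bad = partition_links(fake_statuses, fetch=fake_statuses.get)
    print("available:", ok)
    print("broken:", bad)
```

Passing `fetch` as a parameter keeps the logic testable without network access. Note that some servers reject HEAD requests (HTTP 405); this sketch would report such links as broken, so a GET fallback and some rate limiting may be worth adding before relying on the numbers.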

@nok I do not have a good solution to this issue, due to legal concerns and maintenance costs. Versioning is not feasible, because it would require periodic data scraping to maintain.

The chictopia data is really a snapshot of what I scraped back in 2012. Redistributing scraped data is always a concern. On the other hand, I personally wish the dataset to be more open, as you suggest. One piece of good news is that, at least in my country, distributing datasets that contain copyrighted material for machine-learning purposes becomes legal as of Jan 2019. This might mitigate some of my legal concerns once it takes effect:

JAPAN AMENDS ITS COPYRIGHT LEGISLATION TO MEET FUTURE DEMANDS IN AI AND BIG DATA

Another problem is hosting ~40GB of images. I might be able to use one of the cloud hosting programs for public data, but I haven't checked the details yet. https://aws.amazon.com/jp/opendata/public-datasets/

kyamagu, Nov 26 '18