YFCC100M Caption Preprocessing

Open normster opened this issue 4 years ago • 2 comments

Hi,

I'm trying to reproduce the YFCC100M results and would like to know how image captions were preprocessed during training. For instance, how was the caption for the following sample extracted?

Title: denise%27s+peanut+chicken
Description: recipe+here%3A+%3Ca+href%3D%22http%3A%2F%2Fallrecipes.com%2FRecipe%2FDenises-Peanut-Chicken%2FDetail.aspx%22%3Eallrecipes.com%2FRecipe%2FDenises-Peanut-Chicken%2FDetail.aspx%3C%2Fa%3E%0Ai+added+1%2F2+teaspoon+of+sriracha+hot+chili+sauce+and+used+1+TBS+of+chunky+peanut+butter+in+place+of+the+2+cups+of+peanuts.

Best, Norman

May 06 '21 03:05 normster

The fields are URL-encoded, so you can decode them with urllib.parse.unquote_plus, which also turns each + into a space:

import urllib.parse
title = "denise%27s+peanut+chicken"
print(urllib.parse.unquote_plus(title))
# prints: denise's peanut chicken
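The description field from the question decodes the same way, but the result still contains an HTML anchor tag and an embedded newline. Below is a minimal sketch of one way to clean that up. The helper name decode_field and the choice to strip tags and collapse whitespace are assumptions for illustration only; nothing in this thread confirms that this matches the preprocessing used for the published results.

```python
import html
import re
import urllib.parse


def decode_field(raw: str) -> str:
    """Decode one URL-encoded metadata field (hypothetical helper, not the authors' code)."""
    text = urllib.parse.unquote_plus(raw)   # resolve %XX escapes and turn '+' into ' '
    text = re.sub(r"<[^>]+>", " ", text)    # assumption: drop HTML tags such as <a href=...>
    text = html.unescape(text)              # resolve entities such as &amp;
    return " ".join(text.split())           # collapse runs of whitespace, including newlines


description = (
    "recipe+here%3A+%3Ca+href%3D%22http%3A%2F%2Fallrecipes.com%2FRecipe%2F"
    "Denises-Peanut-Chicken%2FDetail.aspx%22%3Eallrecipes.com%2FRecipe%2F"
    "Denises-Peanut-Chicken%2FDetail.aspx%3C%2Fa%3E%0Ai+added+1%2F2+teaspoon"
    "+of+sriracha+hot+chili+sauce+and+used+1+TBS+of+chunky+peanut+butter"
    "+in+place+of+the+2+cups+of+peanuts."
)

print(decode_field("denise%27s+peanut+chicken"))
print(decode_field(description))
```

Whether the link text should be kept, dropped, or replaced with a placeholder is a design choice; this sketch keeps the visible text and removes only the tags.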

Oct 06 '21 18:10 naveenkumarmarri

@jongwook In addition to the exact parsing method, could you tell us how a title is merged with the corresponding description text?

Oct 25 '21 06:10 TonyLianLong