YFCC100M Caption Preprocessing

Open normster opened this issue 4 years ago • 2 comments

Hi,

I'm trying to reproduce the YFCC100M results and would like to know how image captions were preprocessed during training. For instance, how was the caption for the following sample extracted?

Title: denise%27s+peanut+chicken
Description: recipe+here%3A+%3Ca+href%3D%22http%3A%2F%2Fallrecipes.com%2FRecipe%2FDenises-Peanut-Chicken%2FDetail.aspx%22%3Eallrecipes.com%2FRecipe%2FDenises-Peanut-Chicken%2FDetail.aspx%3C%2Fa%3E%0Ai+added+1%2F2+teaspoon+of+sriracha+hot+chili+sauce+and+used+1+TBS+of+chunky+peanut+butter+in+place+of+the+2+cups+of+peanuts.

Best, Norman

normster avatar May 06 '21 03:05 normster

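You can decode the URL-encoded fields with urllib.parse.unquote_plus, which reverses the percent-encoding and turns each "+" back into a space: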

import urllib.parse
title = "denise%27s+peanut+chicken"
print(urllib.parse.unquote_plus(title))
# prints: denise's peanut chicken
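
The same call also works on the description field, which additionally contains an HTML anchor tag and an encoded newline (%0A). Below is a minimal sketch, assuming one wants to strip the tags and collapse whitespace after decoding. The helper name decode_field and the tag-stripping regex are illustrative choices, not the preprocessing used for the published YFCC100M results, which this thread does not specify. How the title and description are then merged is likewise left open here.

```python
import html
import re
import urllib.parse


def decode_field(raw: str) -> str:
    """Decode one URL-encoded YFCC100M metadata field (hypothetical helper)."""
    # Reverse the percent-encoding and turn '+' back into spaces.
    text = urllib.parse.unquote_plus(raw)
    # Assumption: drop HTML tags such as <a href="...">, keeping their inner text.
    text = re.sub(r"<[^>]+>", " ", text)
    # Resolve HTML entities such as &amp; if any are present.
    text = html.unescape(text)
    # Collapse newlines and repeated spaces into single spaces.
    return " ".join(text.split())


title = "denise%27s+peanut+chicken"
description = (
    "recipe+here%3A+%3Ca+href%3D%22http%3A%2F%2Fallrecipes.com%2FRecipe"
    "%2FDenises-Peanut-Chicken%2FDetail.aspx%22%3Eallrecipes.com%2FRecipe"
    "%2FDenises-Peanut-Chicken%2FDetail.aspx%3C%2Fa%3E%0A"
    "i+added+1%2F2+teaspoon+of+sriracha+hot+chili+sauce+and+used+1+TBS+of"
    "+chunky+peanut+butter+in+place+of+the+2+cups+of+peanuts."
)

print(decode_field(title))
print(decode_field(description))
```

Stripping tags with a regex is good enough for short anchor tags like the one in this sample; a proper HTML parser would be the safer choice for messier descriptions.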

naveenkumarmarri avatar Oct 06 '21 18:10 naveenkumarmarri

@jongwook In addition to the exact parsing method, could you tell us how a title is merged with the corresponding description text?

TonyLianLong avatar Oct 25 '21 06:10 TonyLianLong