
Where is the Chinese version of the RefCOCO(-/+/g) caption data?

Open · funykatebird opened this issue 3 years ago · 3 comments

Where is the Chinese version of the RefCOCO(-/+/g) caption data?

funykatebird · Dec 07 '22 02:12

We have not released that data yet, but once I resolve some copyright issues I'll make it public.

JustinLin610 · Dec 07 '22 18:12

I want to fine-tune the Chinese image captioning model, so I need the Chinese version of the caption data and the training scripts, i.e., counterparts of caption_data.zip, train_caption_stage1.sh, and train_caption_stage2.sh.

I revised the train_caption_stage1.sh script like this:

    #bpe_dir=../../utils/BPE
    bpe_dir=../../utils/BERT_CN_dict
    bpe=bert

but I got the error: ../../utils/BERT_CN_dict/encoder.json not found

Is the file "encoder.json" missing from BERT_CN_dict?

Or do I need to pass --bpe=bert (or --bpe=${bpe}) to the train.py script?

funykatebird · Dec 08 '22 09:12

Yes, you need to set --bpe=bert, because we use the BERT tokenizer for Chinese, so there is no encoder.json.

JustinLin610 · Dec 20 '22 17:12
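The resolution above can be sketched as the tokenizer part of a modified train_caption_stage1.sh. This is a minimal, hypothetical sketch, not the OFA script itself: it assumes train.py receives the dictionary directory through --bpe-dir (the flag the English caption scripts appear to use for bpe_dir), and the helpers check_bpe_dir and build_bpe_args are illustrative names that do not exist in OFA. It prints the arguments instead of launching training.

```shell
#!/usr/bin/env bash
# Sketch only: tokenizer settings for Chinese caption fine-tuning with OFA.
# Assumptions: train.py accepts --bpe-dir (as the English scripts pass bpe_dir);
# check_bpe_dir and build_bpe_args are hypothetical helpers, not part of OFA.
set -euo pipefail

#bpe_dir=../../utils/BPE          # English GPT-2 BPE: expects encoder.json
bpe_dir=../../utils/BERT_CN_dict   # Chinese BERT dictionary: has no encoder.json
bpe=bert

# Fail early if a non-BERT BPE is pointed at a directory without encoder.json,
# which is the situation behind the "encoder.json not found" error in this thread.
check_bpe_dir() {
  local bpe_name="$1" dir="$2"
  if [[ "$bpe_name" != "bert" && ! -f "$dir/encoder.json" ]]; then
    echo "error: $dir/encoder.json not found; use --bpe=bert with BERT_CN_dict" >&2
    return 1
  fi
}

# Print the tokenizer-related train.py arguments, one per line.
build_bpe_args() {
  printf '%s\n' "--bpe=$1" "--bpe-dir=$2"
}

check_bpe_dir "$bpe" "$bpe_dir"
build_bpe_args "$bpe" "$bpe_dir"
# In a real script these would be appended to the train.py command, e.g.
#   mapfile -t bpe_args < <(build_bpe_args "$bpe" "$bpe_dir")
#   python3 ../../train.py ... "${bpe_args[@]}"
```

Run as-is, it prints --bpe=bert and --bpe-dir=../../utils/BERT_CN_dict. Keeping the check separate from the argument list means that if bpe is changed away from bert while bpe_dir still points at BERT_CN_dict, the script stops before training rather than failing inside train.py.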