Where is the Chinese version of the RefCOCO(-/+/g) caption data?
We have not released that data yet, but once I resolve some copyright issues I will make it public.
I want to fine-tune the Chinese image caption model, so I need the Chinese version of the image caption data and the training scripts, equivalent to the caption_data.zip file, train_caption_stage1.sh, and train_caption_stage2.sh.
I revised the train_caption_stage1.sh script like this:
#bpe_dir=../../utils/BPE
bpe_dir=../../utils/BERT_CN_dict
bpe=bert
but I got this error: ../../utils/BERT_CN_dict/encoder.json not found
Is the file "encoder.json" missing from BERT_CN_dict?
It looks like the train.py script needs the parameter --bpe=bert or --bpe=${bpe}.
Yes, you need to set --bpe=bert, because we use the BERT tokenizer for Chinese, and it has no encoder.json.
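To make the fix concrete, here is a minimal sketch of the relevant variables from the revised script, assuming the rest of train_caption_stage1.sh stays unchanged. The `bpe_flag` variable is a hypothetical helper added only for illustration and is not part of the original script. The point is that the `bpe` variable has no effect unless it is also forwarded to train.py as `--bpe=${bpe}`.

```shell
#!/usr/bin/env bash
# Values taken from the revised script in this thread.
bpe_dir=../../utils/BERT_CN_dict
bpe=bert

# Hypothetical helper (not in the original script): the tokenizer flag that
# must be appended to the train.py argument list so the BERT tokenizer is
# selected instead of a tokenizer that looks for encoder.json.
bpe_flag="--bpe=${bpe}"

# Print the values so they can be checked before launching training.
echo "tokenizer flag: ${bpe_flag} (dictionary dir: ${bpe_dir})"
```

In the actual script, the same effect comes from adding `--bpe=${bpe}` to the existing train.py invocation alongside its other arguments.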