
Is it possible to train the model with another language?

Open dangcaptkd2 opened this issue 3 years ago • 2 comments

Hello, I'm trying to transfer your model to MSCOCO translated into Vietnamese. The predictions are unrelated to the input picture, even though training reaches a Bleu_4 score of 0.35. I replaced the tokenizer with PhoBERT's Vietnamese tokenizer, and I also changed the pytorch_transformers version to 1.0.0 because PhoBERT requires it.
Please help me solve this issue, thanks.

dangcaptkd2, Sep 23 '21 10:09

Hi dangcaptkd2,

I am trying to do image captioning in Arabic and have the same problem! Can you please share how you used PhoBERT to train Oscar?

jontooy, Oct 19 '21 12:10

Hi jontooy, we simply set --model_name_or_path to bert-base-multilingual-uncased and added --tokenizer_name vinai/phobert-base. You also need to change the special token IDs to match your tokenizer (e.g., the pad token ID is 0 in the bert-base-multilingual tokenizer but 1 in PhoBERT). That's all we changed, and it worked! Hope this helps, good luck.
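
To illustrate the special-token point, here is a minimal sketch of padding that takes the pad ID from the tokenizer in use instead of hard-coding 0. This is not Oscar's actual code: `SpecialTokenIds`, `pad_caption`, and the caption token IDs are hypothetical names and values made up for illustration, and the two pad IDs are the ones reported in this thread, so check them against your own tokenizer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpecialTokenIds:
    """Hypothetical container for IDs that must match the tokenizer in use."""
    pad: int


# Pad IDs as reported in this thread; verify them against your own tokenizer.
BERT_MULTILINGUAL = SpecialTokenIds(pad=0)
PHOBERT = SpecialTokenIds(pad=1)


def pad_caption(
    token_ids: List[int], max_len: int, special: SpecialTokenIds
) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token_ids to max_len.

    Returns (padded_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding positions.
    """
    ids = token_ids[:max_len]
    mask = [1] * len(ids)
    n_pad = max_len - len(ids)
    return ids + [special.pad] * n_pad, mask + [0] * n_pad


# Made-up token IDs, for illustration only.
caption = [57, 912, 4381]

bert_ids, bert_mask = pad_caption(caption, 5, BERT_MULTILINGUAL)
pho_ids, pho_mask = pad_caption(caption, 5, PHOBERT)

print(bert_ids, bert_mask)
print(pho_ids, pho_mask)
```

The attention mask is the same in both cases; only the filler ID differs. If the data pipeline keeps padding with 0 after the tokenizer is swapped, the padding positions will carry whatever token the new vocabulary assigns to ID 0, which is one plausible way to end up with captions unrelated to the image.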

dangcaptkd2, Oct 20 '21 13:10