question_generation

Support for Russian language

Open egorvkid opened this issue 3 years ago • 4 comments

Hello! How can I generate questions for the Russian language?

egorvkid avatar Feb 04 '21 05:02 egorvkid

By using MT5Tokenizer and MT5Model, you can generate questions for other languages.

mdhasanai avatar Mar 06 '21 08:03 mdhasanai

@mdhasanai

Can you give some example code?

DimIsaev avatar Mar 18 '21 15:03 DimIsaev

Use MT5Tokenizer when you preprocess the data in prepare_data.py. For example, use this

from transformers import MT5Tokenizer, BartTokenizer, HfArgumentParser

instead of

from transformers import T5Tokenizer, BartTokenizer, HfArgumentParser

Replace every T5Tokenizer/T5Model with MT5Tokenizer/MT5Model.

This way, you can train and evaluate on non-English datasets. To learn more about the MT5 model, see https://huggingface.co/transformers/model_doc/mt5.html
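
The rename described above could be scripted rather than done by hand. The following is a minimal sketch, not part of the repository: the helper name `swap_to_mt5` is hypothetical, and including `T5ForConditionalGeneration` in the pattern is an assumption about which T5 classes the scripts use.

```python
import re

# Match the T5 class names as whole identifiers. The leading \b means that
# names already converted (e.g. MT5Tokenizer) do not match a second time.
# The set of class names here is an assumption; adjust it to the code base.
_T5_NAME = re.compile(r"\bT5(Tokenizer|Model|ForConditionalGeneration)\b")


def swap_to_mt5(source: str) -> str:
    """Return `source` with the matched T5 class names prefixed by 'M'."""
    return _T5_NAME.sub(r"MT5\1", source)


if __name__ == "__main__":
    line = "from transformers import T5Tokenizer, BartTokenizer, HfArgumentParser"
    print(swap_to_mt5(line))
```

Running the function twice on the same text leaves it unchanged after the first pass, so it should be safe to apply to a file that is already partly converted. Other names, such as BartTokenizer, are left alone.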

mdhasanai avatar Mar 18 '21 15:03 mdhasanai

@mdhasanai

Thanks. The model and tokenizer are all good.

But how do I prepare the datasets? What data structure should be in the directory where "dev" and "train" are located?

DimIsaev avatar Mar 18 '21 19:03 DimIsaev