Train DSI-QG model

Open hibiki12y opened this issue 2 years ago • 1 comments

In step 1 of the README, the script is described as training query generation.

But in run.py, "DocTqueryTrainer" uses "IndexingTrainDataset", which builds a document/query-to-docid dataset. So the trained model would only produce docids.

Is this correct? In step 2, the "castorini/doc2query-t5-large-msmarco" model generates fine questions, but my own model (trained with the step 1 script) just generates docids.

Jan 17 '23 06:01 hibiki12y

Hi, thanks for the question.

For the docTquery training task (step 1), I'm basically reusing the IndexingTrainDataset class. If you check xorqa_docTquery_train_data.json, you'll see it has a similar format to xorqa_DSI_train_data.json. I just treat the questions as 'docids', so that the trained model will generate questions for the given document (for the DSI training task, it generates docids for the given document). Does that make sense to you?

Sorry for the unclear class naming here.
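
To illustrate the reuse described above, here is a minimal sketch, not the repository's actual code. It assumes each training record is a JSON object with a source field and a target field; the class name `ToyIndexingDataset` and the keys `text` and `text_id` are hypothetical and chosen only for illustration. The point is that one dataset class can serve both tasks, because the target is handled as a plain string whether it holds a docid or a question.

```python
import json


class ToyIndexingDataset:
    """Hypothetical stand-in for a (source text -> target string) dataset."""

    def __init__(self, json_lines):
        # Assumption: one JSON object per line, with keys "text" and "text_id".
        self.rows = [json.loads(line) for line in json_lines]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        # The target is always cast to a string, so the same code path
        # works whether the slot holds a docid or a question.
        return row["text"], str(row["text_id"])


# DSI-style record: the target slot holds a document identifier.
dsi_lines = ['{"text": "Paris is the capital of France.", "text_id": 42}']

# docTquery-style record: same layout, but the "docid" slot holds a question.
qg_lines = [
    '{"text": "Paris is the capital of France.", '
    '"text_id": "What is the capital of France?"}'
]

dsi_data = ToyIndexingDataset(dsi_lines)
qg_data = ToyIndexingDataset(qg_lines)

print(dsi_data[0])  # (document, docid as string)
print(qg_data[0])   # (document, question)
```

Under this reading, what the trained model generates depends on the training file passed in, not on the dataset class name.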

Jan 17 '23 06:01 ArvinZhuang
