DSI-QG
Train DSI-QG model
Step 1 of the README describes the script as training a query generation model.
But in run.py, "DocTqueryTrainer" uses "IndexingTrainDataset", which builds a dataset that maps a document/query to a docid. So the trained model would only produce docids.
Is that correct? In step 2, the "castorini/doc2query-t5-large-msmarco" model generates fine questions, but my own model (trained with the step 1 script) just generates docids.
Hi, thanks for the question.
For the docTquery training task (step 1), I'm basically reusing the IndexingTrainDataset class. If you check xorqa_docTquery_train_data.json, you'll see it has a similar format to xorqa_DSI_train_data.json. I just treat the questions as 'docids', so the trained model will generate questions for the given document (for the DSI training task, it generates docids for the given document). Does that make sense to you?
Sorry for the unclear class naming here.
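To illustrate the idea, here is a minimal sketch of how one dataset routine can serve both tasks. It is not the code in run.py: the field names `text` and `text_id`, the sample records, and the helper `to_seq2seq_pairs` are all assumptions made for illustration, so check the actual JSON files for the real layout.

```python
# Hypothetical records. The field names "text" and "text_id" are assumed
# for illustration and may not match the real training files.
dsi_records = [
    {"text": "Paris is the capital of France.", "text_id": "42"},
]
doc2query_records = [
    # Same shape, but the "docid" slot holds a question instead.
    {"text": "Paris is the capital of France.",
     "text_id": "What is the capital of France?"},
]


def to_seq2seq_pairs(records):
    """Build (source, target) string pairs from records.

    The function does not care whether the target is a docid or a
    question; it simply returns whatever is stored in the target field.
    """
    return [(r["text"], str(r["text_id"])) for r in records]


dsi_pairs = to_seq2seq_pairs(dsi_records)        # document -> docid
qg_pairs = to_seq2seq_pairs(doc2query_records)   # document -> question

print(dsi_pairs[0])
print(qg_pairs[0])
```

Under these assumptions, what the trained model generates depends only on which data file is passed in, not on the dataset class. If a step 1 model outputs docids, it may be worth checking that the docTquery training file, and not the DSI one, was used for training.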