generate_boolean_questions_using_T5_transformer
Generating boolean (yes/no) questions from any content using the T5 text-to-text transformer model and the BoolQ dataset
Using this program, you can generate boolean (yes/no) questions from any content.
A detailed Medium blog post explaining the necessary steps can be found here.
Input
The input to the program is any passage of text, for example:
Months earlier, Coca-Cola had begun “Project Kansas.” It sounds like a nuclear experiment but it was just a testing project for the new flavor. In individual surveys, they’d found that more than 75% of respondents loved the taste, 15% were indifferent, and 10% had a strong aversion to the taste to the point that they were angry.
Output
The output will be boolean (yes/no) questions generated from the above input.
Boolean (yes/no) questions generated from the T5 Model :
1: Does coca cola have a kansas flavor?
2: Is project kansas a new coca cola flavor?
3: Is project kansas the same as coca cola?
Inference code
The t5_inference.py file has all the code to run the model on any given paragraph.
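As a rough illustration of what inference with a fine-tuned T5 checkpoint can look like, here is a minimal sketch using the Hugging Face transformers library. It is not the contents of t5_inference.py. The prompt layout in build_prompt, the model_dir argument, the sampling settings, and the helper names are all assumptions made for this example.

```python
from typing import List


def build_prompt(passage: str, answer: bool = True) -> str:
    """Build the model input string. The 'truefalse: ... passage: ...'
    layout is an assumed format, not necessarily the one used in training."""
    passage = " ".join(passage.split())  # collapse newlines and repeated spaces
    return f"truefalse: {str(answer).lower()} passage: {passage}"


def unique_questions(candidates: List[str]) -> List[str]:
    """Drop case-insensitive duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for question in candidates:
        key = question.strip().lower()
        if key and key not in seen:
            seen.add(key)
            result.append(question.strip())
    return result


def generate_questions(passage: str, model_dir: str, num_questions: int = 3) -> List[str]:
    """Sample questions from a fine-tuned T5 checkpoint stored in model_dir
    (a hypothetical path to a directory saved with save_pretrained)."""
    # Imported here so the helpers above work without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    model = T5ForConditionalGeneration.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(
        build_prompt(passage),
        return_tensors="pt",
        truncation=True,
        max_length=256,
    )
    # Sampling (rather than greedy decoding) yields several different questions.
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=64,
        do_sample=True,
        top_k=40,
        top_p=0.8,
        num_return_sequences=num_questions,
    )
    decoded = [tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs]
    return unique_questions(decoded)
```

Calling generate_questions(paragraph, "path/to/checkpoint") would return a list of question strings. Because sampling can repeat itself, duplicates are removed, so the list may contain fewer than num_questions entries.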
Training the model
The training and validation datasets are present in the boolq_data folder.
Install the necessary libraries from requirements.txt.
Use any GPU machine and run train.py.
Training for 4 epochs (the default) took about 5-6 hours on an AWS EC2 p2.xlarge instance.
Note that since the dataset is small, I barely used the validation set.
Also, not all of the questions generated by the model are of high quality, because the training dataset is small.
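To make the training setup more concrete, here is a sketch of how BoolQ records could be turned into (input, target) text pairs for a text-to-text model. It assumes the public BoolQ layout of one JSON object per line with question, passage, and answer fields. The files in boolq_data may be stored differently, and the prompt layout and function names are assumptions for this example rather than what train.py does.

```python
import json
from typing import Dict, List, Tuple


def to_training_pair(record: Dict) -> Tuple[str, str]:
    """Turn one BoolQ-style record into (source, target) text.

    The model reads the passage plus the desired answer and learns to
    produce the question. The prompt layout here is an assumed format.
    """
    answer = "true" if record["answer"] else "false"
    source = f"truefalse: {answer} passage: {record['passage']}"
    target = record["question"]
    return source, target


def load_pairs(jsonl_path: str) -> List[Tuple[str, str]]:
    """Read a JSON Lines file (hypothetical path) into training pairs."""
    pairs = []
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                pairs.append(to_training_pair(json.loads(line)))
    return pairs
```

Each pair can then be tokenized, with the source as the encoder input and the target as the labels, before being passed to a T5 training loop.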