# mcQA: Multiple Choice Question Answering

🔮 Answering multiple choice questions with Language Models.
## News 📢

- 🚧 This project is currently under development. Stay tuned! 🤩

**Jun 6th, 2020**

- Refactored the `data` subpackage; the library now supports the RACE, Synonym, SWAG and ARC data sets.
- Upgraded to `transformers==2.10.0`.
## Installation

### With pip

```bash
pip install mcqa
```

### From source

```bash
git clone https://github.com/mcqa-suite/mcqa.git
cd mcQA
pip install -e .
```
## Getting started

### Data preparation

To train a mcQA model, you need to create a CSV file with n+2 columns, where n is the number of choices for each question. The first column should be the context sentence, the next n columns should be the choices for that question, and the last column should be the selected answer.

Below is an example of a 3-choice question (taken from the CoS-E dataset):
| Context sentence | Choice 1 | Choice 2 | Choice 3 | Label |
|---|---|---|---|---|
| People do what during their time off from work? | take trips | brow shorter | become hysterical | take trips |
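If you are assembling such a file yourself, Python's standard `csv` module is enough. A minimal sketch, where the file name and rows are purely illustrative (they are not part of mcQA):

```python
import csv

# Each row: context sentence, the n choices, then the gold answer
# (n + 2 columns in total; here n = 3).
rows = [
    ["People do what during their time off from work?",
     "take trips", "brow shorter", "become hysterical",
     "take trips"],
]

with open("train.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```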
If you have a trained mcQA model and want to run inference on a dataset, that dataset should have the same format as the training data, but without the label column.
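One way to derive an inference file from a labeled one is to drop that last column. A quick sketch using pandas (the file names are illustrative):

```python
import pandas as pd

# Load a labeled file and strip its final (label) column to obtain
# a file in the inference format. read_csv assumes a header row;
# pass header=None if your files do not have one.
df = pd.read_csv("swagaf/data/train.csv")
df.iloc[:, :-1].to_csv("inference.csv", index=False)
```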
See an example of data preparation below:

```python
from mcqa.data import MCQAData

# Tokenization/encoding settings for the underlying BERT model
mcqa_data = MCQAData(bert_model="bert-base-uncased",
                     lower_case=True,
                     max_seq_length=256)

# The training file keeps its label column; the test file does not
train_dataset = mcqa_data.read(data_file='swagaf/data/train.csv', is_training=True)
test_dataset = mcqa_data.read(data_file='swagaf/data/test.csv', is_training=False)
```
### Model training

```python
from mcqa.models import Model

mdl = Model(bert_model="bert-base-uncased", device="cuda")
mdl.fit(train_dataset, train_batch_size=32, num_train_epochs=20)
```
### Prediction

```python
preds = mdl.predict(test_dataset, eval_batch_size=32)
```
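The returned predictions can then be handled like any Python sequence. For example, a sketch that writes them next to the original questions, assuming `preds` holds one prediction per row of the test CSV (this row-for-row pairing is an assumption, not a documented mcQA guarantee):

```python
import pandas as pd

test_df = pd.read_csv("swagaf/data/test.csv")
test_df["prediction"] = list(preds)  # assumes row-for-row alignment
test_df.to_csv("predictions.csv", index=False)
```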
### Evaluation

```python
from sklearn.metrics import accuracy_score
from mcqa.data import get_labels

# accuracy_score expects (y_true, y_pred); the evaluated dataset must
# have been read with its label column so that get_labels can return it
print(accuracy_score(get_labels(test_dataset), preds))
```
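Since the gold labels and predictions are plain label sequences, any scikit-learn classification metric can be applied the same way; for instance (standard scikit-learn, nothing mcQA-specific):

```python
from sklearn.metrics import classification_report, confusion_matrix

labels = get_labels(test_dataset)
print(classification_report(labels, preds))  # per-choice precision/recall/F1
print(confusion_matrix(labels, preds))
```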
## References

| Type | Title | Author | Year |
|---|---|---|---|
| :newspaper: Paper | Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets | Mor Geva, Yoav Goldberg and Jonathan Berant | 2019 |
| :newspaper: Paper | Explain Yourself! Leveraging Language Models for Commonsense Reasoning | Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong and Richard Socher | 2019 |
| :newspaper: Paper | SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | Rowan Zellers, Yonatan Bisk, Roy Schwartz and Yejin Choi | 2018 |
| :newspaper: Paper | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | Todor Mihaylov, Peter Clark, Tushar Khot and Ashish Sabharwal | 2018 |
| :newspaper: Paper | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Alon Talmor, Jonathan Herzig, Nicholas Lourie and Jonathan Berant | 2018 |
| :newspaper: Paper | RACE: Large-scale ReAding Comprehension Dataset From Examinations | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang and Eduard Hovy | 2017 |
| :computer: Framework | Scikit-learn: Machine Learning in Python | Pedregosa et al. | 2011 |
| :computer: Framework | PyTorch | Adam Paszke, Sam Gross, Soumith Chintala and Gregory Chanan | 2016 |
| :computer: Framework | Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch | Hugging Face | 2018 |
| :video_camera: Video | Stanford CS224N: NLP with Deep Learning Lecture 10 – Question Answering | Christopher Manning | 2019 |
## LICENSE

Apache-2.0
## Contributing

Read our Contributing Guidelines.
## Citation

```bibtex
@misc{Taycir2019,
  author       = {mcQA-suite},
  title        = {mcQA},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/mcQA-suite/mcQA/}}
}
```