cdQA
[WIP] Add XLNet support for Reader
- [x] Implement sklearn wrapper on top of new QA script provided by HF
- [x] Train XLNet on SQuAD 2.0 with wrapper
- [x] Add ability to load pre-trained reader with a `.bin` file instead of pickling the class object, ensuring compatibility with HF and avoiding confusion
- [x] Report training time and hardware used
- [x] Set `verbose` parameter
- [x] Report evaluation metrics
- [ ] Integrate in `QAPipeline()`
- [ ] Replace `log_prob` (softmax probs) by the raw logits to select the best answer among paragraphs https://github.com/cdqa-suite/cdQA/blob/993ac5e4d8dc5cbb57db033d7bceae7ddbd77310/cdqa/reader/utils_squad.py#L873-L876
- [ ] Evaluate the complete `cdQA` pipeline
- [ ] Update `cdQA-annotator` and `cdQA-ui` to support "no answer"
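The motivation for preferring raw logits over `log_prob` can be sketched in plain Python (all numbers below are made up for illustration): softmax probabilities are normalised within each paragraph, so a weak span with no competition can look more "confident" than a genuinely strong span with close runners-up.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical start logits for the candidate answers in two paragraphs.
para_a = [8.0, 7.9, 7.8]    # genuinely strong span, but close runners-up
para_b = [2.0, -5.0, -5.0]  # weak span with no competition

prob_a = max(softmax(para_a))  # probability diluted by the close scores
prob_b = max(softmax(para_b))  # near 1.0 despite a low logit

# Per-paragraph softmax would pick paragraph B; raw logits pick A.
best_by_prob = "a" if prob_a > prob_b else "b"
best_by_logit = "a" if max(para_a) > max(para_b) else "b"
```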
Codecov Report
Merging #205 (660760c) into master (bda1c32) will decrease coverage by 8.00%. The diff coverage is 0.00%.

```
@@            Coverage Diff             @@
##           master     #205      +/-   ##
==========================================
- Coverage   31.23%   23.22%   -8.01%
==========================================
  Files           7        9       +2
  Lines        1508     2032     +524
==========================================
+ Hits          471      472       +1
- Misses       1037     1560     +523
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| cdqa/reader/reader_sklearn.py | 0.00% <0.00%> (ø) | |
| cdqa/reader/utils_squad.py | 0.00% <0.00%> (ø) | |
| cdqa/reader/utils_squad_evaluate.py | 0.00% <0.00%> (ø) | |
| cdqa/reader/bertqa_sklearn.py | 58.90% <0.00%> (+0.15%) | :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update bda1c32...660760c.
`ValueError` during evaluation after training:

```
Traceback (most recent call last):
  File "tutorial-train-xlnet-squad.py", line 39, in <module>
    out_eval, final_prediction = reader.evaluate(X='dev-v2.0.json')
ValueError: too many values to unpack (expected 2)
```
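This error pattern suggests `evaluate()` now returns more than the two values the tutorial script unpacks. A minimal stand-in reproducing it (the three return values are an assumption, not the actual `Reader` API):

```python
# Hypothetical stand-in for reader.evaluate(X=...): the traceback implies
# it returns more values than the two the script unpacks.
def evaluate():
    return {"f1": 40.8}, {"qid": "answer text"}, 0.5  # three values (assumed)

try:
    out_eval, final_prediction = evaluate()
except ValueError as err:
    message = str(err)  # "too many values to unpack (expected 2)"

# Forward-compatible workaround: absorb any extra values with a star target.
out_eval, final_prediction, *rest = evaluate()
```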
To use the XLNet reader with a pretrained `.bin` model:

```python
import wget

from cdqa.reader.reader_sklearn import Reader

wget.download(url='https://github.com/cdqa-suite/cdQA/releases/download/XLNet_cased_vCPU/pytorch_model.bin', out='.')

# instantiate the Reader class with train params
reader = Reader(model_type='xlnet',
                model_name_or_path='xlnet-base-cased',
                output_dir='.',
                evaluate_during_training=False,
                no_cuda=False,
                fp16=False,
                pretrained_model_path='.')

# make some predictions
reader.predict(X='dev-v2.0-small.json')
```
- hardware: GeForce RTX 208
- training time: 9 hours
The implementation of `XLNetForQuestionAnswering` is quite different from `BertForQuestionAnswering`, and the official HF version does not output the logits for now. `XLNetForQuestionAnswering` uses beam search to find the best (most probable) span, while `BertForQuestionAnswering` maximises the `start_score` and `end_score` separately.
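The difference can be sketched with toy scores (all numbers invented; this is the idea of joint span scoring, not HF's actual beam-search code): independent maximisation can pick an invalid or over-long span, while joint scoring only considers valid spans.

```python
# Toy start/end scores for a 6-token paragraph (hypothetical numbers).
start_logits = [0.1, 2.0, 0.3, 1.5, 0.2, 0.0]
end_logits   = [0.0, 0.2, 1.8, 0.1, 2.2, 0.3]

# BERT-style: pick start and end independently.
start_idx = max(range(6), key=lambda i: start_logits[i])  # token 1
end_idx = max(range(6), key=lambda i: end_logits[i])      # token 4

# XLNet-style (sketch): score spans jointly, keeping only valid spans
# (start <= end) up to a maximum answer length.
max_answer_len = 3
best_span = max(
    ((s, e) for s in range(6) for e in range(s, min(s + max_answer_len, 6))),
    key=lambda se: start_logits[se[0]] + end_logits[se[1]],
)
# Independent picks give span (1, 4), longer than max_answer_len;
# joint scoring settles on (1, 2) instead.
```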
from #196
Any progress with this? In the meantime we have even better models: RoBERTa and ERNIE 2.0.
Hi @alex-movila
You can follow our progress on this PR here. We described all the steps needed to stay in sync with the latest changes made by @huggingface.
At the moment we depend on the pytorch-transformers repository as a backend for our QA system. The @huggingface community is progressively implementing new models. They are now in the process of adding RoBERTa (see this). They don't have plans to add ERNIE at the moment (see this).
Their new API should allow the user to use any transformer to do QA. We are looking to provide the same thing with cdQA.
I could not replicate the official SQuAD 2.0 results with our trained XLNet model:

```python
from cdqa.reader.reader_sklearn import Reader

reader = Reader(model_type='xlnet',
                model_name_or_path='xlnet-base-cased',
                fp16=False,
                output_dir='.',
                no_cuda=False,
                pretrained_model_path='.')

reader.evaluate(X='dev-v2.0.json')
```
See my colab notebook for reproducibility: https://colab.research.google.com/github/cdqa-suite/cdQA/blob/sync-huggingface/examples/tutorial-eval-xlnet-squad2.0.ipynb
```json
{
  "exact": 35.643897919649625,
  "f1": 40.81892328134685,
  "total": 11873,
  "HasAns_exact": 67.29082321187585,
  "HasAns_f1": 77.65571459504568,
  "HasAns_total": 5928,
  "NoAns_exact": 4.087468460891506,
  "NoAns_f1": 4.087468460891506,
  "NoAns_total": 5945,
  "best_exact": 50.07159100480081,
  "best_exact_thresh": 0.0,
  "best_f1": 50.07159100480081,
  "best_f1_thresh": 0.0
}
```
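The reported numbers are internally consistent, which points to a real model problem rather than a scoring bug: the overall `exact` is the count-weighted average of the HasAns and NoAns subsets, and `best_exact` at threshold 0.0 equals the score of always predicting "no answer" (5945 / 11873). A quick sanity check:

```python
# Sanity-check the reported SQuAD 2.0 metrics.
has_exact, has_total = 67.29082321187585, 5928
no_exact, no_total = 4.087468460891506, 5945
total = has_total + no_total  # 11873

# Overall exact = count-weighted average over the two subsets.
overall = (has_exact * has_total + no_exact * no_total) / total

# best_exact at threshold 0.0 = always predicting "no answer".
always_no_answer = 100.0 * no_total / total
```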
It might be an issue of non-optimized hyperparameters (see this: https://github.com/huggingface/pytorch-transformers/issues/822).
@andrelmfarias can you confirm the params you used during training? (https://github.com/cdqa-suite/cdQA/blob/sync-huggingface/examples/tutorial-train-xlnet-squad.py)
I had to reduce some parameters (`max_length`, `batch_size`, etc.). The GPU could not handle training with the default parameters. It might be that.
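If the smaller `batch_size` is the culprit, gradient accumulation is one way to recover the effective batch size without more GPU memory. A toy illustration in plain Python (not the `Reader` API): for a loss whose gradient is the batch mean, averaging gradients over k small batches reproduces the gradient of one large batch.

```python
# Toy "training data" and a toy gradient (the batch mean).
big_batch = [1.0, 2.0, 3.0, 4.0]

def grad(batch):
    return sum(batch) / len(batch)

# Accumulate over two mini-batches of size 2, then average: the result
# matches the gradient of the full batch of size 4.
small_grads = [grad(big_batch[i:i + 2]) for i in range(0, len(big_batch), 2)]
accumulated = sum(small_grads) / len(small_grads)
```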
This issue is being discussed here: https://github.com/huggingface/transformers/issues/947#issuecomment-541223462