keras-nlp
Add a SQuAD example
This is a two-part issue that will be a large time investment.
First, we would like to build a SQuAD evaluation example in /examples/squad_benchmark, based on our example in /examples/glue_benchmark. Second, we should publish an example on keras.io showing how to do SQuAD evaluation on a backbone.
We can start with writing the example in this repo.
Steps:
- [ ] Add a `squad.py` file.
  - [ ] Loads the squad dataset via `tfds`.
  - [ ] Run squad evaluation on a BERT backbone.
- [ ] Add a `README.md` with a description on how to run.
@TheAthleticCoder, would you like to take up this issue?
Yes! I would like to take up the issue
@TheAthleticCoder thanks! Let us know if you have questions, this is a significant piece of work.
One resource is the original BERT squad script -> https://github.com/google-research/bert/blob/master/run_squad.py
There is a lot of input messing around we will need to do, as shown in that script.
Hey, so I was using these references and noticed that since it is span-based labelling, I would need to handle offsets as seen here:
https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_qa.py#L386 and https://github.com/google-research/bert/blob/master/run_squad.py#L242
Should I do it using TensorFlow ops, or can I use standard Python objects along with tf.py_function?
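For context, the offset handling in question can be sketched in plain Python. This is a minimal illustration, loosely modeled on the character-to-word mapping idea in the BERT script, and it assumes whitespace-separated words rather than a real WordPiece tokenizer. The helper names `build_char_to_word` and `answer_word_span` and the sample context are made up for this sketch.

```python
def build_char_to_word(context):
    """Split on whitespace and record, for every character, its word index."""
    words = []
    char_to_word = []
    prev_is_space = True
    for ch in context:
        if ch.isspace():
            prev_is_space = True
        else:
            if prev_is_space:
                words.append(ch)  # first character of a new word
            else:
                words[-1] += ch  # continue the current word
            prev_is_space = False
        # Whitespace characters map to the word that precedes them.
        char_to_word.append(len(words) - 1)
    return words, char_to_word


def answer_word_span(context, answer_start, answer_text):
    """Convert a character-level answer span to inclusive word indices."""
    words, char_to_word = build_char_to_word(context)
    start_word = char_to_word[answer_start]
    end_word = char_to_word[answer_start + len(answer_text) - 1]
    return words, start_word, end_word


# Toy example (not taken from SQuAD).
context = "The first program was written by Ada Lovelace."
words, start, end = answer_word_span(context, 33, "Ada Lovelace")
print(words[start : end + 1])
```

A real pipeline would then have to map word indices to subword token indices and handle contexts that are split into multiple windows, which this sketch leaves out.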
I think for now we can forgo worrying about TensorFlow ops. Let's focus on a solution that is concise and readable.
Probably for now we can either:
- Compute the preprocessed dataset in pure Python, then convert to a `tf.data.Dataset` before training.
- Compute the preprocessed dataset with `tf.data` and `tf.py_function`. Then use `dataset.cache()` to cache the dataset before calling `fit()`.
Either seems fine! I would go with whatever is most clear and readable for now.
Eventually, we should have a solution for offsets that is tf op friendly and baked into our library, but I think it makes sense to do that as a follow up. We can use this example to inform our API design down the road.
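A rough sketch of the two options, assuming TensorFlow 2.x is installed. The `char_span` function is only a hypothetical stand-in for the real preprocessing, and the toy rows are made up rather than loaded from SQuAD.

```python
import tensorflow as tf

# Toy rows standing in for SQuAD records (made up for illustration).
CONTEXTS = [
    "Ada wrote the first program.",
    "The first program was written by Ada Lovelace.",
]
ANSWER_STARTS = [0, 33]
ANSWER_TEXTS = ["Ada", "Ada Lovelace"]


def char_span(context, answer_start, answer_text):
    """Hypothetical stand-in for real preprocessing: inclusive char span."""
    end = answer_start + len(answer_text) - 1
    if context[answer_start : end + 1] != answer_text:
        return -1, -1  # answer text does not match the context
    return answer_start, end


# Option 1: preprocess in pure Python, then convert to a tf.data.Dataset.
spans = [
    char_span(c, s, t) for c, s, t in zip(CONTEXTS, ANSWER_STARTS, ANSWER_TEXTS)
]
ds_python = tf.data.Dataset.from_tensor_slices(
    {
        "context": CONTEXTS,
        "start": [start for start, _ in spans],
        "end": [end for _, end in spans],
    }
)


# Option 2: preprocess inside tf.data with tf.py_function, then cache.
def _py_char_span(context, answer_start, answer_text):
    # Inside py_function the inputs are eager tensors; unwrap them first.
    return char_span(
        context.numpy().decode("utf-8"),
        int(answer_start.numpy()),
        answer_text.numpy().decode("utf-8"),
    )


def tf_char_span(context, answer_start, answer_text):
    start, end = tf.py_function(
        _py_char_span,
        inp=[context, answer_start, answer_text],
        Tout=[tf.int32, tf.int32],
    )
    # py_function loses static shape information, so restore it.
    start.set_shape([])
    end.set_shape([])
    return {"context": context, "start": start, "end": end}


ds_py_function = (
    tf.data.Dataset.from_tensor_slices((CONTEXTS, ANSWER_STARTS, ANSWER_TEXTS))
    .map(tf_char_span)
    .cache()  # the Python code runs once; later epochs read cached results
)

for row in ds_py_function.as_numpy_iterator():
    print(row["start"], row["end"])
```

Both datasets yield the same values here. The first keeps all Python logic outside the input pipeline, while the second keeps everything in one `tf.data` chain at the cost of the `py_function` wrapper.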
Hey! I would like to take this issue up
Hey! I have handled the dataset part. Please check it out here: SQuAD
If there are any changes to be made, do let me know. cc: @abheesht17 @mattdangerw
@TheAthleticCoder are you still planning to work on it?
@shivance Hey, no, I don't think I'll be able to find time to do it :( You can take it up 👍🏻
Seems like an example already exists here - Keras Examples - Text Extraction with BERT
@abheesht17 can you assign this issue to me if no one else is assigned?
Sure, @abuelnasr0. Assigned it to you, have fun!
@pri1311 - thanks for the pointer, will be extremely helpful for @abuelnasr0 when he tries using KerasNLP blocks for writing the example!