
Fine-tuning bert-for-tf2 on a Q&A task

Open vinnytwice opened this issue 6 years ago • 3 comments

Hi, and thanks for this great repo. I was trying to adapt BERT for TF 2.0, but I'm too much of a novice for this. My question is: how do I fine-tune this on a personal dataset? My goal is to build a Q&A system using BERT. Thank you very much, Vincenzo

vinnytwice avatar Oct 10 '19 07:10 vinnytwice

This could be quite a task (although there is a lot of information on the net), depending on how you represent your data. What should be easily achievable is reproducing the BERT results on SQuAD. Check the BERT paper for how exactly BERT was applied to the SQuAD task. And I'll check whether I can add a SQuAD example here.

kpe avatar Oct 10 '19 14:10 kpe
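(Editor's note: to make the SQuAD setup mentioned above concrete — the BERT paper fine-tunes by predicting a start and an end position over the context tokens, then picking the best valid span. Below is a minimal sketch of that span-selection step in pure NumPy; the function name is illustrative, and in a real model the logits would come from a linear layer on top of BERT's per-token outputs.)

```python
import numpy as np

def best_span(start_logits, end_logits, max_answer_len=30):
    """Pick the (start, end) pair with the highest summed logit,
    subject to start <= end and a maximum answer length.
    Mirrors the span-selection step the BERT paper uses for SQuAD."""
    best = (0, 0)
    best_score = -np.inf
    n = len(start_logits)
    for i in range(n):
        # only consider ends at or after the start, within the length cap
        for j in range(i, min(i + max_answer_len, n)):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best

# toy logits for a 5-token context
start = np.array([0.1, 2.0, 0.3, 0.1, 0.1])
end   = np.array([0.1, 0.2, 0.1, 1.5, 0.1])
print(best_span(start, end))  # -> (1, 3)
```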

Hi kpe, thanks for answering. Say I just present the dataset as a sequence of paragraphs separated by three "\n", as I saw in a repo extending BERT: https://github.com/Nagakiran1/Extending-Google-BERT-as-Question-and-Answering-model-and-Chatbot

I then should be able to do it with bert-for-tf2, right?

vinnytwice avatar Oct 10 '19 15:10 vinnytwice

@vinnytwice - yes, sure. I guess the three "\n" are just how the run_squad.py script chooses to represent the data - a text file with the context text and a question (separated by three \n), so that it is easy to parse. When feeding the input into BERT, it would then use different segment_ids: 0 for the question tokens and 1 for the tokens belonging to the context text.

kpe avatar Oct 11 '19 11:10 kpe
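(Editor's note: the segment_ids convention kpe describes matches how Google's run_squad.py packs a question/context pair. A minimal sketch of that packing, in pure Python with tokenization omitted; `build_inputs` is an illustrative name, not an API from this repo.)

```python
def build_inputs(question_tokens, context_tokens):
    """Pack a question/context pair the run_squad.py way:
    [CLS] question [SEP] context [SEP], with segment id 0 for the
    question part (including [CLS] and the first [SEP]) and 1 for
    the context part (including the final [SEP]). Tokens here are
    plain strings; a real pipeline would map them to vocabulary ids
    with the BERT tokenizer."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    segment_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids

tokens, seg = build_inputs(["who", "wrote", "it", "?"],
                           ["it", "was", "written", "by", "kpe"])
print(seg)  # -> [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```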