QuAC Dataset
When will you support evaluation on the QuAC dataset? I found the results of the Llama 2 paper difficult to reproduce, especially regarding how to segment the answer when computing the base model's F1 score.
Are there any solutions? I am confused.
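For reference, QuAC scores answers with the same token-overlap F1 used by SQuAD: both prediction and reference are normalized (lowercased, punctuation and articles stripped), whitespace-tokenized, and F1 is computed over the shared tokens; QuAC then takes the max over the reference answers for each question. A minimal sketch of that metric (the function names here are my own, not from any particular implementation):

```python
import re
import string
from collections import Counter

def normalize_answer(s: str) -> str:
    """SQuAD/QuAC-style normalization: lowercase, drop punctuation
    and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one prediction and one reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction: str, references: list[str]) -> float:
    """QuAC reports the max F1 over all reference answers."""
    return max(token_f1(prediction, ref) for ref in references)
```

The tricky part the base-model question alludes to is segmentation: a base model keeps generating after the answer, so you have to cut the output (e.g. at the first newline) before scoring, and where you cut changes the F1 noticeably.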
If nobody is already working on it, I can try this feature.
That would be great!
Hi @glerzing, are you working on QuAC? If not, I can take it.
Yes, I'm on it. It's a bit more complicated than I expected and will probably take weeks.
Actually, I will not have the opportunity to finish it, sorry. I had other important things to do. The implementation should be quite similar to the CoQA one, but to avoid the same problem as in #1231, you need a way to make a list of predictions for each document, probably by implementing construct_requests. @Sanchit-404, if you are still motivated, feel free to pick up this issue.
Hi, any updates on this?
@glerzing, if you have a partial implementation or a high-level plan, please share it; it will be helpful for anyone picking this up.
The Python script is too much of a draft to share. It's not worth much, but here is the README.md. I would have liked to have a YAML file for this, with the ability to redefine construct_requests inside the YAML file, like with doc_to_text or process_results. That would be cleaner than having to implement it with a class that inherits from Task, as in squadv2.
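Sketching what such a YAML task config might look like, by analogy with the existing doc_to_text / process_results hooks: a `construct_requests: !function ...` key would point at a Python function in the task's utils module. Note that the `construct_requests` key shown here is the proposed extension, not something the harness currently supports; the other field names are only illustrative:

```yaml
# Hypothetical quac.yaml -- the construct_requests key does not exist yet.
task: quac
dataset_path: quac
output_type: generate_until
doc_to_text: !function utils.doc_to_text
process_results: !function utils.process_results     # receives a list of predictions per doc
construct_requests: !function utils.construct_requests  # proposed: one request per question
metric_list:
  - metric: f1
    aggregation: mean
    higher_is_better: true
```

This would keep the whole task declarative and avoid a one-off Task subclass just to change how requests are built.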