Tong Liu
> Actually, I don't think you can reproduce the numbers in the paper using the author's code... If somebody only gives part of his code, with bugs, how could you...
Hi, did you solve this problem? I have the same problem here.
> Hello @EQ3A2A @TongLiu-github, can you please share an example that reproduces the error with a public dataset I can test?

Thanks for the reply. I solved this problem from:...
Another very typical issue is that some entries can have multiple valid ground-truth (GT) answers. For example, in live_multiple_8-4-0, "indian steam" should also be counted as correct. "error": [ "Invalid value for...
It feels like there are many entries like this, which biases the final evaluation.
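To make the point concrete, here is a minimal sketch of a checker that accepts any one of several ground-truth values per parameter. It is not the benchmark's actual evaluation code: the function name `call_matches`, the string normalization, and the example values are all hypothetical. It only assumes that each ground-truth parameter maps to a list of acceptable values.

```python
def _normalize(value):
    # Assumption: string comparison ignores case and surrounding whitespace.
    return value.strip().lower() if isinstance(value, str) else value


def call_matches(model_call, possible_answer):
    """Return True if the decoded model call matches the ground truth.

    Both arguments look like {function_name: {param: ...}}. In
    possible_answer, every param maps to a list of acceptable values.
    """
    # The same function(s) must be called.
    if set(model_call) != set(possible_answer):
        return False
    for func_name, accepted_params in possible_answer.items():
        args = model_call[func_name]
        # This sketch requires exactly the same parameter names.
        if set(args) != set(accepted_params):
            return False
        for param, accepted_values in accepted_params.items():
            normalized = [_normalize(v) for v in accepted_values]
            if _normalize(args[param]) not in normalized:
                return False
    return True


# Hypothetical entry with two acceptable spellings for one parameter.
ground_truth = {"get_weather": {"city": ["New York", "NYC"]}}
print(call_matches({"get_weather": {"city": "nyc"}}, ground_truth))
print(call_matches({"get_weather": {"city": "Boston"}}, ground_truth))
```

With a checker of this shape, fixing an entry like the one above only means adding the missing alternative to that parameter's list of acceptable values.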
Another case that should be considered correct: live_multiple_55-22-2: ``` "model_result_decoded": [ { "get_product_details": { "product_id": "iPhone 12", "color": "white" } } ], "possible_answer": [ { "inventory_management": { "product_id": [...
> Wondering if this was addressed?

@thepowerfuldeez Wondering how this was addressed?