F1 and EM scores are lower than the official results by 0.3%

Open dengyuning opened this issue 7 years ago • 5 comments

I followed the exact instructions in the 'readme.md' file and started training my model with the following command:

python -m basic.cli --mode train --noload --len_opt --cluster --batch_size 50

After 18K steps I used the following command to test the model:

python squad/evaluate-v1.1.py $HOME/data/squad/dev-v1.1.json out/basic/00/answer/test-####.json

I got f1=74.982 and exact_match=64.90. The scores for a single model in the original paper are em=68.0 and f1=77.3, so going by these numbers mine are 3.1 points (EM) and about 2.3 points (F1) lower.

Since the code is provided by the official group, the hyperparameters are exactly the same except for batch_size, which shouldn't affect the model's performance critically. The only reason I can think of is different initial values. Has anyone run the same experiment? Or can anyone offer other ideas? Thanks a lot!
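As a quick sanity check on the size of the gap, the figures quoted in this post can be subtracted directly. This is only a minimal sketch using the numbers above; the function name `score_gaps` is made up for illustration and is not part of the repo.

```python
# Scores reported in this post (percent).
mine = {"exact_match": 64.90, "f1": 74.982}
# Single-model scores quoted from the paper (percent).
paper = {"exact_match": 68.0, "f1": 77.3}


def score_gaps(reported, reference):
    """Return reference minus reported for each metric, in percentage points."""
    return {name: round(reference[name] - reported[name], 3) for name in reference}


gaps = score_gaps(mine, paper)
for name, gap in gaps.items():
    print(f"{name}: {gap:.3f} points below the paper")
```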

Sep 22 '17 07:09 dengyuning

same problem

Sep 28 '17 06:09 guotong1988

@dengyuning, did you run the code from the DEV branch with TF 1.1? If so, did you follow the exact commands given in the repo, with no modifications to the code or the commands?

I ask because I tried to train the model as per the instructions, but the EM I'm getting is 0.02###. I wonder what's happening.

Oct 24 '17 03:10 thisum

@thisum Did it work eventually?

Jan 05 '18 04:01 bglearning

@dengyuning, Do you mean your result is 0.3 lower than the result in the paper?

Jan 15 '18 07:01 Leeyouxie

KeyError: 'p'

When I run the code in bi-att-flow-0.3.0/basic/evaluator.py, I get the error as follows:

305 # id2answer_dict = {id_: get2(context, xi, span) for id, xi, span, context in zip(data_set.data['ids'], data_set.data['x'], spans, data_set.data['p'])}

Could anyone tell me what I should do?
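In Python, a KeyError: 'p' on a dictionary lookup means the dictionary has no 'p' entry, so a first step could be to list which keys are actually present before the lookup. The sketch below uses a hypothetical stand-in dictionary rather than the real `data_set.data`, whose contents are not shown in this thread, and `require_keys` is an illustrative helper, not a function from the repo.

```python
def require_keys(data, keys):
    """Raise a KeyError that names the missing keys and lists the available ones."""
    missing = [k for k in keys if k not in data]
    if missing:
        raise KeyError(f"missing {missing}; available keys: {sorted(data)}")


# Hypothetical stand-in for data_set.data (assumption: it behaves like a dict).
data = {"ids": ["q1"], "x": [[["some", "tokens"]]]}

try:
    require_keys(data, ["ids", "x", "p"])
except KeyError as err:
    message = err.args[0]
    print(message)
```

If 'p' turns out to be absent, the data file being loaded may not have been produced with that field; that is a guess to verify, not a confirmed cause.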

Jun 25 '18 07:06 houzhenzhen
