bi-att-flow
F1 and EM scores are lower than the official results by 0.3%
I followed the exact instructions in the 'readme.md' file and started training my model with the following command:
python -m basic.cli --mode train --noload --len_opt --cluster --batch_size 50
After 18K steps, I used the following command to test the model:
python squad/evaluate-v1.1.py $HOME/data/squad/dev-v1.1.json out/basic/00/answer/test-####.json
and then I got f1=74.982, exact_match=64.90.
The scores for a single model in the original paper are em=68.0 and f1=77.3, and mine are 0.3 percentage points lower than those.
Because the code is provided by the official group, the hyperparameters are exactly the same except for batch_size, which shouldn't affect the model's performance critically. The only reason I can think of is a different set of initial values.
Has anyone else done the same experiment? Or can anyone provide other ideas?
Thanks a lot!!!!
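As a quick sanity check on the numbers quoted in the post above, the gap to the paper can be computed directly. This is only a sketch: the dictionary names and the `score_gap` helper are made up for illustration, and the values are copied from the post itself.

```python
# Scores quoted in this thread, all in percentage points.
reported = {"exact_match": 64.90, "f1": 74.982}  # output of evaluate-v1.1.py, as posted
paper = {"exact_match": 68.0, "f1": 77.3}        # single-model scores cited from the paper


def score_gap(paper_scores, reported_scores):
    """Return paper minus reported for each metric, rounded to 3 decimals."""
    return {k: round(paper_scores[k] - reported_scores[k], 3) for k in paper_scores}


gaps = score_gap(paper, reported)
for metric, gap in gaps.items():
    print(f"{metric}: {gap} points below the paper")
```

With the figures as posted, this gives a gap of roughly 3.1 points for EM and 2.3 points for F1, so the 0.3 figure may be worth double-checking.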
same problem
@dengyuning, did you run the code on the DEV branch with TF 1.1? If so, did you follow the exact commands given in the repo, with no modifications to the code or commands?
I ask because I tried to train the model as per the instructions, but the EM I'm getting is 0.02###. I wonder what's happening.
@thisum Did it work eventually?
@dengyuning, do you mean your result is 0.3 lower than the result in the paper?
KeyError: 'p'
When I run the code in bi-att-flow-0.3.0/basic/evaluator.py, I get the following error at line 305:
# id2answer_dict = {id_: get2(context, xi, span) for id, xi, span, context in zip(data_set.data['ids'], data_set.data['x'], spans, data_set.data['p'])}
Could anyone tell me what I should do?
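One way to narrow this down is to check which keys are actually present before the failing line indexes them. This is only a sketch, and it assumes `data_set.data` behaves like a plain dict. `missing_keys` and `example_data` are hypothetical names, not part of the repo.

```python
def missing_keys(data, required):
    """Return the required keys that are absent from a dict-like `data`."""
    return [key for key in required if key not in data]


# Hypothetical stand-in for data_set.data; the real contents depend on
# how the data was preprocessed and loaded.
example_data = {"ids": ["q1"], "x": [[["some", "tokens"]]]}

# Keys indexed on the quoted line of evaluator.py.
required = ["ids", "x", "p"]

print("available keys:", sorted(example_data))
print("missing keys:", missing_keys(example_data, required))
```

In the real code, the same check could be run on `data_set.data` just before the failing line. If 'p' turns out to be missing, the data files may have been produced by a different version of the preprocessing step than the one the evaluator expects, though that is a guess rather than a confirmed cause.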