nl2bash
Failure to run eval script
I'm trying to do a smoke test for your eval suite but cannot get the script to run properly.
I've followed the setup instructions:
- run `make` in the root directory
- add `nl2bash` to `PYTHONPATH`
- run `make data` in `scripts`

From here I attempt to confirm that the dev set evaluates well against itself: from `scripts` I run

```
./bash-run.sh --data bash --prediction_file ../data/bash/dev.cm.filtered --eval
```
this produces the following stdout & traceback:
```
Reading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash
Saving models to /workspace/sempar/nl2bash/encoder_decoder/../model/seq2seq
Loading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.cm.filtered
9985 data points read.
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.cm.filtered
782 data points read.
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.cm.filtered
779 data points read.
(Auto) evaluating ../data/bash/dev.cm.filtered
782 predictions loaded from ../data/bash/dev.cm.filtered
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 378, in <module>
    tf.compat.v1.app.run()
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 301, in main
    eval(dataset, FLAGS.prediction_file)
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 176, in eval
    return eval_tools.automatic_eval(prediction_path, dataset, top_k=3, FLAGS=FLAGS, verbose=verbose)
  File "/workspace/sempar/nl2bash/eval/eval_tools.py", line 246, in automatic_eval
    "{} vs. {}".format(len(grouped_dataset), len(prediction_list)))
ValueError: ground truth and predictions length must be equal: 701 vs. 782
```
You can see it's evaluating against a dataset containing only 701 bash utterances, even though `data_utils.load_data` successfully read 782 from the dev set. Do you know why this is happening?
(If it helps, I'm on Python 3.7.11 running a fresh install of Ubuntu 18.04.6.)
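For what it's worth, the 782 → 701 discrepancy is consistent with the evaluator grouping examples that share the same natural-language utterance (the traceback calls the variable `grouped_dataset`). A minimal sketch of that behavior, purely illustrative and not the project's actual `group_parallel_data` implementation:

```python
# Illustrative sketch (hypothetical, not nl2bash's real code): grouping
# (nl, cm) pairs by their NL utterance collapses duplicate utterances,
# so 782 raw pairs can shrink to 701 unique NL keys.
from collections import OrderedDict

def group_parallel_data(pairs):
    """Group (nl, cm) pairs by NL utterance, keeping all commands per key."""
    grouped = OrderedDict()
    for nl, cm in pairs:
        grouped.setdefault(nl, []).append(cm)
    return grouped

pairs = [
    ("list all files", "ls -a"),
    ("list all files", "ls --all"),  # duplicate NL utterance -> same group
    ("print the date", "date"),
]
grouped = group_parallel_data(pairs)
print(len(pairs), len(grouped))  # 3 raw pairs, 2 unique utterances
```

If the dev set contains repeated NL utterances with different gold commands, this would explain a grouped count smaller than the raw line count.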
**Update**
I've overridden `encoder_decoder.data_utils.group_parallel_data` to refrain from aggregating matching NL utterances, in order to match the dataset sizes. This allows the eval logic to run, though as it stands, the code breaks the following assertion in `eval.token_based.corpus_bleu_score`:

```
assert(loose_constraints or node.get_num_of_children() == 1)
```

The root nodes frequently have more than one child in the dev set. I changed the invocation of `data_tools.bash_tokenizer` to set the `loose_constraints` flag to `True`. This dodges the assertion error.
I'm not sure if I am taking on undesirable assumptions by making this change. Is there a reason the ground-truth ASTs break the tokenizer when `loose_constraints` is left `False`?