AMR-gs

Post-processing fails if sentence has <number><space><number>

Open · iamanigeeit opened this issue 3 years ago · 1 comment

Problem: If the input sentence contains \d+ \d+ (a number, a space, then another number), the parser outputs a non-breaking space character (U+00A0) between the numbers. This causes post-processing to fail with the error penman.DecodeError: Expected ":" or "/" at position XXX

Example .pred file

# ::id 9900
# ::snt @united iCloud it's not there yet -- PLEASE HELP 917 703 1472
# ::tokens ["@united", "iCloud", "it", "'s", "not", "there", "yet", "--", "PLEASE", "HELP", "917\u00a0703\u00a01472"]
# ::lemmas ["@united", "icloud", "it", "be", "not", "there", "yet", "--", "please", "help", "917\u00a0703\u00a01472"]
# ::pos_tags ["VBN", "NN", "PRP", "VBZ", "RB", "RB", "RB", ":", "VB", "NN", "CD"]
# ::ner_tags ["O", "O", "O", "O", "O", "O", "O", "O", "O", "O", "NUMBER"]
# ::abstract_map {}
(c0 / multi-sentence
    :snt1 (c1 / icloud
              :mod (c3 / be-located-at
                       :ARG1 (c7 / it)
                       :ARG2 (c8 / there)
                       :time (c9 / yet)))
    :snt2 (c2 / help-01
              :ARG1 (c4 / you)
              :mode imperative
              :ARG1 (c6 / book
                        :name (c10 / 917 703 1472))))

Note that in the last line, 917 703 1472 contains non-breaking spaces.

./postprocess_2.0.sh sample.txt.pred
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/postprocess.py", line 16, in postprocess2
    for amr in nr.restore_file(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/postprocess/node_restore.py", line 19, in restore_file
    for amr in AMRIO.read(file_path):
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/io.py", line 48, in read
    amr.graph = AMRGraph.decode(' '.join(graph_lines))
  File "/home/perry/PycharmProjects/phd/AMR-gs-master/stog/data/dataset_readers/amr_parsing/amr.py", line 640, in decode
    _graph = amr_codec.decode(raw_graph_string)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 172, in decode
    span, data = self._decode_penman_node(s)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 405, in _decode_penman_node
    span, data = self._decode_penman_node(s, pos=pos)
  File "/home/perry/anaconda3/envs/stog/lib/python3.6/site-packages/penman.py", line 427, in _decode_penman_node
    raise DecodeError('Expected ":" or "/"', string=s, pos=pos)
penman.DecodeError: Expected ":" or "/" at position 364
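
The error is consistent with U+00A0 being treated as ordinary whitespace. In Python 3, str.isspace(), str.split() and the \s class of the re module all count U+00A0 as whitespace, so a whitespace-based tokenizer sees the concept 917 703 1472 as three separate tokens rather than one. The snippet below illustrates this; SYMBOL_RE is a simplified, hypothetical stand-in for a PENMAN-style symbol pattern, not the pattern penman actually uses.

```python
import re

NBSP = "\u00a0"  # non-breaking space, as seen in the ::tokens line above
concept = NBSP.join(["917", "703", "1472"])

# Python's Unicode-aware whitespace checks all include U+00A0.
print(NBSP.isspace())               # True
print(concept.split())              # ['917', '703', '1472']
print(bool(re.match(r"\s", NBSP)))  # True

# Hypothetical, simplified PENMAN-style symbol pattern: a run of characters
# that are not whitespace, parentheses, slashes or colons.
SYMBOL_RE = re.compile(r"[^\s()/:]+")
print(SYMBOL_RE.findall(f"(c10 / {concept})"))  # ['c10', '917', '703', '1472']
```

After the concept 917 the decoder is left looking at 703, which is neither a role (":") nor a concept separator ("/"), matching the message in the traceback.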

Workaround: check the parser output for non-breaking spaces and replace them with - or _ before running post-processing.
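
A minimal sketch of this workaround, assuming the .pred file is UTF-8 and that metadata lines start with "#". The script name fix_nbsp.py and the functions replace_nbsp and main are hypothetical, not part of AMR-gs. It only rewrites graph lines, so the JSON-escaped \u00a0 sequences in the # ::tokens and # ::lemmas metadata are left untouched. It only makes the graph decodable; it does not try to restore the original surface form of the number.

```python
import sys
from typing import Tuple

NBSP = "\u00a0"


def replace_nbsp(text: str, replacement: str = "_") -> Tuple[str, int]:
    """Replace U+00A0 in AMR graph lines, leaving '#' metadata lines as-is.

    Returns the rewritten text and the number of characters replaced.
    """
    out_lines = []
    count = 0
    for line in text.splitlines(keepends=True):
        if not line.lstrip().startswith("#"):
            count += line.count(NBSP)
            line = line.replace(NBSP, replacement)
        out_lines.append(line)
    return "".join(out_lines), count


def main(path: str) -> None:
    # Rewrites the .pred file in place; keep a backup if you need the original.
    with open(path, encoding="utf-8") as f:
        fixed, n = replace_nbsp(f.read())
    with open(path, "w", encoding="utf-8") as f:
        f.write(fixed)
    print(f"{path}: replaced {n} non-breaking space(s)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage, before post-processing: python fix_nbsp.py sample.txt.pred, then ./postprocess_2.0.sh sample.txt.pred. Passing "-" instead of "_" as replacement gives the other variant mentioned above.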

iamanigeeit · Feb 22 '21 04:02

While this repo uses an old version of penman, this issue also affects the latest version. I've created goodmami/penman#99 to track the issue there.

goodmami · Mar 19 '21 03:03