
Parsing error with BWBReader.py

Open mjpost opened this issue 1 year ago • 3 comments

Hi,

I am trying to read through the annotated BWB data. Here is what I did to set up:

  • I created a requirements.txt and installed pandas, click, and spacy
  • I downloaded the two spacy models, en_core_web_sm and zh_core_web_sm
  • I downloaded the BWB corpus from the provided Google Drive link and unpacked it
  • I changed the hard-coded path to the annotation dataset to point to /path/to/BWB_dataset/test_with_annotations. This doesn't match the original hard-coded path, but it is the closest match I could find to the dataset layout described in the code

I then ran the code with the invocation below, which produces a parsing error. I've tried to dig into this, but it looks like tracking it down will take some work. Do you see anything wrong with what I've done?

$ cd BlonDe
$ python BWB/BWBReader.py
Traceback (most recent call last):
  File "BWB/BWBReader.py", line 546, in <module>
    for sentences in bwb_reader.dataset_iterator_from_cache(cache_file, dir_path):
  File "BWB/BWBReader.py", line 170, in dataset_iterator_from_cache
    self.to_cache(dir_path, cache_file)
  File "BWB/BWBReader.py", line 534, in to_cache
    for sentences in self.dataset_iterator(dir_path):
  File "BWB/BWBReader.py", line 213, in dataset_iterator
    yield from self.sentence_iterator(chs_path, ref_path)
  File "BWB/BWBReader.py", line 271, in sentence_iterator
    for chs_document, ref_document in self.dataset_document_iterator(chs_path, ref_path):
  File "BWB/BWBReader.py", line 258, in dataset_document_iterator
    chs_document.append(self._line_to_BWBsentence(line, "zh", document_id, sentence_id))
  File "BWB/BWBReader.py", line 418, in _line_to_BWBsentence
    k = self._deal_with_ann_span(line, k, mention_stack, quote_stack,
  File "BWB/BWBReader.py", line 388, in _deal_with_ann_span
    k = self._deal_with_ann_span(line, k, mention_stack, quote_stack,
  File "BWB/BWBReader.py", line 367, in _deal_with_ann_span
    quote = quote_stack.pop()
IndexError: pop from empty list
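
The last frame suggests that a closing quote marker was reached while quote_stack was empty, i.e. a close with no matching open on that line. Below is a minimal sketch of a check for that condition, which could be run over the raw lines to find the offending one. It is not taken from BWBReader.py: the function name find_unbalanced_close is hypothetical, and the marker characters “ and ” are an assumption about how quotes are delimited in the data.

```python
def find_unbalanced_close(line, open_ch="“", close_ch="”"):
    """Return the index of the first closing marker that has no matching
    opener, or None if every closing marker is matched.

    open_ch and close_ch are assumed delimiters, not the ones BWBReader uses.
    """
    stack = []  # positions of openers that have not been closed yet
    for i, ch in enumerate(line):
        if ch == open_ch:
            stack.append(i)
        elif ch == close_ch:
            if not stack:
                # Calling stack.pop() here would raise
                # "IndexError: pop from empty list".
                return i
            stack.pop()
    return None
```

Calling this on each line of a source file and printing the lines where the result is not None would narrow down which input triggers the error, assuming the delimiters above are adjusted to match the actual annotation format.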

mjpost, Aug 18 '23 21:08