BlonDe
Parsing error with BWBReader.py
Hi,
I am trying to read through the annotated BWB data. Here is what I did to set up:
- I created a requirements.txt and installed pandas, click, and spacy
- I downloaded the two spacy models, en_core_web_sm and zh_core_web_sm
- I downloaded the BWB corpus from the provided Google Drive link and unpacked it
- I fixed the hard-coded path to the annotation dataset to point to /path/to/BWB_dataset/test_with_annotations. This doesn't match the hard-coded path, but it is the closest thing that I could find that matches the dataset layout described in the code
I then ran the code using the invocation below, which produces a parsing error. I've tried to dig into this, but it looks like it will take some work. Do you see anything wrong with what I've done?
$ cd BlonDe
$ python BWB/BWBReader.py
Traceback (most recent call last):
  File "BWB/BWBReader.py", line 546, in <module>
    for sentences in bwb_reader.dataset_iterator_from_cache(cache_file, dir_path):
  File "BWB/BWBReader.py", line 170, in dataset_iterator_from_cache
    self.to_cache(dir_path, cache_file)
  File "BWB/BWBReader.py", line 534, in to_cache
    for sentences in self.dataset_iterator(dir_path):
  File "BWB/BWBReader.py", line 213, in dataset_iterator
    yield from self.sentence_iterator(chs_path, ref_path)
  File "BWB/BWBReader.py", line 271, in sentence_iterator
    for chs_document, ref_document in self.dataset_document_iterator(chs_path, ref_path):
  File "BWB/BWBReader.py", line 258, in dataset_document_iterator
    chs_document.append(self._line_to_BWBsentence(line, "zh", document_id, sentence_id))
  File "BWB/BWBReader.py", line 418, in _line_to_BWBsentence
    k = self._deal_with_ann_span(line, k, mention_stack, quote_stack,
  File "BWB/BWBReader.py", line 388, in _deal_with_ann_span
    k = self._deal_with_ann_span(line, k, mention_stack, quote_stack,
  File "BWB/BWBReader.py", line 367, in _deal_with_ann_span
    quote = quote_stack.pop()
IndexError: pop from empty list
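
The last frame suggests that a closing marker is reached while quote_stack is empty, but I have not confirmed which input line triggers it. One way to narrow it down might be a guard along these lines. This is only a sketch: pop_or_report is a hypothetical helper that is not part of BWBReader.py, and the sample line and position are made up for illustration.

```python
def pop_or_report(stack, line, pos, kind):
    """Pop from `stack`, or raise a descriptive error if it is empty.

    Hypothetical helper: `line` is the raw text being parsed, `pos` is the
    character index where the closing marker was seen, and `kind` is a label
    such as "quote" or "mention" used only in the error message.
    """
    if not stack:
        raise ValueError(
            f"unmatched closing {kind} marker at character {pos} in line: {line!r}"
        )
    return stack.pop()


# Made-up input: a closing marker with no matching opener on the stack.
quote_stack = []
try:
    pop_or_report(quote_stack, "example line with a stray closing marker", 12, "quote")
except ValueError as err:
    message = str(err)
    print(message)
```

Swapping something like this in for the bare quote_stack.pop() would at least show the offending line and position instead of a plain IndexError.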