standoff2conll

AssertionError: text mismatch and common.FormatError: b'Error verifying textbound T1 text mismatch (check encoding?)

Open agr505 opened this issue 2 years ago • 0 comments

Hi, I get the two stack traces below when trying to convert brat standoff files to CoNLL format. Is there any way to resolve these errors?

Traceback (most recent call last):
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 134, in <module>
    sys.exit(main(sys.argv))
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 124, in main
    convert_directory(path, args)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 102, in convert_directory
    convert_files(files, options)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 106, in convert_files
    document = read_ann(fn, options)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 64, in read_ann
    return Document.from_standoff(
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\document.py", line 432, in from_standoff
    verify_textbounds(textbounds, text)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 204, in verify_textbounds
    raise FormatError(s.encode('utf-8'))
common.FormatError: b'Error verifying textbound T1\tperson 128 135\tPatient\r: text mismatch (check encoding?): 128-135\n "lergies"\nvs. "Patient\r"'


Traceback (most recent call last):
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 201, in verify_textbounds
    assert t.is_valid(text)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 44, in is_valid
    assert text[self.start:self.end] == self.text,
AssertionError: text mismatch (check encoding?): 178-198
 " DIAGNOSIS :
C. dif"
vs. "C. difficile colitis"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 134, in <module>
    sys.exit(main(sys.argv))
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 124, in main
    convert_directory(path, args)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 102, in convert_directory
    convert_files(files, options)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 106, in convert_files
    document = read_ann(fn, options)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff2conll.py", line 64, in read_ann
    return Document.from_standoff(
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\document.py", line 432, in from_standoff
    verify_textbounds(textbounds, text)
  File "C:\Users\Aaron\Documents\Alpine Health\Datasets\bratconverter\standoff2conll-master\standoff.py", line 204, in verify_textbounds
    raise FormatError(s.encode('utf-8'))
common.FormatError: b'Error verifying textbound T1\tproblem 178 198\tC. difficile colitis\r: text mismatch (check encoding?): 178-198\n " DIAGNOSIS :\r\nC. dif"\nvs. "C. difficile colitis\r"'
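
Both messages show a trailing \r on the annotation text and a \r\n inside the document slice, which may point to Windows (CRLF) line endings shifting the offsets. The Python sketch below is a guess at how to check that, not a confirmed fix. The names parse_textbound, find_mismatches and normalize_newlines are hypothetical helpers and are not part of standoff2conll. It assumes contiguous spans only and that the offsets in the .ann file were computed against LF-only text.

```python
def parse_textbound(line):
    """Parse one brat textbound line: 'T1<TAB>type start end<TAB>text'.

    Assumption: contiguous spans only (no 'start end;start end' fragments).
    """
    ann_id, type_span, text = line.rstrip("\r\n").split("\t")
    ann_type, start, end = type_span.split(" ")
    return ann_id, ann_type, int(start), int(end), text


def find_mismatches(doc_text, ann_lines):
    """Return (id, slice found in document, text from annotation) per mismatch."""
    mismatches = []
    for line in ann_lines:
        if not line.startswith("T"):
            continue  # skip relations, events, notes, etc.
        ann_id, _, start, end, text = parse_textbound(line)
        found = doc_text[start:end]
        if found != text:
            mismatches.append((ann_id, found, text))
    return mismatches


def normalize_newlines(s):
    """Convert CRLF and lone CR to LF."""
    return s.replace("\r\n", "\n").replace("\r", "\n")


# Made-up example data: offsets 17-24 are valid only for the LF version.
crlf_text = "Allergies :\r\nNone\r\nPatient is stable\r\n"
ann_lines = ["T1\tperson 17 24\tPatient\r\n"]

print(find_mismatches(crlf_text, ann_lines))
print(find_mismatches(normalize_newlines(crlf_text), ann_lines))
```

If the second call reports no mismatches on real data, rewriting the .txt and .ann files with LF line endings before running the converter might be worth trying. If mismatches remain, the offsets were probably computed against different text and normalizing newlines alone will not help.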

agr505, Oct 18 '22 22:10