ace2005-preprocessing

Some problems when preprocessing

Open · alderpaw opened this issue 6 years ago · 2 comments

Hello! Thanks for your contribution. When I ran this code to preprocess the ACE 2005 corpus, several warnings and errors occurred. Will these warnings and errors affect the result?

  • [Warning] The entity in the other sentence is mentioned. This argument will be ignored. This warning occurred multiple times during preprocessing.

  • [Warning] fail to find offset! (start_index: 3348, text: Doctors Without Borders/Médecins Sans Frontières (MSF, path: D:\Data\ace_2005_td_v7\data\English\un/timex2norm/alt.vacation.las-vegas_20050109.0133) This warning is followed by an assertion error (end_idx != -1), so I commented out the corresponding code in main.py to get past it. I have read the other issues and know that simply deleting this file may solve the problem, but are there any other solutions besides deleting it? I also wonder whether the result contains mistakes because of this warning. I look forward to your reply!
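One alternative to deleting the file can be sketched as follows. This assumes, without checking the repository, that the failing lookup in main.py is an exact substring search for the annotated text near start_index, and that the mismatch comes from whitespace (for example a line break inside "Médecins Sans Frontières") differing between the annotation and the raw document. The helper `find_mention_span` and its `window` parameter are hypothetical: it tries an exact match, falls back to a whitespace-insensitive regex, and returns None instead of asserting so the caller can skip only that mention.

```python
import re
from typing import Optional, Tuple


def find_mention_span(doc_text: str, mention: str, start_hint: int,
                      window: int = 200) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: locate `mention` in `doc_text` near `start_hint`.

    Returns (start, end) with `end` exclusive, or None when nothing matches,
    so the caller can skip this mention instead of failing an assertion.
    """
    lo = max(0, start_hint - window)
    hi = min(len(doc_text), start_hint + window + len(mention))

    # 1) Exact match, searching from a little before the annotated offset.
    idx = doc_text.find(mention, lo)
    if idx != -1 and idx <= start_hint + window:
        return idx, idx + len(mention)

    # 2) Whitespace-insensitive match: the annotated text may use a single
    #    space where the raw document has a line break or several spaces.
    tokens = mention.split()
    if not tokens:
        return None
    pattern = re.compile(r"\s+".join(re.escape(tok) for tok in tokens))
    m = pattern.search(doc_text, lo, hi)
    if m:
        return m.start(), m.end()

    # 3) Give up and let the caller log a warning and skip the mention.
    return None
```

Skipping a mention this way still loses that entity (and any event argument pointing to it), but only that one annotation rather than the whole document.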

alderpaw, Nov 17 '19

I got this too: exactly the same problem, involving the text 'Doctors Without Borders/Médecins Sans blablabla'. Could there be something wrong with this sentence? Could you please look into this issue? Thanks!

yaof20, Dec 10 '19

I have the same problem.

[Warning] The entity in the other sentence is mentioned. This argument will be ignored. This warning occurred multiple times during preprocessing.

Can I safely ignore these warnings? I look forward to hearing from you.
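Judging only from its wording, the "entity in the other sentence" warning means an event argument is dropped when its entity mention lies outside the sentence that contains the trigger, presumably because the script emits one example per sentence. The sketch below is a hypothetical illustration of that filter, not the repository's code; the `Argument` class and `split_arguments` function are invented names, and offsets are assumed to be character positions with exclusive ends.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Argument:
    role: str
    start: int  # character offset where the argument's mention starts
    end: int    # character offset where it ends (exclusive)


def split_arguments(sent_span: Tuple[int, int],
                    args: List[Argument]) -> Tuple[List[Argument], List[Argument]]:
    """Hypothetical filter: keep arguments whose mention lies inside the
    trigger's sentence; everything else is what the warning would drop."""
    s_start, s_end = sent_span
    kept, dropped = [], []
    for arg in args:
        if s_start <= arg.start and arg.end <= s_end:
            kept.append(arg)
        else:
            dropped.append(arg)
    return kept, dropped


# A sentence spanning offsets 10..50 with one in-sentence and one
# cross-sentence argument.
kept, dropped = split_arguments(
    (10, 50), [Argument("Attacker", 12, 20), Argument("Place", 60, 70)])
```

If the script works like this, the warning does not corrupt the kept data; it only removes cross-sentence arguments. Counting the warnings against the total number of arguments gives a rough idea of how much is lost.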

SuooL, Mar 24 '20