bert-event-extraction

mislabeled data

Open ThanhDucPham opened this issue 4 years ago • 7 comments

Hi, I'm trying to reproduce your model, but my results are low. I checked the labels my model predicted and found many tokens that were predicted as an event subtype (something other than "O") but are tagged "O" in the dataset. As a result, my precision drops (I only get precision = 62%). Did you encounter this issue? If so, how did you handle it? Did you fix the wrong labels in the test and dev sets, or keep the original data when computing the scores? Hope to hear from you soon. Thank you so much!

ThanhDucPham, Apr 24 '20 10:04
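The effect described above, where a non-"O" prediction on a token annotated "O" lowers precision, can be sketched as follows. This is a minimal illustration, not the repository's evaluation code; the function name `trigger_prf` and the example label strings are assumptions made for the sketch.

```python
def trigger_prf(gold, pred, outside="O"):
    """Precision, recall and F1 over non-O trigger labels for aligned token lists."""
    # Every non-O prediction counts as a proposed trigger.
    proposed = sum(1 for p in pred if p != outside)
    # Every non-O gold tag counts as an annotated trigger.
    annotated = sum(1 for g in gold if g != outside)
    # A proposal is correct only if it matches the gold label exactly.
    correct = sum(1 for g, p in zip(gold, pred) if p != outside and g == p)
    precision = correct / proposed if proposed else 0.0
    recall = correct / annotated if annotated else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative labels only. The third token is tagged "O" in the gold data,
# so the model's prediction there is scored as a false positive, whether or
# not the annotation itself is right.
gold = ["O", "Conflict:Attack", "O", "O", "Life:Die"]
pred = ["O", "Conflict:Attack", "Movement:Transport", "O", "Life:Die"]
print(trigger_prf(gold, pred))
```

Here recall is unaffected while precision falls to 2/3, which is the pattern reported in the comment above.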

Do you get a KeyError when running the code? It is raised at line 95 of data_load.py: p = [postag2idx[postag] for postag in p]

sunsunshinesunshine, Apr 24 '20 11:04
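For readers who want to keep the POS tags rather than remove them, one common way to avoid a KeyError on an unseen tag is to fall back to an unknown-tag index. This is only a sketch: the contents of `postag2idx` below and the `[UNK]` entry are hypothetical, and the real mapping is built in data_load.py.

```python
# Hypothetical tag inventory for illustration; not the repository's actual mapping.
postag2idx = {"[PAD]": 0, "[UNK]": 1, "NN": 2, "VB": 3}


def encode_postags(tags, mapping, unk="[UNK]"):
    """Map POS tags to indices, sending unseen tags to the unknown index."""
    # mapping[tag] would raise KeyError for a tag missing from the inventory;
    # dict.get returns the fallback index instead.
    return [mapping.get(tag, mapping[unk]) for tag in tags]


print(encode_postags(["NN", "VBZ", "VB"], postag2idx))  # [2, 1, 3]
```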

> Do you get a KeyError when running the code? It is raised at line 95 of data_load.py: p = [postag2idx[postag] for postag in p]

I removed all the entity and POS-tag variables because they are not used in this model.

ThanhDucPham, Apr 24 '20 11:04

> > Do you get a KeyError when running the code? It is raised at line 95 of data_load.py: p = [postag2idx[postag] for postag in p]

> I removed all the entity and POS-tag variables because they are not used in this model.

Thank you very much. They really are not used; I will try that!

sunsunshinesunshine, Apr 24 '20 14:04

@ThanhDucPham If the model does not use entities (NER) or POS tags, then it only goes as far as trigger recognition, right? I removed the entity labels and found that the model gets almost every argument wrong, essentially because the NER labels were never assigned. I am using the pretrained BERT model bert-base-multilingual-uncased with Vietnamese data.

mactiendinh, Jun 09 '20 07:06

@mactiendinh Given how the argument module in this repo is designed, it already has to know which spans are entities (because the label is predicted jointly for the whole entity span). One thing I am unsure about: events that are predicted incorrectly will then go on to produce incorrect argument predictions. If we count those labels when evaluating the quality of the argument module, is that right? (The prediction was already wrong from the moment the event was mispredicted.) What do you think?

ThanhDucPham, Jun 09 '20 13:06
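The evaluation question raised above can be made concrete by computing argument scores two ways: end to end, where an argument attached to a mispredicted event counts as an error, and restricted to events whose trigger was predicted correctly, which isolates the argument module. The sketch below is illustrative only; the tuple layouts, the helper `prf`, and the example labels are assumptions, and the thread does not say which scheme the repository uses.

```python
def prf(gold, pred):
    """Precision, recall and F1 between two sets of hashable items."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Assumed layout: trigger = (sentence_id, token_index, event_type)
gold_triggers = {(0, 3, "Conflict:Attack"), (1, 5, "Life:Die")}
pred_triggers = {(0, 3, "Conflict:Attack"), (1, 5, "Life:Injure")}

# Assumed layout: argument = (trigger, arg_start, arg_end, role)
gold_args = {
    ((0, 3, "Conflict:Attack"), 0, 1, "Attacker"),
    ((1, 5, "Life:Die"), 2, 3, "Victim"),
}
pred_args = {
    ((0, 3, "Conflict:Attack"), 0, 1, "Attacker"),
    ((1, 5, "Life:Injure"), 2, 3, "Victim"),  # right span and role, wrong event
}

# End to end: trigger errors propagate into the argument score.
end_to_end = prf(gold_args, pred_args)

# Conditioned: keep only arguments whose trigger is correct in both sets.
shared = gold_triggers & pred_triggers
conditioned = prf(
    {a for a in gold_args if a[0] in shared},
    {a for a in pred_args if a[0] in shared},
)

print("end-to-end :", end_to_end)
print("conditioned:", conditioned)
```

Reporting both numbers shows how much of the argument error comes from trigger mistakes rather than from the argument classifier itself.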

I removed the POS tags and entities, and then this error occurred. Thanks for replying.

Traceback (most recent call last):
  File "train.py", line 136, in <module>
    train(model, train_iter, optimizer, criterion)
  File "train.py", line 19, in train
    for i, batch in enumerate(iterator):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/My Drive/bert-event-extraction-master/data_load.py", line 93, in __getitem__
    tokens = tokenizer.tokenize(w) if w not in [CLS, SEP] else [w]
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 110, in tokenize
    for token in self.basic_tokenizer.tokenize(text):
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 217, in tokenize
    text = self._clean_text(text)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 308, in _clean_text
    cp = ord(char)
TypeError: ord() expected a character, but string of length 5 found

alwayslikethat, Nov 10 '21 08:11
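The message in the traceback above says that the loop in _clean_text received an item of length 5 where it expected a single character. One way this can happen is when a list or tuple of words, rather than a single str, reaches tokenizer.tokenize. That cause is an assumption; the thread does not confirm it. The sketch below reproduces the same kind of failure with plain Python and adds a hypothetical guard, `ensure_str`, that could be called on `w` before tokenizing.

```python
def clean_text(text):
    """Mimic the failing loop: iterate over `text` and call ord() on each item."""
    return [ord(char) for char in text]


# Iterating over a str yields single characters, so ord() works.
print(clean_text("hello"))

# Iterating over a list yields whole words, so ord() raises TypeError.
try:
    clean_text(["hello", "world"])
except TypeError as exc:
    print("TypeError:", exc)


def ensure_str(w):
    """Hypothetical guard: fail early with a clearer message if w is not a str."""
    if not isinstance(w, str):
        raise TypeError(f"expected str, got {type(w).__name__}: {w!r}")
    return w
```

Checking the type of `w` just before line 93 of data_load.py would show whether removing the POS and entity fields changed what each item in the sentence contains.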

> > Do you get a KeyError when running the code? It is raised at line 95 of data_load.py: p = [postag2idx[postag] for postag in p]

> I removed all the entity and POS-tag variables because they are not used in this model.

I removed the POS tags and entities, and then this error occurred. Thanks for your reply.

Traceback (most recent call last):
  File "train.py", line 136, in <module>
    train(model, train_iter, optimizer, criterion)
  File "train.py", line 19, in train
    for i, batch in enumerate(iterator):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 561, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/drive/My Drive/bert-event-extraction-master/data_load.py", line 93, in __getitem__
    tokens = tokenizer.tokenize(w) if w not in [CLS, SEP] else [w]
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 110, in tokenize
    for token in self.basic_tokenizer.tokenize(text):
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 217, in tokenize
    text = self._clean_text(text)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_pretrained_bert/tokenization.py", line 308, in _clean_text
    cp = ord(char)
TypeError: ord() expected a character, but string of length 5 found

alwayslikethat, Nov 10 '21 14:11