pytorch_graph-rel

Original Dataset

Open yin-hong opened this issue 5 years ago • 10 comments

Hello! Could you share the original NYT and WebNLG datasets, including the train, dev, and test splits? Thanks a lot!

yin-hong avatar Sep 28 '19 03:09 yin-hong

Hi Michael, I got the dataset from here: https://github.com/xiangrongzeng/copy_re.

tsujuifu avatar Sep 28 '19 23:09 tsujuifu

Thanks for your reply! I have downloaded this dataset. However, I find that entity types are not annotated in the WebNLG dataset. How do you handle this?

yin-hong avatar Sep 29 '19 02:09 yin-hong

Hi, Michael.

The original WebNLG dataset has no entity type tags (NYT should have them). For the joint entity and relation extraction task, we only care about the relation type and the positions of the two entities, so we don't need the entity type tags.

Sincerely, Tsu-Jui
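
To make the point above concrete, here is a minimal sketch of what one annotated relation needs to carry: a relation label and the token positions of the two entities, with no entity type field. The class name `RelationTriple`, the helper `span_text`, the example sentence, and the `birth_place` label are all hypothetical and are not taken from the repository or the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RelationTriple:
    """One relation between two entity mentions; no entity type is stored."""
    head_span: Tuple[int, int]  # (start, end) token indices, both inclusive
    tail_span: Tuple[int, int]  # (start, end) token indices, both inclusive
    relation: str               # relation type label


def span_text(tokens: List[str], span: Tuple[int, int]) -> str:
    """Recover the surface text of an entity from its token positions."""
    start, end = span
    return " ".join(tokens[start:end + 1])


# Hypothetical example sentence and label, for illustration only.
tokens = ["Barack", "Obama", "was", "born", "in", "Honolulu"]
triple = RelationTriple(head_span=(0, 1), tail_span=(5, 5), relation="birth_place")
```

Because the positions alone identify the two entities, a dataset without entity type tags (such as WebNLG here) still provides everything this structure needs.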

tsujuifu avatar Sep 29 '19 02:09 tsujuifu

So the loss function contains only the relation loss, and no entity loss?

yin-hong avatar Sep 29 '19 02:09 yin-hong

Nope, it contains both the entity loss and the relation loss.

For entities, I only care which of (B, I, E, S, O) each word belongs to:

B: the first word of an entity
I: an inner word of an entity
E: the last word of an entity
S: a single-word entity
O: not part of any entity

Hence, the entity loss comes from a 5-class classification.
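
As a rough illustration of the tagging scheme described above, the sketch below converts entity spans into (B, I, E, S, O) tags and computes a mean 5-class cross-entropy over the words of a sentence. It uses only the standard library so it runs anywhere; the function names `spans_to_tags` and `entity_loss` are hypothetical, and this is not the repository's actual code.

```python
import math
from typing import List, Sequence, Tuple

TAGS = ["B", "I", "E", "S", "O"]
TAG_TO_ID = {tag: i for i, tag in enumerate(TAGS)}


def spans_to_tags(num_tokens: int, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Turn inclusive (start, end) entity spans into one tag per word."""
    tags = ["O"] * num_tokens
    for start, end in spans:
        if start == end:
            tags[start] = "S"          # single-word entity
        else:
            tags[start] = "B"          # first word of the entity
            for i in range(start + 1, end):
                tags[i] = "I"          # inner words
            tags[end] = "E"            # last word of the entity
    return tags


def entity_loss(logits: Sequence[Sequence[float]], tags: Sequence[str]) -> float:
    """Mean cross-entropy over words, with 5 scores (one per tag) per word."""
    total = 0.0
    for row, tag in zip(logits, tags):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[TAG_TO_ID[tag]]
    return total / len(tags)


# A 6-word sentence with a two-word entity at positions 0-1 and a
# single-word entity at position 5.
example_tags = spans_to_tags(6, [(0, 1), (5, 5)])
```

In a PyTorch model the same quantity would normally come from `torch.nn.functional.cross_entropy` applied to per-word logits of size 5; the pure-Python version here only shows what is being computed.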

tsujuifu avatar Sep 29 '19 03:09 tsujuifu

Thanks for your reply! I think I fully understand your approach now.

yin-hong avatar Sep 29 '19 03:09 yin-hong

Hello, could you please tell me how to produce the pre_tr dataset?

zhihuatao avatar Nov 11 '19 03:11 zhihuatao

Hello, have you gotten the input files? Thanks a lot.

Wangyandong-master avatar Dec 05 '19 06:12 Wangyandong-master

@tsujuifu Thanks for the clarification. I'm trying to reproduce your excellent work, but I'm having some trouble preparing the dataset. I checked the preprocessed dataset released by CopyR [Zeng et al., 2018] and found that the annotated entities are all single-word entities. In this case, should all the entity tags be 'B' when I prepare the training data for the Graph_rel model? Is there any plan to release the preprocessed dataset?

weizhepei avatar Dec 06 '19 14:12 weizhepei

Hi, CopyR uses a version of the dataset that annotates only the last word of each entity. Do you follow this preprocessing setting as well, or do you preprocess the original dataset released by CopyR and annotate the whole span? Thanks for your reply.

131250208 avatar Jun 08 '20 03:06 131250208