VLP-MABSA
Questions about how to process the original dataset
Nice work and a nice repository! But I still have a few questions about it~
- Could you please provide more instructions on how to process the original MVSA dataset with the tools you mentioned? For example, what steps did you take with twitter_nlp to perform NER, how did you use SentiWordNet to match the opinion words, and what results are produced after this processing? The same questions apply to Faster-RCNN and the ANPs extractor.
- Could you please provide some sample data from the processed MVSA? It would be great if you could upload some example data to BaiduNetdisk, because with only MVSA_descriptions.txt provided I still have no idea about the exact data format, and thus cannot reproduce the pre-training part of your code.
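For context, SentiWordNet-based opinion-word matching is often done roughly as below. This is an illustrative sketch only: the toy lexicon, function name, and threshold are assumptions standing in for real SentiWordNet scores, and the repo's actual pipeline may differ.

```python
# Illustrative sketch of SentiWordNet-style opinion-word matching.
# TOY_LEXICON is a stand-in for real SentiWordNet scores; the actual
# VLP-MABSA pre-processing may work differently.

# word -> (positive score, negative score), as SentiWordNet reports per synset
TOY_LEXICON = {
    "nice": (0.875, 0.0),
    "terrible": (0.0, 0.75),
    "table": (0.0, 0.0),
}

def match_opinion_words(tokens, lexicon=TOY_LEXICON, threshold=0.5):
    """Return (token, polarity) pairs whose sentiment score exceeds threshold."""
    matches = []
    for tok in tokens:
        pos, neg = lexicon.get(tok.lower(), (0.0, 0.0))
        if max(pos, neg) >= threshold:
            matches.append((tok, "positive" if pos >= neg else "negative"))
    return matches

print(match_opinion_words(["What", "a", "nice", "table"]))
# [('nice', 'positive')]
```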
Thanks a lot!
Thank you for your questions. I have added some details on processing the pre-training dataset to the README.md. I hope this helps you understand the pre-processing.
Thanks for your excellent work and patient feedback. Could you please release the processed pre-training data for better reproducibility?
For samples where the twitter_nlp tool did not extract any entity (aspect term), were they deleted or handled in some other way?
Also, roughly how large is the final pre-training dataset?
1. For samples where twitter_nlp extracted no entity, we keep them in pre-training with an empty aspect annotation, since aspect-free samples also occur in the downstream tasks. 2. The pre-training dataset contains roughly 17,000+ samples.