VLP-MABSA
Questions about how to process the original dataset
Nice work and a nice repository! But I still have a few questions about it~
- Could you please provide more instructions on how to process the original MVSA dataset with the tools you mentioned? For example, what steps did you take with twitter_nlp to perform NER, how did you use SentiWordNet to match the opinion words, and what results are produced after this processing? The same questions apply to Faster-RCNN and the ANPs extractor.
- Could you please provide some sample data from the processed MVSA? It would be great if you could upload some example data to BaiduNetdisk, because with only MVSA_descriptions.txt provided I still have no idea about the exact data format, and thus cannot reproduce the pre-training part of your code.
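For context, SentiWordNet-based opinion-word matching is often done roughly as below. This is an illustrative sketch only: the toy lexicon, function name, and threshold are assumptions standing in for real SentiWordNet scores, and the repo's actual pipeline may differ.

```python
# Illustrative sketch of SentiWordNet-style opinion-word matching.
# TOY_LEXICON is a stand-in for real SentiWordNet scores; the actual
# VLP-MABSA pre-processing may work differently.

# word -> (positive score, negative score), as SentiWordNet reports per synset
TOY_LEXICON = {
    "nice": (0.875, 0.0),
    "terrible": (0.0, 0.75),
    "table": (0.0, 0.0),
}

def match_opinion_words(tokens, lexicon=TOY_LEXICON, threshold=0.5):
    """Return (token, polarity) pairs whose sentiment score exceeds threshold."""
    matches = []
    for tok in tokens:
        pos, neg = lexicon.get(tok.lower(), (0.0, 0.0))
        if max(pos, neg) >= threshold:
            matches.append((tok, "positive" if pos >= neg else "negative"))
    return matches

print(match_opinion_words(["What", "a", "nice", "table"]))
# [('nice', 'positive')]
```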
Thanks a lot!
Thank you for your questions. I have added some details on processing the pre-training dataset to the README.md. I hope this helps you understand the pre-processing.
Thanks for your excellent work and patient feedback. Could you please release the processed pre-training data for better reproducibility?
For samples where the twitter_nlp tool did not extract any entity (aspect term), were they deleted or handled in some other way?
Also, roughly how large is the final pre-training dataset?
1. For samples where twitter_nlp extracted no entity, we keep them in pre-training with an empty aspect annotation, since aspect-free samples also occur in the downstream tasks. 2. The pre-training dataset contains roughly 17,000+ samples.