Issues related to data processing

Open JinFish opened this issue 4 years ago • 4 comments

Hey, I downloaded the dataset through the link you provided, but its format is very different from the files under the data directory in your GitHub repository. I looked at the code in run_mtmner_crf.py but could not find the data processing step. Could you tell me how to convert the downloaded data into the format used in your data directory? Thank you very much.

JinFish avatar Oct 28 '20 12:10 JinFish

Hi there,

Yes, the dataset provided through the link was constructed for a different task, sentiment analysis (https://github.com/jefferyYu/TomBERT), which is quite different from the MNER task here. Note that the provided link is only used for downloading the images of the multimodal tweets in our two MNER datasets.

The data processing part for these two MNER datasets is provided in the function "mmreadfile(filename):" (lines 145-193) of the run_mtmner_crf.py file.
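For readers who want a rough idea of what such a reader does before opening the file, here is a minimal sketch. It is not the repository's mmreadfile code. It assumes a CoNLL-style layout in which each sample starts with an "IMGID:<id>" line, followed by one whitespace-separated "token label" pair per line, with a blank line between samples. The function name parse_mner_lines and the demo data are hypothetical; check the actual files under the data directory before relying on this.

```python
from typing import Iterable, List, Tuple

# One sample: (image id, tokens, labels). This layout is an assumption,
# not something confirmed by the thread.
Sample = Tuple[str, List[str], List[str]]


def parse_mner_lines(lines: Iterable[str]) -> List[Sample]:
    """Group 'IMGID:' headers and token/label lines into samples."""
    samples: List[Sample] = []
    img_id, tokens, labels = "", [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith("IMGID:"):
            # Header line: remember which image belongs to this sentence.
            img_id = line[len("IMGID:"):]
        elif not line:
            # Blank line closes the current sample, if it has any tokens.
            if tokens:
                samples.append((img_id, tokens, labels))
            img_id, tokens, labels = "", [], []
        else:
            # Token line: first field is the token, last field is the label.
            parts = line.split()
            tokens.append(parts[0])
            labels.append(parts[-1])
    if tokens:
        # Flush the last sample when the input does not end with a blank line.
        samples.append((img_id, tokens, labels))
    return samples


# Made-up demo data in the assumed layout.
demo = "IMGID:12345\nKevin\tB-PER\nDurant\tI-PER\nplays\tO\n\nIMGID:67890\nHello\tO\n"
samples = parse_mner_lines(demo.splitlines())
```

To read a real file, the same function can be fed an open file handle, for example `parse_mner_lines(open(path, encoding="utf-8"))`, where path is whichever text file you want to inspect.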

Hope it clarifies your concern. Please let me know if you have any other questions.

Best, Jianfei

jefferyYu avatar Oct 28 '20 23:10 jefferyYu

So the dataset under the data directory in your GitHub repository is the actual text dataset, and it is complete, right?

JinFish avatar Oct 29 '20 04:10 JinFish

Yep.

jefferyYu avatar Oct 29 '20 06:10 jefferyYu

Thank you for your patience. If I have any other questions, I will contact you again.

JinFish avatar Oct 29 '20 06:10 JinFish