ELMoForManyLangs

Convert csv to conllu format

Open: mastifatchiya opened this issue 2 years ago • 1 comment

Hi everyone, I need help. I'm researching contextual topic models for my thesis and I want to try ELMo for Indonesian. Unfortunately, my data consists of tweets saved in CSV format, while this pre-trained model requires CoNLL-U format. My question is: how do I convert from CSV to CoNLL-U format? So far, I have found a way to convert a single string to CoNLL-U, but not a whole document. Is there any advice for this? Thank you in advance, guys.

mastifatchiya, Nov 21 '22 07:11

I suggest you look at this repo: https://github.com/EMBEDDIA/supar-elmo#Usage. It has a utility function that does what you want in an easy way:

from supar.utils import CoNLL
print(CoNLL.toconll(['She', 'enjoys', 'playing', 'tennis', '.']))
1       She     _       _       _       _       _       _       _       _
2       enjoys  _       _       _       _       _       _       _       _
3       playing _       _       _       _       _       _       _       _
4       tennis  _       _       _       _       _       _       _       _
5       .       _       _       _       _       _       _       _       _

Just read in your CSV, tokenize the text of each row, and convert each token list to a CoNLL-U block.
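If you would rather not install anything extra, here is a rough sketch of the whole CSV-to-CoNLL-U step using only the Python standard library. It mirrors the output shown above (only the ID and FORM columns filled, the rest left as `_`). Several things here are assumptions on my part, not something taken from either repo: the column name `text`, the helper names `tokens_to_conllu` and `csv_to_conllu`, and the plain whitespace tokenizer, which you will probably want to replace with a proper tweet or Indonesian tokenizer.

```python
import csv
import io


def tokens_to_conllu(tokens):
    """Format one tokenized sentence as 10-column CoNLL-U lines.

    Only ID and FORM are filled; the other 8 columns are '_'.
    The block ends with a blank line, as CoNLL-U expects.
    """
    lines = ["\t".join([str(i), tok] + ["_"] * 8)
             for i, tok in enumerate(tokens, start=1)]
    return "\n".join(lines) + "\n\n"


def csv_to_conllu(csv_file, text_column="text", tokenize=str.split):
    """Read rows from an open CSV file and return one CoNLL-U string.

    text_column is a hypothetical column name; change it to match your CSV.
    tokenize defaults to whitespace splitting, which is only a placeholder.
    """
    blocks = []
    for row in csv.DictReader(csv_file):
        tokens = tokenize(row[text_column])
        if tokens:  # skip rows whose text is empty
            blocks.append(tokens_to_conllu(tokens))
    return "".join(blocks)


# Small in-memory example; for a real file use something like
# open("tweets.csv", newline="", encoding="utf-8") instead of StringIO.
sample = io.StringIO("id,text\n1,She enjoys playing tennis .\n2,Saya suka kopi\n")
print(csv_to_conllu(sample), end="")
```

This treats every CSV row as one sentence. If a tweet contains several sentences and that matters for your model, you would need to split sentences before tokenizing.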

melanchthon19, Jun 26 '23 20:06