ELMoForManyLangs
Convert csv to conllu format
Hi everyone, I need some help. I'm doing research on contextual topic models for my thesis and I'd like to try ELMo for Indonesian. Unfortunately, my data is tweets saved in CSV format, while this pre-trained model requires input in CoNLL-U format. How can I convert from CSV to CoNLL-U? So far I have found a way to convert a single string to CoNLL-U, but not a whole document. Any advice? Thanks in advance, everyone.
I suggest you look at this repo: https://github.com/EMBEDDIA/supar-elmo#Usage. It has a utility function that does what you want in an easy way:
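```python
from supar.utils import CoNLL

# Format one tokenized sentence as CoNLL-style lines: ID, token, then '_' placeholders
print(CoNLL.toconll(['She', 'enjoys', 'playing', 'tennis', '.']))
```

```
1 She _ _ _ _ _ _ _ _
2 enjoys _ _ _ _ _ _ _ _
3 playing _ _ _ _ _ _ _ _
4 tennis _ _ _ _ _ _ _ _
5 . _ _ _ _ _ _ _ _
```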
Just read in your CSV, tokenize each tweet, and pass the resulting token list to CoNLL.toconll.
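A minimal, dependency-free sketch of that pipeline, assuming the tweets sit in a CSV column named `text` and that each tweet is treated as one sentence. The regex tokenizer `TOKEN_RE` and the helpers `tokenize`, `to_conllu`, and `csv_to_conllu` are hypothetical names for illustration; `to_conllu` mirrors the layout shown above (ID, token, then eight `_` placeholders), tab-separated with a blank line after each sentence as CoNLL-U expects. If supar is installed, you could call `CoNLL.toconll` in its place.

```python
import csv
import io
import re

# Hypothetical, deliberately simple tweet tokenizer (an assumption, not from supar):
# keeps URLs, @mentions and #hashtags whole; otherwise splits words and punctuation.
TOKEN_RE = re.compile(r"https?://\S+|[@#]?\w+|[^\w\s]")


def tokenize(text):
    """Split one tweet into a list of tokens."""
    return TOKEN_RE.findall(text)


def to_conllu(tokens):
    """Format one tokenized sentence as 10 tab-separated columns with '_' placeholders."""
    return "\n".join(
        "\t".join([str(i), tok] + ["_"] * 8) for i, tok in enumerate(tokens, start=1)
    )


def csv_to_conllu(csv_file, text_column="text"):
    """Read tweets from a CSV file object and return a single CoNLL-U string.

    Each tweet becomes one sentence; every sentence is followed by a blank line.
    """
    blocks = []
    for row in csv.DictReader(csv_file):
        tokens = tokenize(row[text_column])
        if tokens:  # skip empty tweets instead of emitting empty sentences
            blocks.append(to_conllu(tokens))
    return "".join(block + "\n\n" for block in blocks)


if __name__ == "__main__":
    # Small in-memory sample standing in for a real tweets CSV
    sample = io.StringIO('id,text\n1,"Saya suka main tenis!"\n2,"Cek @budi #elmo"\n')
    print(csv_to_conllu(sample), end="")
```

For a real file, open it with `newline=""` and `encoding="utf-8"` and pass the file object to `csv_to_conllu`, then write the returned string to a `.conllu` file. For serious work, a dedicated tweet tokenizer will handle emoji and informal spelling better than this regex.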