add alternative output formats

Open reynoldsnlp opened this issue 6 years ago • 2 comments

This may not be possible in every case, but where possible, add other common output formats:

  • CoNLL (CoNLL-X / CoNLL-U)
  • mystem
  • MULTEXT-East (Sharoff et al.)
  • etc?

reynoldsnlp, Jun 23 '19 12:06

As for the CoNLL-U format, there does not appear to be any way to represent ambiguity (multiple readings for a single token), so the conversion would be lossy.
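
To illustrate the loss, here is a minimal sketch, not udar's actual API. It assumes a hypothetical `Reading` container and a hypothetical `to_conllu_line` helper, and it relies only on the fact that a CoNLL-U token line has ten tab-separated columns with a single LEMMA, UPOS, and FEATS value each. The feature strings in the example are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    """Hypothetical container for one morphological reading (not udar's API)."""
    lemma: str
    upos: str
    feats: str


def to_conllu_line(idx: int, form: str, readings: List[Reading]) -> str:
    """Build one 10-column CoNLL-U token line from the FIRST reading only.

    Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
    Every other reading is dropped, which is the lossy step.
    """
    first = readings[0]
    cols = [
        str(idx), form, first.lemma, first.upos,
        "_",                 # XPOS left unspecified
        first.feats or "_",  # FEATS holds a single feature set
        "_", "_", "_", "_",  # HEAD, DEPREL, DEPS, MISC left unspecified
    ]
    return "\t".join(cols)


# Two readings for one token; only the first survives the conversion.
readings = [
    Reading("три", "NUM", "Case=Nom"),
    Reading("три", "NUM", "Animacy=Inan|Case=Acc"),
]
line = to_conllu_line(5, "три", readings)
```

Which reading to keep (first, most probable, and so on) is a design choice that this sketch leaves open.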

reynoldsnlp, Sep 21 '19 03:09

mystem can have ambiguous readings separated by | in its output, even with the -d (disambiguate) flag:

$ echo "Мы уже работаем здесь три недели." | mystem3.1 -ind
Мы{мы=SPRO,мн,1-л=им}
уже{уже=ADV=}
работаем{работать=V,несов,нп=непрош,мн,изъяв,1-л}
здесь{здесь=ADVPRO=}
три{три=NUM=им|три=NUM=вин,неод}
недели{неделя=S,жен,неод=вин,мн|неделя=S,жен,неод=род,ед|неделя=S,жен,неод=им,мн}
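
Output in this shape could be read back into a list of readings per token. The following is a minimal sketch that assumes every line looks exactly like the ones above: `form{analysis|analysis|...}`, with each analysis being `lemma=tags`. The name `parse_mystem_line` is hypothetical, and other mystem output variants are not handled.

```python
import re
from typing import List, Tuple

# Matches "form{...}" as printed above; an assumption about the layout,
# not a full grammar of mystem output.
TOKEN_RE = re.compile(r"^(?P<form>[^{]+)\{(?P<analyses>.*)\}$")


def parse_mystem_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one mystem output line into (form, [(lemma, tags), ...])."""
    match = TOKEN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected mystem line: {line!r}")
    readings = []
    # Ambiguous readings are separated by "|".
    for analysis in match.group("analyses").split("|"):
        # The lemma is everything before the first "="; the rest is the tag string.
        lemma, _, tags = analysis.partition("=")
        readings.append((lemma, tags))
    return match.group("form"), readings


form, readings = parse_mystem_line("три{три=NUM=им|три=NUM=вин,неод}")
# form is "три"; readings holds two (lemma, tags) pairs, one per ambiguous reading.
```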

reynoldsnlp, Sep 21 '19 21:09