ve
Parsing issues
- Ruby 2.1.5
- ve 0.0.3
Case 1
Actual:
string = 'おつまみ'
words = Ve.in(:ja).words(string).map(&:word)
=> ["お", "つまみ"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Prefix, Ve::PartOfSpeech::Verb]
Expected:
words = Ve.in(:ja).words(string).map(&:word)
=> ["おつまみ"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Noun]
Case 2
Actual:
string = 'やぐら'
words = Ve.in(:ja).words(string).map(&:word)
=> ["や", "ぐら"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Postposition, Ve::PartOfSpeech::Noun]
Expected:
words = Ve.in(:ja).words(string).map(&:word)
=> ["やぐら"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Noun]
Case 3
Actual:
string = '煮っころがし'
words = Ve.in(:ja).words(string).map(&:word)
=> ["煮っ", "ころ", "が", "し"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Verb, Ve::PartOfSpeech::Noun, Ve::PartOfSpeech::Postposition, Ve::PartOfSpeech::Verb]
Expected:
words = Ve.in(:ja).words(string).map(&:word)
=> ["煮っころがし"]
parts_of_speeches = Ve.in(:ja).words(string).map(&:part_of_speech)
=> [Ve::PartOfSpeech::Noun]
Ah, the ambiguities of language! :)
All of the breakdowns here, both the actual and the expected, are valid parses of these strings.
Personally, I prefer the prefix お to be parsed as a separate word, but you could write some post-processing logic to combine a prefix お with the word that follows it.
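A minimal sketch of what that post-processing might look like. To keep it runnable without MeCab installed, it uses a hypothetical `Token` struct as a stand-in for the objects returned by `Ve.in(:ja).words`, and plain symbols instead of the `Ve::PartOfSpeech` classes; `merge_prefixes` and its parameters are illustrative names, not part of Ve.

```ruby
# Hypothetical stand-in for Ve's word objects: only the two attributes
# used in this thread (word, part_of_speech) are modelled.
Token = Struct.new(:word, :part_of_speech)

# Merge every prefix token into the token that follows it.
# new_pos is the part of speech given to the merged token; when nil,
# the following token's part of speech is kept.
def merge_prefixes(tokens, prefix_pos: :prefix, new_pos: nil)
  merged = []
  pending = nil # prefix text waiting for a following token
  tokens.each do |token|
    if token.part_of_speech == prefix_pos
      pending = pending.to_s + token.word
    elsif pending
      merged << Token.new(pending + token.word, new_pos || token.part_of_speech)
      pending = nil
    else
      merged << token
    end
  end
  # A trailing prefix with nothing to attach to is kept as-is.
  merged << Token.new(pending, prefix_pos) if pending
  merged
end

tokens = [Token.new("お", :prefix), Token.new("つまみ", :verb)]
result = merge_prefixes(tokens, new_pos: :noun)
puts result.map(&:word).inspect
```

With real Ve output you would presumably pass `prefix_pos: Ve::PartOfSpeech::Prefix` and, if you want the merged word treated as a noun, `new_pos: Ve::PartOfSpeech::Noun`. Note that this sketch builds new objects and so drops any other attributes the original words carry.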
For やぐら and 煮っころがし, you could add them as words to a custom dictionary, as I explained in #22. Even then, there is no guarantee that MeCab will parse them as single words; that depends on the cost values.