Nihongo
Nihongo copied to clipboard
Swift-y Japanese language morphological analysis powered by online services.
Nihongo 🇯🇵
Nihongo is API powered Japanese language morphological analysis Swift library for iOS. It helps you quickly tokenize Japanese sentences and understand its anatomy.
Motivation: NSLinguisticTagger
is fantastic, but before it supports Japanese morphological analysis, this is your easiest option for now.
Nihongo is powered by Yahoo Japan's morphological analysis API. Support of multiple—online and offline—service would be nice, but this is only option for now.
⚠️ This is work in progress repo. I'll start accepting PRs as soon as I have this repo all tidy. By that time proper docs will be in place too. Here is how my todo list looks like now:
- [x] Implement framework so its usable
- [ ] Add error handling
- [ ] Add tests and test cases
- [ ] Consider adding sample iOS/iPhone app
- [ ] Proper in-line documentation
- [ ] Clear documentation in README (you are looking at it now)
- [ ] Add Cococapods support
- [ ] Add Carthage support
- [ ] watchOS + tvOS support
- [ ] macOS support (perhaps?)
Thanks for checking-out Nihongo!
Usage
Deconstructing sentences is trivial. Just instantiate Nihongo
with valid API app_id
and call analize(sentence:completion:)
. You'll get array of Word
types in completion handler as a closure parameter.
let sentence = "昼ごはんを食べています。"
let ma = Nihongo(yahooJapanAppId: "<YAHOO_CO_JP_APP_ID>")
ma.analyze(sentence: sentence) {
words in
for token in words {
print("Word \(token.text) reads as \(token.reading) is \(token.class) and has base form of \(token.base)")
}
}
Code above will yield following in console:
Word 昼ごはん reads as ひるごはん is noun and has base form of 昼ごはん
Word を reads as を is particle and has base form of を
Word 食べ reads as たべ is verb and has base form of 食べる
Word て reads as て is particle and has base form of て
Word い reads as い is auxiliaryVerb and has base form of いる
Word ます reads as ます is auxiliaryVerb and has base form of ます
Word 。 reads as 。 is special and has base form of 。
Extracted Token — Word
Word
is inert/immutable struct that describes extracted token.
-
text
—String
that matches exact text from analyzed sentence. -
reading
—String
in hiragana syllabary describing how to readtext
. -
class
— InternalWordClass
enum describing what morphological part of sentance is the word. -
base
—String
with base form of the word (e.g. unconjugated verb). Usefull of further dictionary look-ups.
Morphological Classes of Words — WordClass
WordClass
is lexical classification of Word
. Supported classes are:
-
.adjectivalVerb
— Adjectival Verb (形容詞, keiyōshi) a.k.a adjectival verb, i-adjective, adjective, stative verb -
.adjectivalNoun
— Adjectival Noun (形容動詞, keiyōdōshi) a.k.a. adjectival noun, na-adjective, copular noun, quasi-adjective, nominal adjective, adjectival verb -
.attributive
— Prenominal (連体詞, rentaishi) a.k.a. attributive, true adjective, prenominal, pre-noun adjectival -
.interjection
— Interjection (感動詞, kandōshi) -
.adverb
— Adverb (副詞, fukushi) -
.conjunction
— Conjunction (接続詞, setsuzokushi) -
.noun
— Noun (名詞, meishi) -
.verb
— Verb (動詞, dōshi) -
.particle
— Particle (助詞, joshi) -
.auxiliaryVerb
— (助動詞, jodōshi) -
.prefix
— Prefix (接頭辞) -
.suffix
— Suffix (接尾辞) -
.special
— Punctuation, brackets, symbols, etc.