{part-of-speech} returns "Unknown" for nouns and na-adjectives

Open BernhardValenti opened this issue 2 years ago • 3 comments

Description: I'm trying to add {part-of-speech} to my cards, but the template returns "Unknown" for nouns and na-adjectives. I-adjectives and verbs seem to return the proper values.

eg: and 元気

The dictionary is JMdict, and in the Yomichan browser it does show the n tag for nouns and the adj-na tag for na-adjectives.

Browser version: Chrome 92

Yomichan version: 21.7.31.2

BernhardValenti • Sep 02 '21 18:09

This is because the JMdict dictionary data doesn't list the part-of-speech for nouns. It lists n and adj-na as tags, but this is not the same as part-of-speech.

toasted-nutbread • Sep 04 '21 16:09

I updated my templates to:

{{#*inline "part-of-speech-pretty"}}
    {{~#if (op "===" . "v1")~}}Ichidan verb
    {{~else if (op "===" . "v5")~}}Godan verb
    {{~else if (op "===" . "vk")~}}Kuru verb
    {{~else if (op "===" . "vs")~}}Suru verb
    {{~else if (op "===" . "vz")~}}Zuru verb
    {{~else if (op "===" . "adj-i")~}}I-adjective
    {{~else if (op "===" . "adj-na")~}}Na-adjective
    {{~else if (op "===" . "n")~}}Noun
    {{~else~}}{{.}}
    {{~/if~}}
{{/inline}}

{{#*inline "part-of-speech"}}
    {{~#scope~}}
        {{~#if (op "!==" definition.type "kanji")~}}
            {{~#set "first" true}}{{/set~}}
            {{~#each definition.expressions~}}
                {{~#each wordClasses~}}
                    {{~#unless (get (concat "used_" .))~}}
                        {{~#unless (get "first")}}, {{/unless~}}
                        {{~> part-of-speech-pretty . ~}}
                        {{~#set (concat "used_" .) true~}}{{~/set~}}
                        {{~#set "first" false~}}{{~/set~}}
                    {{~/unless~}}
                {{~/each~}}
            {{~/each~}}
            {{~#if (get "first")~}}
                {{~#each definition.definitionTags~}}
                    {{~#unless (get (concat "used_" name))~}}
                        {{~#unless (get "first")}}, {{/unless~}}
                        {{~> part-of-speech-pretty name ~}}
                        {{~#set (concat "used_" name) true~}}{{~/set~}}
                        {{~#set "first" false~}}{{~/set~}}
                    {{~/unless~}}
                {{~/each~}}
            {{~/if~}}
            {{~#if (get "first")~}}Unknown{{~/if~}}
        {{~/if~}}
    {{~/scope~}}
{{/inline}}

BernhardValenti • Sep 09 '21 08:09

> This is because the JMdict dictionary data doesn't list the part-of-speech for nouns. It lists n and adj-na as tags, but this is not the same as part-of-speech.

Just to be clear, the proper JMdict XML source data does distinguish part-of-speech tags from all other miscellaneous tags.

In the JMdict dictionary file that yomichan-import produces for Yomichan, the part-of-speech field contains only a limited, modified subset of those tags. From what I understand, these values are used behind the scenes to de-conjugate words into their dictionary forms so that Yomichan can look them up. Part-of-speech tags that are not needed for de-conjugation are not added to this part-of-speech list. However, all of the part-of-speech information is still added to the definition tags of each term; it is just mixed in with the other miscellaneous and usage-domain tags that are displayed to the user in the glossary.
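
To make the distinction concrete, here is a minimal Python sketch of the two lookups. The row layout (expression, reading, definition tags, rules) is only my understanding of the term bank format and should be treated as an assumption; the two rows, the `POS_TAGS` subset, and both function names are made up for illustration.

```python
from typing import List, NamedTuple


class TermRow(NamedTuple):
    """Hypothetical, trimmed-down view of one term bank row (assumed layout)."""
    expression: str
    reading: str
    definition_tags: str  # space-separated tags shown in the glossary
    rules: str            # space-separated de-conjugation rules; may be empty


# Illustrative rows, not copied from the real dictionary file.
ROWS = [
    TermRow("食べる", "たべる", "v1 vt", "v1"),
    TermRow("元気", "げんき", "adj-na n", ""),
]

# Illustrative subset of JMdict part-of-speech tag names.
POS_TAGS = {"n", "adj-na", "adj-i", "v1", "v5", "vk", "vs", "vz"}


def pos_from_rules(row: TermRow) -> List[str]:
    """Look only at the de-conjugation rules; empty rules give 'Unknown'."""
    return row.rules.split() or ["Unknown"]


def pos_with_tag_fallback(row: TermRow) -> List[str]:
    """Prefer the rules, then fall back to part-of-speech definition tags."""
    rules = row.rules.split()
    if rules:
        return rules
    tags = [t for t in row.definition_tags.split() if t in POS_TAGS]
    return tags or ["Unknown"]
```

With these sample rows, `pos_from_rules` gives "Unknown" for 元気 because its rules field is empty, while `pos_with_tag_fallback` recovers adj-na and n from the definition tags.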

Based on my understanding of how this works, I don't think this {part-of-speech} handlebar should even exist. All of this information already exists in a complete form within the {glossary} field. The part-of-speech of a given word can also vary depending on the sense in which it is used. For example, 亜 can be a prefix or a noun. The {part-of-speech} handlebar could be updated to return a list of all possible parts-of-speech for an expression, but why? That information isn't as useful and could even be confusing without the corresponding sense context (which exists in the glossary).
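
A small Python sketch of why a flat list loses information. The sense structure, the function names, and the placeholder glosses are hypothetical; only the prefix/noun split for 亜 comes from the example above, and `pref` is assumed to be the tag name for prefixes.

```python
from typing import List, Tuple

# Each sense pairs its part-of-speech tags with a gloss (placeholders here).
SENSES: List[Tuple[List[str], str]] = [
    (["pref"], "<gloss of the prefix sense>"),
    (["n"], "<gloss of the noun sense>"),
]


def flat_part_of_speech(senses: List[Tuple[List[str], str]]) -> str:
    """Merge every tag into one de-duplicated list, dropping the sense context."""
    seen: List[str] = []
    for pos_tags, _gloss in senses:
        for tag in pos_tags:
            if tag not in seen:
                seen.append(tag)
    return ", ".join(seen)


def part_of_speech_by_sense(senses: List[Tuple[List[str], str]]) -> List[str]:
    """Keep each tag next to the gloss it applies to, as the glossary does."""
    return [f"({', '.join(pos_tags)}) {gloss}" for pos_tags, gloss in senses]
```

The flat version yields a single string listing both tags, with no indication of which meaning is the prefix and which is the noun; the per-sense version keeps that pairing.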

The part-of-speech tags for JMdict entries that are displayed in yomichan are each contained within a <span> node with a data-category="partOfSpeech" attribute, so maybe something could be done with that if a user needs a way to extract or query the data.
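
As a rough illustration of that idea, the following Python sketch pulls the text out of such spans using only the standard library. The sample markup is invented for the example (including the `frequent` category); the real rendered HTML is likely more nested, which is why nested spans are tracked with a depth counter.

```python
from html.parser import HTMLParser
from typing import List


class PartOfSpeechExtractor(HTMLParser):
    """Collect the text of every <span data-category="partOfSpeech">."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []
        self._depth = 0              # > 0 while inside a matching span
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth > 0:
            self._depth += 1         # nested span inside a matching span
        elif dict(attrs).get("data-category") == "partOfSpeech":
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != "span" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:         # the matching span just closed
            text = "".join(self._buffer).strip()
            if text:
                self.tags.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_part_of_speech(html: str) -> List[str]:
    parser = PartOfSpeechExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags


# Hypothetical markup, only loosely modelled on the description above.
SAMPLE = (
    '<span data-category="partOfSpeech">n</span>'
    '<span data-category="partOfSpeech"><span>adj-na</span></span>'
    '<span data-category="frequent">P</span>'
)
```

Calling `extract_part_of_speech(SAMPLE)` returns only the tags from the two matching spans and ignores the third.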

stephenmk • Apr 24 '22 17:04