
"japanese_compound_word" doesn't work properly without built-in dict

Open: Jimmy-Z opened this issue 1 year ago • 1 comment

It's possible to use lindera without any dictionary feature and load an external dictionary instead. In this case, however, "japanese_compound_word" produces compound words with empty details, because the code that handles this is (needlessly) guarded behind those features: https://github.com/lindera/lindera/blob/main/lindera/src/token_filter/japanese_compound_word.rs#L96-L199

Further down, "japanese_stop_tags" assumes that any details with length < 4 have length 1: https://github.com/lindera/lindera/blob/main/lindera/src/token_filter/japanese_stop_tags.rs#L102-L105 With the empty details from the compound word filter, this triggers an index out of range panic.
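To illustrate the failure mode, here is a minimal sketch. It is not the actual lindera code: `slice_as_described` and `format_tags_safe` are hypothetical names, and the first function only mirrors the length assumption described in this report. It uses `get` rather than direct indexing so that the out-of-range case shows up as `None` instead of a panic.

```rust
/// Hypothetical reproduction of the assumption described in this issue:
/// any details list shorter than 4 is treated as if it had exactly 1 element.
/// With direct indexing (`&details[0..tags_len]`) an empty list would panic;
/// `get` is used here so the out-of-range case is observable as `None`.
fn slice_as_described<'a>(details: &'a [&'a str]) -> Option<&'a [&'a str]> {
    let tags_len = if details.len() >= 4 { 4 } else { 1 };
    details.get(0..tags_len)
}

/// Hypothetical defensive variant: copy at most 4 details into a
/// "*"-padded array, never reading more elements than actually exist.
fn format_tags_safe(details: &[&str]) -> String {
    let mut tags = ["*"; 4];
    // `zip` stops at the shorter side, so empty or short lists are safe.
    for (slot, detail) in tags.iter_mut().zip(details.iter().copied()) {
        *slot = detail;
    }
    tags.join(",")
}
```

With an empty details list, `slice_as_described(&[])` returns `None` (the case that would panic with direct slicing), while `format_tags_safe(&[])` falls back to all-placeholder tags. Whether padding with `*` is the right behavior for lindera is an assumption; the alternative would be to have the compound word filter always populate details.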

Jimmy-Z, Dec 07 '24 22:12

@Jimmy-Z Thanks for the report. I will check it later.

mosuka, Dec 29 '24 14:12