Clarify the status of Kuromoji dictionaries
Description
While refactoring the Gradle code and data generation tasks, I stumbled across the fact that we currently have two different tasks that generate the same set of output files: compileMecab and compileNaist. They use different inputs but write to the same output files.
There is also this patch, which seems to be stalled or abandoned: https://github.com/apache/lucene/pull/12517/files
I don't have any experience with Kuromoji... is there any reason to keep both inputs? Should it be configurable at runtime somehow?
At the moment, to get the NAIST dictionary, you need to generate it by hand and recompile Lucene.
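To make the runtime-configuration question concrete, here is a minimal sketch of what a selection switch could look like. Everything in it is hypothetical: the `DictionaryChoice` class, the `Source` enum, and the `kuromoji.dictionary` system property are illustration-only names and are not part of Lucene. It also assumes the MeCab-based dictionary is the default, since the NAIST one currently requires a manual rebuild.

```java
import java.util.Locale;

public class DictionaryChoice {
  /** The two dictionary inputs discussed above. This enum is hypothetical, not a Lucene type. */
  enum Source { MECAB, NAIST }

  /** Hypothetical system property name; Lucene does not define it. */
  static final String PROPERTY = "kuromoji.dictionary";

  /** Maps a property value to a dictionary source, defaulting to MECAB (an assumption). */
  static Source resolve(String value) {
    if (value == null || value.isBlank()) {
      return Source.MECAB;
    }
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "mecab":
        return Source.MECAB;
      case "naist":
        return Source.NAIST;
      default:
        throw new IllegalArgumentException("Unknown dictionary: " + value);
    }
  }

  public static void main(String[] args) {
    // Run with -Dkuromoji.dictionary=naist to pick the other source.
    Source source = resolve(System.getProperty(PROPERTY));
    System.out.println("Selected dictionary: " + source);
  }
}
```

A switch like this would only help if both compiled dictionaries were shipped under distinct resource paths, which is exactly what the two tasks writing to the same output files currently prevents.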
I think your description is correct. There are size implications with some of these dictionaries as well; they can be enormous.