"cz" (vs ISO language code "cs") for Czech analysis package? [LUCENE-6366]
As noted by Eduard Moraru on the solr-user mailing list, the sample fieldTypes Solr provides for dealing with Czech use "cz" as the naming convention for the fieldType, the dynamicField, and the stopwords file. But "cz" isn't the language code for Czech; the correct ISO language code is "cs".
Solr's naming convention here comes directly from the Lucene package name for the Czech analysis classes: org.apache.lucene.analysis.cz. So before making any changes to the Solr sample configs (SOLR-7267), we should probably clarify if/why the Lucene package name is like this.
Migrated from LUCENE-6366 by Chris M. Hostetter (@hossman), 1 vote, updated Mar 19 2015
Robert Muir (@rmuir) (migrated from JIRA)
Some of these older ones just don't follow any ISO system.
The Chinese stuff is also under .cn, and Brazilian Portuguese is under .br.
Chris M. Hostetter (@hossman) (migrated from JIRA)
Should we fix/move these to match the country code and deprecate the old packages & classes?
Robert Muir (@rmuir) (migrated from JIRA)
It's not exactly obvious what they should be. Keep in mind that some languages don't have an ISO-639-1 or ISO-639-* code at all (e.g. Brazilian Portuguese, Sorani Kurdish), so adhering to that will just not work. Using language tags (e.g. pt-BR) would mean the packages would have to use underscores, since a hyphen is not even allowed in a package name.
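As an illustration of the package-name constraint, here is a minimal sketch that uses the JDK's `javax.lang.model.SourceVersion.isName` to check whether a candidate is a syntactically valid Java qualified name. The class name `PackageNameCheck` and the `.cs`, `.pt_BR`, and `.pt-BR` candidates are hypothetical and chosen only for illustration; only `org.apache.lucene.analysis.cz` is an existing package mentioned in this issue.

```java
import javax.lang.model.SourceVersion;

public class PackageNameCheck {

    // Returns true if the candidate is a syntactically valid Java qualified name.
    static boolean isValidPackage(String candidate) {
        return SourceVersion.isName(candidate);
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.lucene.analysis.cz",    // existing package name
            "org.apache.lucene.analysis.cs",    // hypothetical ISO-639-1 based name
            "org.apache.lucene.analysis.pt_BR", // hypothetical: language tag with underscore
            "org.apache.lucene.analysis.pt-BR"  // hypothetical: language tag with hyphen
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidPackage(candidate));
        }
    }
}
```

Under these assumptions, only the hyphenated candidate should be rejected, which is why a tag-based scheme would have to fall back to underscores.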
Chris M. Hostetter (@hossman) (migrated from JIRA)
> It's not exactly obvious what they should be. Keep in mind that some languages don't have an ISO-639-1 or ISO-639-* code at all (e.g. Brazilian Portuguese, Sorani Kurdish), so adhering to that will just not work.
So let me rephrase/correct my question(s):
- For languages which do have an ISO-639-1 code, should we "fix" the existing Java package names?
- For languages which do not have an ISO-639-1 code, should we adopt & document some sort of specific rule for how we namespace these sorts of things?
Robert Muir (@rmuir) (migrated from JIRA)
I don't think that's any less confusing than just having no system at all, like today. Either they are consistent or they are not, and you just can't make assumptions about what the code means.
Uwe Schindler (@uschindler) (migrated from JIRA)
I don't think it is a good idea to rename packages or classes just because they are not consistent.
Is there any update on the issue?