lexica
Suggestions for Italian dictionary
Hi, I am playing Lexica in Italian. Non-Italian letters and accented Italian letters often appear on the board, which limits what can be played, as there aren't many Italian words containing these letters. Often, only one word contains them. Moreover, accents appear only at the end of some words.
So I suggest removing English-only letters (j, k, w, x, y), accented letters (à, è, ì, ò, ù) and some uncommon letters (h, q) from the letters that can appear on the board, in order to make the game better. Accented letters could be converted to their base letters (a, e, i, o, u).
From a comment on Google Play (seemed relevant to this):
Very fun and interesting, but the Italian has some issues, it reports a lot of words that don't exist and seems to skip a few existing ones. Also the accents aren't used enough in Italian to make sense in the game, they just become dead cells. Great design for the multiplayer!
I've done a quick count of how many times each letter is represented in the Italian dictionary, and came up with this:
for CHAR in $(./show-chars-in-dict.sh it); do COUNT=$(grep -- "$CHAR" assets/dictionaries/dictionary.it.txt | wc -l) && echo "$CHAR: $COUNT"; done
a: 192407
b: 29064
c: 77695
d: 46509
e: 140688
é: 552
è: 22
f: 26043
g: 41157
h: 6887
i: 183650
ì: 623
j: 40
k: 297
l: 100429
m: 60781
n: 92088
o: 121930
ò: 7971
p: 45093
q: 1391
r: 132735
s: 96981
t: 120556
u: 46735
ù: 42
v: 45903
w: 76
x: 2028 (But only 437 if I exclude roman numerals using the regex `^[lxivcdm]+$`)
y: 128
z: 12217
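For anyone who wants to reproduce these numbers without the helper script, here is a minimal Python sketch of the same count. It assumes the dictionary is a plain-text file with one word per line (which the `grep | wc -l` loop suggests), and the function name `count_words_per_letter` is made up for illustration. Like the shell loop, it counts the number of words containing each letter, not the total number of occurrences.

```python
import re
from collections import Counter

# Same pattern as the regex quoted above: words made only of Roman-numeral letters.
ROMAN_NUMERAL = re.compile(r"^[lxivcdm]+$")


def count_words_per_letter(words, skip_roman_numerals=False):
    """Count how many words contain each letter (one count per word, like grep | wc -l)."""
    counts = Counter()
    for word in words:
        word = word.strip().lower()
        if not word:
            continue
        if skip_roman_numerals and ROMAN_NUMERAL.match(word):
            continue
        # set() so that a letter repeated inside one word is counted once.
        counts.update(set(word))
    return counts


# Tiny made-up word list, only to illustrate the call.
sample = ["caffè", "xii", "taxi", "città", "casa"]
counts = count_words_per_letter(sample)
counts_no_roman = count_words_per_letter(sample, skip_roman_numerals=True)
```

To run it against the real file, something like `count_words_per_letter(open("assets/dictionaries/dictionary.it.txt", encoding="utf-8"))` should work, assuming the file is UTF-8 encoded.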
Limiting this to just those mentioned by @airon90 above, we see the following:
j: 40
k: 297
w: 76
x: 2028 (But only 437 if I exclude roman numerals using the regex `^[lxivcdm]+$`)
y: 128
à: 3849
é: 552
è: 22
ì: 623
ò: 7971
ù: 42
Again, stressing that I am not an Italian speaker, but given that `ò` and `à` appear in so many words, perhaps we should do as recommended above and normalize them to `o` and `a` respectively.
With all that said, here is a proposal with some questions. If I'm able to get some confirmation from speakers of the Italian language, then I'd be happy to action them:
- Adjust the current dictionary, resulting in only one Italian dictionary (no need for "Italian" + "Italian (extended - including à, etc)").
- Remove all words containing the letters `j`, `k`, `w`, `x`, `y`.
- Leave the uncommon letters `h` and `q` (as with English and other languages, they will be assigned a low probability, and thus appear less frequently in boards anyway).
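The removal step in this proposal could look roughly like the following. This is only a sketch in Python, not how Lexica actually builds its dictionaries; `EXCLUDED_LETTERS` and `drop_words_with_excluded_letters` are hypothetical names, and the sample words are made up.

```python
# Letters proposed for removal in the list above.
EXCLUDED_LETTERS = frozenset("jkwxy")


def drop_words_with_excluded_letters(words, excluded=EXCLUDED_LETTERS):
    """Return only the words that contain none of the excluded letters."""
    return [w for w in words if excluded.isdisjoint(w.lower())]


# Made-up sample: words with h and q stay, words with j/k/w/x/y are dropped.
sample = ["taxi", "hotel", "quadro", "yogurt", "casa", "whisky"]
kept = drop_words_with_excluded_letters(sample)
```

Words containing `h` or `q` pass through untouched, matching the third bullet.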
The only question I have is what to do about the diacritics. Some proposals (from a naive English speaker's perspective - please correct any misunderstanding I may have):
- Normalize all diacritics (convert `ò -> o`, `à -> a`, `é + è -> e`, `ì -> i`, `ù -> u`) as they are legitimate letters in the Italian dictionary, and players will be able to understand that, e.g. the word `caffe` in Lexica is actually referring to the word `caffè` in the Italian language.
- Remove all words containing diacritics (e.g. if they are indicative of loan words that players would understand are not needed for a game such as Lexica).
- Remove some and normalize other diacritics. This would need input from native speakers as to which diacritics only ever appear in loanwords (e.g. `ù` is only found in 42 words in this dictionary) vs others which are used in Italian words (e.g. `ò`, which appears in 7971 words in this dictionary).
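To make the first option concrete, here is a minimal sketch of the normalization in Python using the standard `unicodedata` module. The function name `strip_diacritics` is hypothetical. Note that this drops every combining mark, which is broader than the five conversions listed above; a fixed mapping table would be the stricter alternative.

```python
import unicodedata


def strip_diacritics(word):
    """Map accented letters to their base letter, e.g. 'caffè' -> 'caffe'."""
    # NFD splits 'è' into 'e' plus a combining accent; combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

With this, `strip_diacritics("caffè")` gives `"caffe"`, and words without accents come back unchanged.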
> Leave the uncommon letters h and q (as with English and other languages, they will be assigned a low probability, and thus appear less frequently in boards anyway).
If so, make sure that:
- in Italian words, "h" appears between "c" or "g" and "i" or "e", forming "chi", "che", "ghi", "ghe". Some imported words used in Italian may follow other rules (e.g. "hotel").
- "q" is always followed by "u" ("qu"), as they always appear together.
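The two rules above could be checked over the word list with something like this Python sketch. The names `UNEXPECTED_H`, `UNEXPECTED_Q` and `flag_unusual_h_or_q` are hypothetical, and the rules are taken literally as stated here. Flagged words are candidates for manual review rather than automatic removal, since exceptions such as "hotel" are expected.

```python
import re

# 'h' is expected only inside chi/che/ghi/ghe; 'q' is expected to be followed by 'u'.
UNEXPECTED_H = re.compile(r"(?<![cg])h|h(?![ie])")
UNEXPECTED_Q = re.compile(r"q(?!u)")


def flag_unusual_h_or_q(words):
    """Return the words that break either rule, for manual review."""
    return [
        w for w in words
        if UNEXPECTED_H.search(w.lower()) or UNEXPECTED_Q.search(w.lower())
    ]


# Made-up sample, only to illustrate the call.
sample = ["chiesa", "ghepardo", "quadro", "hotel", "iraq", "casa"]
flagged = flag_unusual_h_or_q(sample)
```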
> Normalize all diacritics (convert ò -> o, à -> a, é + è -> e, ì -> i, ù -> u) as they are legitimate letters in the Italian dictionary, and players will be able to understand that, e.g. the word caffe in Lexica is actually referring to the word caffè in the Italian language.
+1. Words must appear with the correct accent in the lower part of the screen and in the final part of the game.
I couldn't find any help or tutorial, but this information must be made public.
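One way to show the correct accents while playing with accent-free letters is to keep a lookup from the normalized form back to the original spelling(s). The following is a rough Python sketch of that idea, not a description of how Lexica stores words; `base_form` and `build_display_index` are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def base_form(word):
    """Drop combining accents, e.g. 'caffè' -> 'caffe'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_display_index(words):
    """Map each accent-free board form to the original spellings it stands for."""
    index = defaultdict(list)
    for word in words:
        index[base_form(word)].append(word)
    return dict(index)


# 'papa' and 'papà' collapse to the same board form, so both are kept for display.
index = build_display_index(["caffè", "papa", "papà", "casa"])
```

A list is used as the value because two different words can share the same accent-free form, in which case the game would need to decide which spelling(s) to display.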
Closing this as a recent PR #362 addresses this somewhat. When going through old issues, I was attempting to fix #337 which is technically a dupe of this, but I found that one first.
If we wish to have a no-diacritics version, we can open a new issue or track it in #337.