engram-es
English and Spanish optimized?
I'm wondering, for folks who write English and Spanish, whether there is a common version that they can use.
No, at least not that I know of. But this shouldn't be too hard to optimize.
English character frequency is well documented. A big Spanish corpus was cleaned up and analyzed by Ian Doug with my help, see details here: https://github.com/binarybottle/engram-es/discussions/21
So by weighting the frequencies of characters, bigrams and trigrams, you could use Arno's code to optimize a keyboard for, say, 50/50 English/Spanish, or some other ratio.
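To make the weighting idea concrete, here is a minimal sketch of how two n-gram tables could be blended at a chosen ratio. This is not Arno's code: the function names (`ngram_counts`, `normalize`, `blend`) and the two sample strings are placeholders made up for illustration, and a real run would use the full English and Spanish corpora instead.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count n-grams inside words (letters only, lowercased)."""
    counts = Counter()
    for word in text.lower().split():
        letters = "".join(c for c in word if c.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts

def normalize(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def blend(counts_a, counts_b, weight_a=0.5):
    """Mix two n-gram tables; weight_a is the share given to the first one."""
    fa, fb = normalize(counts_a), normalize(counts_b)
    return {
        gram: weight_a * fa.get(gram, 0.0) + (1.0 - weight_a) * fb.get(gram, 0.0)
        for gram in set(fa) | set(fb)
    }

# Placeholder sentences, NOT a real corpus.
english_sample = "the weather is nice this morning"
spanish_sample = "el niño come una manzana por la mañana"

mixed_bigrams = blend(
    ngram_counts(english_sample, 2),
    ngram_counts(spanish_sample, 2),
    weight_a=0.5,  # 50/50 English/Spanish; change for another ratio
)
top = sorted(mixed_bigrams.items(), key=lambda kv: kv[1], reverse=True)[:5]
print(top)
```

Each table is normalized before mixing, so the ratio reflects the intended language split rather than the relative sizes of the two corpora.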
Is there a Spanish letter frequency similar to the English "Letter frequencies (Norvig, 2012)"?
Derived from https://zenodo.org/record/5501931
> I’m wondering for folks who write English and Spanish if there is common version that they can use.
Hi masters3d, did you succeed? Many of us have to write in both English and Spanish. Your idea would be the perfect balance.
Edit: moved this to a dedicated issue
Full original comment
Hello there! I've been a user of the (programmer's) Dvorak layout for almost a decade now, and learning it was a huge improvement over good ol' QWERTY. However, while it is really widespread and readily available on most current systems, its performance for the English language is sub-optimal. Also, its variations for languages with similar alphabets, like my dear Portuguese, are still "super-terrible" (a bit less terrible than QWERTY thanks to the vowels on the left home row).
The elephant in the room
I took a look at some of these newer designs, including yours. Congratulations, by the way! Amazing work. But the OP touched on a very important point that is still unaddressed by all of them: we live in an international, interconnected world now. Until the early 2000s, it wasn't a problem to have totally different keyboard layouts for every language. We even used different, incompatible text encodings! But now the most used encoding on both new devices and the Internet is Unicode. I believe the same transition should happen to keyboard layouts.
But is there a need for it? Well, most professionals who type a lot (journalists, academics, programmers, etc.) will need either to create content in more than one language, usually their native one and English, or at least to communicate often with foreigners through text. This applies even to countries that have English as their primary language, like the US, where more and more people speak Spanish as a primary or secondary language each year (> 50 million today).
Is an "international" keyboard layout possible?
I know that many languages use completely different alphabets and, even when they use similar ones (like variations of the Latin or Cyrillic scripts), they have extra characters and wildly varying letter/n-gram frequencies. Therefore, there can't be a truly international base layout for keyboards. But can we do better?
Starting from English, the de facto international language, a non-monolingual layout can't stray far from ASCII. Looking at the languages with the most speakers in the world that use a Latin-script alphabet, we have in the top positions (Wikipedia/Ethnologue 2022):
| Position | Language | Family | Branch | 1st language | 2nd language | Total speakers |
|---|---|---|---|---|---|---|
| 1 | English | Indo-European | Germanic | 372.9 million | 1.080 billion | 1.452 billion |
| 4 | Spanish | Indo-European | Romance | 474.7 million | 73.6 million | 548.3 million |
| 5 | French | Indo-European | Romance | 79.9 million | 194.2 million | 274.1 million |
| 9 | Portuguese | Indo-European | Romance | 232.4 million | 25.2 million | 257.7 million |
| 12 | German | Indo-European | Germanic | 75.6 million | 59.1 million | 134.6 million |
I think it would be feasible to analyze these 5 languages, from two branches of the same language family (you already did it for two), and find a design that isn't awesome for one of them but sucks for all the others...
A "Latin" or "Romance-Germanic" base keyboard layout
For whoever is interested, I propose the development of a base layout using the Latin alphabet that is optimized for all of these 5 languages. It wouldn't be a simple weighted optimization though. What I would expect to achieve with this design is:
- To have a common base for creating a new layout for each of the 5 languages;
- It must be really good at English, at least as good as other current designs by the same metrics;
- It should be reasonably good for the other 4 languages, but must not be terrible for any of them;
- The differences between the layouts should be minimal, so that one can constantly switch between layouts without hassle, create a custom hybrid bilingual layout, or not even need to switch at all.
Steps necessary to achieve these goals:
- Obtain a text corpus and n-gram frequency for French, German and Portuguese;
- Find the similarities between the 5 languages using some kind of distance measure(s);
- Define optimization weights for them considering these similarities, number of speakers, etc.;
- Develop a method for searching the layout space by optimizing primarily for English and secondarily for the 4 other languages (using the weights), with a penalty if the layout starts becoming too bad for any single language mid-search;
- Choose a winner base layout and then search for full layouts for each individual language, positioning specific keys and maybe repositioning some punctuation keys in the process.
- Profit. 😎
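As a rough illustration of the distance-measure, weighting and penalty steps above, here is a minimal sketch. Everything in it is an assumption on my part: Jensen-Shannon divergence is just one possible distance measure, the weights simply reuse the total-speaker figures from the table above, and `combined_cost`, `primary_weight`, `limit` and `penalty` are hypothetical names and values. The per-language costs would come from whatever layout scoring is actually used.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two normalized
    frequency dicts: 0 means identical, 1 means no overlap at all."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# One possible weighting: total speakers in millions (from the table above).
SECONDARY_WEIGHTS = {"es": 548.3, "fr": 274.1, "pt": 257.7, "de": 134.6}

def combined_cost(costs, weights=SECONDARY_WEIGHTS, primary="en",
                  primary_weight=0.5, limit=1.5, penalty=10.0):
    """Combine per-language layout costs (lower is better).

    The primary language gets a fixed share, the others split the rest
    according to their weights, and any language whose cost exceeds
    `limit` adds a penalty so the search avoids layouts that are
    terrible for a single language.
    """
    total_w = sum(weights.values())
    score = primary_weight * costs[primary]
    score += (1.0 - primary_weight) * sum(
        w / total_w * costs[lang] for lang, w in weights.items()
    )
    score += penalty * sum(max(0.0, c - limit) for c in costs.values())
    return score

# Hypothetical costs for one candidate layout (made-up numbers).
candidate = {"en": 1.0, "es": 1.1, "fr": 1.2, "pt": 1.1, "de": 1.7}
print(combined_cost(candidate))
```

The distances could also feed into the weights (for example, giving less weight to a language that is already very close to English), but that choice is left open here.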
Advantages
- Beyond the obvious advantages for multilingual typists, this base layout and its derivatives would benefit from having a unified, larger user base —likely very small in the beginning, but it's plausible to reach a critical size eventually.
- Its software implementations could share a common codebase, following the pattern of a base layout file (either the English layout or just the base itself) plus modifications of it. That would be easier to maintain and port to different systems.
- Being multilingual could be an eye-catching feature for anyone looking for a better layout to learn beyond QWERTY/Dvorak.
- The new methods developed could be useful for custom/personal layout creation and also for other language subfamilies, like those that use the Cyrillic script.
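The "base layout file plus modifications" pattern mentioned above could look roughly like the sketch below. The key names (`k01`, ...) and the character assignments are hypothetical placeholders, not actual Engram positions, and `derive_layout` / `changed_keys` are names invented for this example.

```python
def derive_layout(base, overrides):
    """Return a copy of the base layout with per-language overrides applied."""
    layout = dict(base)
    layout.update(overrides)
    return layout

def changed_keys(layout_a, layout_b):
    """List the key positions where two layouts differ."""
    keys = set(layout_a) | set(layout_b)
    return sorted(k for k in keys if layout_a.get(k) != layout_b.get(k))

# Hypothetical base: a handful of positions shared by every language.
BASE = {"k01": "a", "k02": "e", "k03": "n", "k04": "'", "k05": ";"}

# Each language only states what it changes relative to the base.
SPANISH = derive_layout(BASE, {"k05": "ñ"})
PORTUGUESE = derive_layout(BASE, {"k05": "ç"})

print(changed_keys(BASE, SPANISH))
print(changed_keys(SPANISH, PORTUGUESE))
```

Counting the changed keys also gives a simple way to check the "minimal differences between layouts" goal for any pair of derived layouts.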
I'm seriously considering learning a new keyboard layout once more, but it would have to be a killer layout. It would have to be one to rule them all.
I am willing to dedicate some time to this idea if there are others interested. If not, maybe I'll end up trying to create my own Portuguese or Portuguese-English Engram layout.
Greetings from Brazil! 🇧🇷
I think that this issue could be well addressed by an English-Spanish-French key layout -- see: https://github.com/binarybottle/engram/issues/58#issue-1688602061