acat More Language Support?

Hi,

First, thank you very much for sharing this application with us! I have a disabled cousin who cannot really move or speak, and I would like to make this app usable for him.

As we're living in Germany, it would be great if this app worked for the German language as well. Is that already possible, or planned for the future?

Best regards,

Marcel

Aug 22 '15 15:08 molerat619

Yes, that would be awesome. Or maybe it is possible to use a tool other than Presage that supports other languages? I would like to see Polish too; I have a sick person in my family.

Or could we start an open source project for that, to support other languages?

Aug 29 '15 09:08 ghost

Hello, my name is Bruno. I am an information systems student at a Brazilian university in São Paulo. I'm trying to translate the ACAT software to make it available to people who need it in my country. The first version was almost stable, but since there was a recent software update, I will have to download it to continue this work. I'd love to help everyone and to keep in touch. I am counting on the help of developers, programmers and college friends. If you are interested in helping, please get in touch via Skype; my Skype contact is brunopcsilva. Thank you all, and I hope we will have good news soon.

Hugs. Bruno Pellegrini

Oct 05 '15 18:10 B4DP0S31D0N

I had ACAT 0.91 working with a custom voice and predictive text in Spanish. It's a good starting point for a meetup on Skype. Add me: mariano.montanez @brunopcsilva

Oct 05 '15 19:10 nanomo

I don't know much about programming. How can I translate this great piece of software into German? Could you give me a short explanation, or a link where I can learn how?

Dec 08 '15 17:12 InclusionProgress

@Orthopoint

I got some information from Matteo Vescovi of Presage (he is an awesome guy) on how to do this (I still have a problem with language-specific characters):

"Generating a language model is very easy, given a suitable collection of training text files (i.e. a training corpus).

The challenging step is to get a representative text corpus. This is mainly due to the fact that the best corpus you can select is made up by text that the user himself/herself produces. Presage can learn and adapt itself to the text the user inputs. So the best source of a training corpus is the text produced by the users themselves.

(Incidentally, that's why the current approach in presage is that the stock library comes with a minimal language model - trained on a single book. Applications using presage are encouraged to use their own language model, and use the default language model as a starting point by copying it into an application or user-specific model.)

The ACAT application in fact does exactly that: it provides a specially crafted English language model database - more details on this below.

The presage.xml configuration file, which is normally located in the %USERPROFILE%\.presage\ directory on Windows, determines which language model is used.

The Predictor.SmoothedNgramPredictor.DBFILENAME config variable controls the language model database used by the Smoothed Ngram predictor, i.e.:

SmoothedNgramPredictor ERROR /path/to/your/new/language/database.db

You will want to modify this to point to your new language model.
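For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch. The XML layout it expects (a `DBFILENAME` element directly under an element whose name ends in `SmoothedNgramPredictor`) is an assumption inferred from the variable name above, and `set_language_model` and the sample profile are hypothetical, so check your own presage.xml first.

```python
import xml.etree.ElementTree as ET


def set_language_model(xml_text, db_path):
    """Return xml_text with each SmoothedNgramPredictor DBFILENAME set to db_path.

    Assumed layout (not taken from the Presage docs): a DBFILENAME element
    directly under an element whose tag ends with 'SmoothedNgramPredictor'.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter():
        if element.tag.endswith("SmoothedNgramPredictor"):
            for db_node in element.findall("DBFILENAME"):
                db_node.text = db_path
                changed += 1
    if changed == 0:
        raise ValueError("no SmoothedNgramPredictor/DBFILENAME element found")
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample profile, for demonstration only.
SAMPLE = """<Presage>
  <Predictors>
    <SmoothedNgramPredictor>
      <LOGGER>ERROR</LOGGER>
      <DBFILENAME>database_en.db</DBFILENAME>
    </SmoothedNgramPredictor>
  </Predictors>
</Presage>"""

updated = set_language_model(SAMPLE, "/path/to/your/new/language/database.db")
print(updated)
```

If your profile contains more than one matching predictor, this sketch rewrites all of them, so you may want to narrow the match before using it on a real file.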

You can generate a language model database using the text2ngram tool which comes with presage.

The text2ngram tool generates n-gram language models from a given corpus of text. Ideally, you would collect a representative set of text (text that the user has produced or text that matches the writing style and context of the user) and then feed that to the text2ngram tool to generate an n-gram database (1-gram, 2-gram and 3-gram tables, but you can also generate higher-order n-grams and set the SmoothedNgram DELTAS values accordingly).
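To make the idea of 1-gram, 2-gram and 3-gram tables concrete, here is a small Python sketch that counts n-grams in a toy sentence. It only illustrates the concept; it is not text2ngram's implementation, and the whitespace tokenizer and the `count_ngrams` name are simplifying assumptions.

```python
from collections import Counter


def count_ngrams(text, n):
    """Count n-grams over lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


corpus = "der Hund sieht den Hund und der Hund bellt"
unigrams = count_ngrams(corpus, 1)
bigrams = count_ngrams(corpus, 2)
trigrams = count_ngrams(corpus, 3)

print(unigrams[("hund",)])       # how often "hund" occurs
print(bigrams[("der", "hund")])  # how often "hund" follows "der"
print(len(trigrams))             # number of distinct 3-grams
```

A predictor built on such counts suggests the words that most often followed the last one or two words typed, which is why the training text should resemble what the user actually writes.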

Obviously, the higher the n-gram order used, the higher the risk of overtraining the model and overfitting the training corpus (Wikipedia has good articles about machine learning and overfitting).

BTW, the default English language model is a 3-gram model generated from the novel "The Picture of Dorian Gray" by Oscar Wilde.

For example, running the following commands and editing your presage.xml should get you started with a German language model:

 wget http://www.gutenberg.org/cache/epub/16264/pg16264.txt
 for i in 1 2 3; do text2ngram -a -n $i -l -f sqlite -o database_de.db pg16264.txt; done

The wget command downloads a UTF-8 encoded German text file from Project Gutenberg, and the following line invokes text2ngram three times to generate the 3-gram language model database_de.db.

For reference, I believe ACAT used the following script to generate the language model database shipped with the ACAT installer:

 if [ ! -e text8 ]; then
   wget http://mattmahoney.net/dc/text8.zip
   unzip text8.zip
 fi
 if [ ! -e text8_en.db ]; then
   text2ngram -n3 -f sqlite -o text8_en.db ./text8
   text2ngram -a -n2 -f sqlite -o text8_en.db ./text8
   text2ngram -a -n1 -f sqlite -o text8_en.db ./text8
 fi

(This takes about 15 minutes and generates a model file of ~800 MB; we actually removed the 3-grams and 2-grams with frequency 1, which reduced the size of the model to ~140 MB!)

The resulting language model database is a plain SQLite database, so you can query and manipulate its content by using simple SQL."
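The pruning step mentioned in the quote (dropping 2-grams and 3-grams seen only once) can be sketched in Python with the standard sqlite3 module. The table and column names used here (`_2_gram`, `_3_gram`, `count`) are an assumption about the schema text2ngram writes, so inspect your own database first; `prune_singletons` is a hypothetical helper, and the demo runs on a tiny in-memory stand-in rather than a real model.

```python
import sqlite3


def prune_singletons(conn, tables=("_2_gram", "_3_gram")):
    """Delete rows with count == 1 from the given tables; return rows removed."""
    removed = 0
    for table in tables:
        # Table names come from the fixed tuple above, never from user input.
        cursor = conn.execute(f"DELETE FROM {table} WHERE count = 1")
        removed += cursor.rowcount
    conn.commit()
    return removed


# Tiny in-memory stand-in for a text2ngram database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _1_gram (word TEXT, count INTEGER)")
conn.execute("CREATE TABLE _2_gram (word_1 TEXT, word TEXT, count INTEGER)")
conn.execute(
    "CREATE TABLE _3_gram (word_2 TEXT, word_1 TEXT, word TEXT, count INTEGER)"
)
conn.executemany(
    "INSERT INTO _2_gram VALUES (?, ?, ?)",
    [("der", "hund", 2), ("hund", "bellt", 1)],
)
conn.executemany(
    "INSERT INTO _3_gram VALUES (?, ?, ?, ?)",
    [("und", "der", "hund", 1), ("der", "hund", "bellt", 1)],
)

removed = prune_singletons(conn)
remaining = conn.execute("SELECT word_1, word, count FROM _2_gram").fetchall()
print(removed, remaining)
```

On a database file on disk, deleting rows does not shrink the file by itself; running `conn.execute("VACUUM")` afterwards lets SQLite reclaim the space.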

Dec 08 '15 20:12 ghost

Hi, I created this tutorial to make creating new dictionaries accessible to anybody (I hope so!):

https://github.com/01org/acat/wiki/Changing-language-and-creating-new-dictionnaries

Any improvements are welcome ;)

Jan 21 '16 18:01 samalkah

@samalkah Nice!

Do you maybe have a problem with French characters? I have a good Polish database in UTF-8, but the application doesn't see the Polish characters, so the narrator can't read them.

text2ngram doesn't work on Windows for me. I generated the database on Linux (default: English).

Can anyone help, please?

Jan 21 '16 19:01 ghost

Oh, you're right @H1ghty. The accents are not recognized. For example, I had the word "présenter" in my original text file (the one I used to create the French database), and in ACAT it became "prsenter". And even if I type the word in ACAT with the accent, the suggestion will always be "prsenter".

So I guess the problem is not text2ngram but the database itself, which doesn't recognize special characters and just replaces them with nothing each time a word with a special character comes in.
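The symptom itself is easy to reproduce in Python: forcing a string through ASCII while ignoring errors drops accented letters without replacing them. This only illustrates how "prsenter" can arise; it is not a diagnosis of where ACAT, Presage or the database loses the character, and `drop_non_ascii` is a hypothetical name.

```python
def drop_non_ascii(word):
    """Encode to ASCII and silently discard every character that does not fit."""
    return word.encode("ascii", errors="ignore").decode("ascii")


french = "pr\u00e9senter"  # "présenter" with a precomposed é

print(drop_non_ascii(french))
```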

Or maybe the database is created correctly with the accents, but ACAT just cannot show them... I should check the database first to see where the problem comes from.

Jan 21 '16 20:01 samalkah

@samalkah I think the database is OK, and the application itself is probably the problem. You can check the db with the sqlitebrowser app and see that the words generated with text2ngram are fine and have all their characters. text2ngram generates UTF-8 by default, and SQLite is UTF-8 by default too.
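The same check can be scripted instead of done in sqlitebrowser. The sketch below lists the most frequent words that contain non-ASCII characters; the `_1_gram` table with `word` and `count` columns is an assumption about the text2ngram schema, `non_ascii_words` is a hypothetical helper, and the demo uses an in-memory database with made-up counts.

```python
import sqlite3


def non_ascii_words(conn, limit=10):
    """Return up to `limit` words from _1_gram that contain non-ASCII characters."""
    rows = conn.execute("SELECT word FROM _1_gram ORDER BY count DESC")
    found = []
    for (word,) in rows:
        if not word.isascii():  # str.isascii() needs Python 3.7 or newer
            found.append(word)
            if len(found) == limit:
                break
    return found


# In-memory stand-in; for a real model use sqlite3.connect("database.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _1_gram (word TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO _1_gram VALUES (?, ?)",
    [("jest", 9), ("żółw", 4), ("się", 7), ("dom", 3)],
)

found_words = non_ascii_words(conn)
print(found_words)
```

If the list comes back empty for a Polish or French corpus, the characters were lost before or during database generation; if it is populated, the loss happens later, between the database and the screen.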

The font might be a problem, but I think it is Arial (?), so it should be fine.

Maybe the connection to the database needs to be improved, for example by adding something to the configuration saying that it should use UTF-8 encoding.

Jan 21 '16 20:01 ghost

@H1ghty Hmm, OK, then that seems really complicated to resolve for a non-developer like me. I'm still going to check the code.

Jan 21 '16 21:01 samalkah

Hi @H1ghty and @samalkah, could you provide me the text file you used to create the dictionary? I need a ".txt" file with ANSI encoding (in order to preserve diacritics and special characters).
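A note on what "ANSI" means here: on Windows it is the system's active code page, typically Windows-1252 on Western European systems and Windows-1250 on Central European ones. The Python sketch below checks whether a text fits a given code page; `can_encode` is a hypothetical helper and the sample words are only examples.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


french = "pr\u00e9senter"            # présenter
polish = "\u017c\u00f3\u0142\u0107"  # żółć

print(can_encode(french, "cp1252"))  # Western European "ANSI"
print(can_encode(polish, "cp1252"))  # Polish letters are missing from cp1252
print(can_encode(polish, "cp1250"))  # Central European "ANSI"

# UTF-8 with a byte-order mark ("UTF8-BOM") starts with the bytes EF BB BF.
bom = polish.encode("utf-8-sig")[:3]
print(bom)
```

So a French corpus survives a round trip through Windows-1252, while a Polish one needs Windows-1250 or UTF-8.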

I am working on a Portuguese version of ACAT and I already have word prediction and some screens translated. If you're interested, I can help you extend it to your languages too (I already have Spanish and Italian word prediction working).

Here are some screenshots of the translated UI: img_20160114_242931207 img_20160114_243507404_top img_20160114_243517710_top

Jan 21 '16 23:01 brlima94

Hi @brlima94 Nice! I uploaded my text file with this post.

Could you please share how you translated the UI side?

text.txt textAnsi.txt

Jan 22 '16 05:01 ghost

Great job @brlima94! My text file is already encoded in ANSI. How did you manage to make ACAT show the accents?

I'm also interested in knowing where you changed the code to put the UI in your language ;)

A_se_tordre1.txt

Jan 22 '16 11:01 samalkah

@samalkah I will create a French database file tonight and send it to you. It looks like you don't have any special characters that are not on the Portuguese keyboard, so everything should run out of the box.

@H1ghty I'm not so sure the same process will work for Polish, as the character range is very different, but let's give it a try. If it doesn't work, we might need to talk over Skype and try to find a way around this, as I can't change my Windows locale to Polish.

Now about translating the UI, here is what needs to be done:

1. I will add ".resx" and ".xml" files for your languages to the solution
2. You will translate them (I will talk about specifics later)
3. You will send me the translated files (either by doing a Pull Request or sending attached files)
4. I will include those files in the build process and notify you
5. You get the latest version of the project and test it

Please note that the translation won't work until step 5 is complete.

After everything is complete, you might find you need to build a custom keyboard for your language, as I'm doing right now for Portuguese. I can't say much about it right now because I haven't finished it yet, but I've already made some progress with the ACAT App QWERTY keyboard (see screenshot below).

Jan 23 '16 02:01 brlima94

@brlima94 Are you an ACAT developer, or just a good contributor? Anyway, thank you for what you are doing. I'm probably going to install ACAT for a person who suffers from amyotrophic lateral sclerosis, and your help is really appreciated. I wish I were able to do it myself, but this is beyond my development skills, I guess. I'm still interested in knowing exactly what you will do with these .xml and .resx files to inject them into the program.

Jan 23 '16 10:01 samalkah

@samalkah I'm just a contributor. Could you please take a look at this version and see if the word prediction is working for you? You just need to extract the file and open "AcatApp.exe" or "AcatTalk.exe".

@H1ghty I need you to download 2 files:

The compiled project with some adjustments
A RAR file containing two database files (for each text file you provided): v1 is for ANSI, v2 is for UTF8-BOM

Extract the project and replace "Debug\Users\ACAT\WordPredictors\Presage\database.db" with one of the database files. Please let me know which one of those worked for you.

Jan 24 '16 01:01 brlima94

@brlima94 Thanks, I will try that and get back to you. For now I have a problem with "Fatal error. Error setting word prediction engine to [Presage Word Predictor]", so I need to deal with that first.

Jan 24 '16 09:01 ghost

@brlima94 It doesn't work. I can't launch ACAT with what you gave me. No error but nothing happens, just the Windows blue circle.

Jan 24 '16 21:01 samalkah

@samalkah and @H1ghty please use this version instead. I've made some adjustments, so the default language will be set the first time you open it.

If you can't run the app, try setting compatibility mode to Windows 7 and run as administrator. If it still doesn't work, install Presage and ACAT, and make sure it runs without throwing any exceptions. Before opening this app, make sure to close ACAT and Presage (you may need to use Task Manager to do so), otherwise the custom word prediction won't work.

@H1ghty when following the instructions from my previous post, delete "Debug\Users" folder and put the database inside "Debug\Install\Users\ACAT\WordPredictors\Presage\database.db" before launching the app.

The language will be selected based on Windows' current language (i.e. the same language used by Windows Explorer, Notepad, Windows Media Player, etc.). If you use Windows in English and want to test French word prediction, you will first need to change your Windows locale, delete the "Debug\Users" folder and restart your computer before things will work.

For now, these are the languages supported for word prediction:

English
Portuguese
Spanish
Italian
French

To use word prediction in another language (e.g.: Polish, German, etc.), just replace "Debug\Install\Users\ACAT\WordPredictors\Presage\database.db" with your custom database.

Here's a screenshot after changing my Windows' locale to French: acat app qwerty fr

Jan 25 '16 03:01 brlima94

@brlima94 That looks great, but it didn't work ^^. I can't open the program even after closing every Presage and ACAT process. When I launch your program, 3 ACAT processes appear but nothing else happens, and I can't see a Presage process.

Do you think I have to uninstall my version of ACAT first ?

Jan 26 '16 08:01 samalkah

@samalkah have you tried setting compatibility mode to Windows 7 (on each .exe inside the Debug folder) and running ACATApp.exe as administrator? If it still doesn't work, can you open ACAT using the desktop icons?

Jan 26 '16 17:01 brlima94

@samalkah if you have any antivirus or malware protection software, try disabling it, kill all ACAT and WCF Presage processes and then rerun ACAT. Don't forget to run as administrator (sometimes the location of the folder makes Windows block the startup of some processes).

Jan 26 '16 21:01 nanomo

Hey, well done @nanomo! The problem was because of Avast.

@brlima94 It seems to work well. Just one thing about the apostrophe character ( ' ): I don't know if you can do something about it, but I will try to explain. For example, "j'ai" is the contraction of "je" and "ai". The best thing would be to treat "j'ai" as one word. The same goes for "c'est", "qu'il" and so on. These are very common expressions, comparable to "it's", "that's" or "there're" in English, with one difference: in English you can say and write "it is" instead of "it's", but in French you cannot say or write "je ai" instead of "j'ai". The contraction is mandatory in French, not optional as in English.
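What treating "j'ai" as one word could look like is sketched below in Python: a tokenizer whose notion of a word allows apostrophes between letter runs. This is only an illustration of the idea; it says nothing about how text2ngram, Presage or ACAT actually split text, and the `WORD` pattern and `tokenize` name are assumptions.

```python
import re

# A word is a run of letters, optionally joined by apostrophes (straight or
# typographic), so elided forms such as "j'ai" and "qu'il" stay whole.
WORD = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase words, keeping elided forms as one token."""
    return WORD.findall(text.lower())


tokens = tokenize("J'ai dit que c'est ce qu'il voulait.")
print(tokens)
```

A corpus tokenized this way would store "j'ai" as a single 1-gram, so the predictor could offer it as one suggestion.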

I don't know if that's clear to you, and I don't know if you can do something to change it. If not, that's OK; you've already done a great job, thank you.

Jan 27 '16 21:01 samalkah

I will check it this weekend.

Here is another hint https://github.com/01org/acat/issues/11#issuecomment-175943577

Jan 30 '16 06:01 ghost

Hi @brlima94, I'm trying to use your application without installing anything (I changed computers), but it doesn't work. When I launch ACAT I get this message: Fatal Error. Error setting word prediction engine to "Presage Word Predictor"

But if I look in your directories I can't find any Presage application. Should I install it separately first?

Another thing: did you leave Vision out of your application because it caused some bugs?

Feb 09 '16 15:02 samalkah

@samalkah presage needs to be installed first, then you can use the app I sent you.

I didn't include Vision because I don't have its source code. You can install Intel's ACAT and use it without any problems.

Just remember to close presage before starting the app, otherwise word prediction might not work in your native language.

If you still get that error after installing Presage, delete the "Users" folder before starting the app.

If you have any problems, please let me know.

Feb 10 '16 10:02 brlima94

Hi @brlima94 The first database worked for me with Windows 10 and the Polish language installed. I mostly work on a Windows installation in English, so I couldn't check it earlier. It works with Vision too.

I found one issue: there are no Polish characters on the UI, and words with Polish characters are not saved in learn.db. But I expected that ;)

Feb 14 '16 10:02 ghost

Hi @H1ghty Unfortunately the issue of not saving those words in learn.db will remain for some time, as I couldn't find a way to make Presage save them without losing the Polish characters (I tried to follow your hint and make ACAT and Presage's WCF client work with UTF, but somehow Presage's C++ DLL is "normalizing" the string to standard English characters).

Regarding the Polish characters on the UI (the on-screen keyboard), could you confirm whether these are the letters you need, or whether something is missing or not needed? a ą b c ć d e ę f g h i j k l ł m n ń o ó p q r s ś t u v w x y z ź ż

I am currently working on a full UI globalization, starting with Brazilian Portuguese. Once all strings are extracted from source code I can send you the files you need to translate to your language and then I will add them to the project. Is there any other way we can talk about this?

@samalkah The same works for French. Can you confirm if these are the letters you need on the on-screen keyboard? a à â ä b c ç d e è é ê ë f g h i î ï j k l m n o ° ô ö p q r s t u ù û ü v w x y z

I also found one thing about your issue with the Apostrophe char. Can you check if this is what you're looking for?

Feb 18 '16 03:02 brlima94

Thank you @brlima94. About the apostrophe, that seems to be exactly my problem. I have to test the fix (but I have absolutely no idea how to apply the patch ^^).

As for the letters, you can remove ä and ö; there are no French words with them. The ° is not used to make words, but it can be useful to replace the word "degree" when you express a temperature. So the best would be to put it in the special characters part of the on-screen keyboard. (But don't waste too much time on that detail.)

In any case, if I can help you in some way, just tell me. I feel a bit useless now ^^

Feb 18 '16 08:02 samalkah

Hi everybody! My name is Dmitry, I am from Russia! My mother has ALS. ACAT is a great program for such people! But I can't find ACAT in Russian. I am not a programmer. Could anybody help me translate ACAT into Russian? For money or for free, as you decide! I really need your help! Thanks a lot!

Mar 24 '16 14:03 DBaklanov
