Hannes Krumbiegel comments

Results: 91 comments by Hannes Krumbiegel

Using spacy for POS detection when creating word wise epub

I wrote some not very well tested code here (the API might change): https://github.com/Vuizur/wiktextract-lemmatization. The theory is that you can put a forms array in there and get a fixed one...

Using spacy for POS detection when creating word wise epub

Nice, the word detection now works flawlessly for Russian. 👍

Talk = GPT-2 + Whisper + WASM

These results are extremely impressive! I recently tried to implement something [similar](https://github.com/Vuizur/gpt3-chatbot) in Python, not locally but using various online APIs, and it felt worse than your demo...

Missing form_of keys for some senses

I'll do it with the next released dump. 👍 Thanks a lot for the work!

Missing form_of keys for some senses

The external hard drive I ran the calculations on seems to be dying, so I will have to find another way to repeat it 😅.

Output from subextractors need to be a bit more closely aligned with original output

I noticed that the English translations have the key "code" for the items of the translation array, whereas other languages such as Spanish have the key "lang_code". (I think lang_code...
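A minimal sketch of how a consumer could paper over this difference until the outputs are aligned. The function name `normalize_translation_keys` is hypothetical, and the assumption that the translation items live under a `"translations"` key is mine, not confirmed by the comment above:

```python
from typing import Any


def normalize_translation_keys(entry: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `entry` whose translation items use "lang_code".

    Hypothetical helper: assumes the translation items are dicts stored
    under the "translations" key of a word entry.
    """
    if "translations" not in entry:
        return dict(entry)

    translations = []
    for item in entry["translations"]:
        item = dict(item)  # copy so the caller's data is not mutated
        # Rename "code" to "lang_code" only when the latter is absent.
        if "code" in item and "lang_code" not in item:
            item["lang_code"] = item.pop("code")
        translations.append(item)
    return {**entry, "translations": translations}
```

Items that already use `lang_code` pass through unchanged, so the same function can be applied to entries from any edition.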

Guide to adding new Wiktionaries

I haven't quite gotten it to work; my current version prints a huge number of errors. A small selection: ``` еділя: DEBUG: UNIMPLEMENTED top-level template: -uk- {} at ['неділя', '-uk-']...

Guide to adding new Wiktionaries

Thanks for the help. The error might be in the data files, so I will have to keep looking. I also can't seem to get the program to run completely (using...

Guide to adding new Wiktionaries

Hmm, I haven't gotten it to work yet (but I also haven't had much time recently). I think Wiktextract works pretty decently with the Chinese Wiktionary because they have...

Fix word wise for stressed Russian epubs

Removing that one character works fine for books created by my program; for general-purpose use, one should maybe use the more sophisticated remove_accents function, like in Proficiency, which can also...
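
A rough sketch of the simple variant, assuming "that one character" is the combining acute accent (U+0301) commonly used to mark stress in Russian text. The helper name `strip_stress_marks` is hypothetical and this is not the actual remove_accents function from Proficiency:

```python
import unicodedata

# Assumption: stress is marked with the combining acute accent.
COMBINING_ACUTE = "\u0301"


def strip_stress_marks(text: str) -> str:
    """Remove combining acute accents while keeping й and ё intact.

    Normalizing to NFC first makes й and ё single code points, so only
    the free-standing stress marks on vowels are removed.
    """
    composed = unicodedata.normalize("NFC", text)
    return composed.replace(COMBINING_ACUTE, "")
```

This deliberately does less than a general accent remover: letters that have a precomposed accented form (for example Latin é) are left untouched after NFC normalization, which is one reason a more thorough function may be preferable for arbitrary books.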