kristian-clausal

51 comments by kristian-clausal

That looks like it might be the exact same problem I've had with Lua/Lupa crashes. Tatu took a look at it yesterday; it might be a version mismatch issue: requirements.txt...

This is an issue that can't easily be fixed through coding. I'll take some time tomorrow to go through a list of obvious problematic cases we generated today and fix...

I spent the whole day going through a _short_ list of error candidates for this; basically, "form of" entries that are suspicious and have a native language word + English language...

> On the plus side, the sense-specific pronunciations of [micrometer](https://kaikki.org/dictionary/All%20languages%20combined/meaning/m/mi/micrometer.html) are correctly separated, so I suppose this is only an issue with POS-specific pronunciations.

That's because the original Wiktionary article...

Fixed with 54a058bb77df73dfc3b638a750da87a0b91ed9c4, 05f0a5f2834655ba32b440cbd7445f242cd3c68a, and 23a7d8f20f44a7322eeb3482ba2d03a22cd9a4a7. With these commits, which didn't break anything in our tests because I don't think we have tests for Pronunciation sections, the two issues are...

At first I thought we were breaking some kind of requirement by not naming the files with .jsonl, but it turns out that's just a "suggestion"; the three requirements are UTF-8...

I've merged your pull request; it seemed like a reasonable workaround.

Good catch. Oskari is taking a look at it, and I think the issue is pretty clearly with the "Pronunciation 1" and "Pronunciation 2" pseudo-etymology blocks. From 垃圾, what's missing...

Unfortunately the data we provide is not suitable to be used straightforwardly in .tsv or .csv. The JSON data is hierarchical, with big and reasonably sprawling word structures that contain...

> > * program a script that will do that translation by reading the json file object-by-object and then outputting it into .tsv
>
> I think [pyglossary](https://github.com/ilius/pyglossary) supports conversion...
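For illustration, the object-by-object approach quoted above could look roughly like the sketch below. It assumes a JSON Lines input (one entry per line) and that each entry has `word`, `pos`, and a `senses` list whose items carry `glosses`; the function name `jsonl_to_tsv`, the choice of columns, and the sample entries are made up for the example. Because the data is hierarchical, each entry is flattened to one row per gloss, and everything else in the entry is simply dropped.

```python
import csv
import io
import json


def jsonl_to_tsv(lines, out):
    """Write one TSV row per gloss for each JSON object in `lines`.

    Hypothetical helper: the field names ("word", "pos", "senses",
    "glosses") are assumptions about the input, not a fixed schema.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "pos", "gloss"])
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        entry = json.loads(line)
        # Flatten the nested senses -> glosses structure into flat rows.
        for sense in entry.get("senses", []):
            for gloss in sense.get("glosses", []):
                writer.writerow(
                    [entry.get("word", ""), entry.get("pos", ""), gloss]
                )


# Made-up sample entries standing in for lines read from a data file.
sample = [
    '{"word": "cat", "pos": "noun", "senses": '
    '[{"glosses": ["a small feline"]}, {"glosses": ["a person"]}]}',
    '{"word": "run", "pos": "verb", "senses": '
    '[{"glosses": ["to move quickly"]}]}',
]

buf = io.StringIO()
jsonl_to_tsv(sample, buf)
tsv_text = buf.getvalue()
print(tsv_text)
```

With a real file, an open file handle can be passed in place of `sample`, since the function only iterates over lines and never loads the whole file into memory.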