Osma Suominen comments

Results: 374 comments of Osma Suominen

Replace pycld3 dependency?

Reported the huge memory usage in Simplemma as https://github.com/adbar/simplemma/issues/19

> @osma My library is slower because it is written in pure Python. pycld3 is written in C++ and simplemma uses [mypyc](https://github.com/mypyc/mypyc) to compile the Python modules to C extensions....

Replace pycld3 dependency?

I realized that I can just run the Omikuji evaluation part again with Lingua 1.1.3, without redoing the whole benchmark. Hang on...

Replace pycld3 dependency?

@pemistahl I upgraded to Lingua 1.1.3 and reran the Omikuji and MLLM evaluations. The Omikuji evaluation runtime decreased from 935 to 856 seconds and the MLLM runtime from 1210 to...

Replace pycld3 dependency?

I finished the (partial) benchmark of Lingua in high-accuracy mode and edited the results table above accordingly. The runtime was at least an order of magnitude larger than in low-accuracy...

Replace pycld3 dependency?

Thanks for the tip @adbar, I wasn't aware of hyperfine. Though it seems to me it will only measure execution time, not memory usage.
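
A minimal sketch of measuring both quantities at once, given that the comment above suggests hyperfine only covers execution time. It assumes a Unix-like system, since the standard-library `resource` module is not available on Windows. The `measure` helper and the sample command are hypothetical illustrations, not part of Annif or of the benchmark discussed here.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd as a child process; return (wall_seconds, peak_rss).

    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    # ru_maxrss is the largest peak resident set size among all child
    # processes waited for so far (not just the last one).
    # Units differ by platform: kilobytes on Linux, bytes on macOS.
    peak_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_rss


if __name__ == "__main__":
    # Sample command: a Python child that allocates roughly 50 MB.
    cmd = [sys.executable, "-c", "x = bytearray(50_000_000)"]
    elapsed, peak_rss = measure(cmd)
    print(f"wall time: {elapsed:.2f} s, peak RSS: {peak_rss}")
```

Because `RUSAGE_CHILDREN` reports the maximum over every child seen so far, each command to be compared would need to run from a fresh parent process for the memory figure to be attributable to that command alone.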

Replace pycld3 dependency?

> This is absolutely reasonable. Then Lingua is simply not the right tool for your job. That's ok. Luckily, there are enough language detectors to choose from, especially in the...

Maybe move UI to a separate project?

Interesting idea. I'm a bit torn on this. I've never thought of the Annif web UI as a separate project, more of an administrative interface for testing models. Very similar...

Feature request: Force data reading from disk without restarting Annif

Thank you for the suggestion. It shouldn't be too hard to implement, but I wonder how this would be triggered. It should happen through the REST API, I think, because...

Use LMDB to store vectors in PAV backend

After switching to sparse vectors (#379) in the PAV backend, the RAM usage is now much lower, so this is not so crucial anymore.