Peter M. Stahl comments

Results: 51 comments by Peter M. Stahl

Python 3 support

Hi Tom, I'm a graduate in computational linguistics and would like to contribute to Pattern. Can you be more explicit about how Pattern should support Python 3? That is, do...

Python 3 support

Hi Tom, as I wrote at the beginning of this year, I'm still interested in contributing to pattern. However, I have not started yet because I didn't really know where...

Python 3 support

@hayd OK, I get your point. I'm okay with that. It just reminds me again of how unhappy I am about the Python 3.\* transition in general across the Python...

Hello, I'm the author of [Lingua](https://github.com/pemistahl/lingua-py). I've managed to reduce the memory consumption of the library. All language models together now just take around 800 MB in memory. Perhaps you...

Replace pycld3 dependency?

@osma I have just released [Lingua 1.1.0](https://github.com/pemistahl/lingua-py/releases/tag/v1.1.0). In high accuracy mode, memory consumption is now at 800 MB. In low accuracy mode, it's even just 60 MB. ![plot](https://raw.githubusercontent.com/pemistahl/lingua-py/main/images/plots/boxplot-average.png)

Replace pycld3 dependency?

Yes, that's because the models are now stored in NumPy arrays instead of dictionaries. The downside is that querying the arrays is slower than querying dictionaries. But I still use a...
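The trade-off described in this comment (compact arrays instead of dictionaries, at the cost of slower lookups) can be illustrated with a small sketch. This is not Lingua's actual storage layout: it uses the standard library's `array` and `bisect` modules as a stand-in for NumPy arrays, and the names `encode` and `ArrayModel` are hypothetical. It also assumes n-grams of at most three characters that contain no NUL character.

```python
from array import array
from bisect import bisect_left
from typing import Mapping


def encode(ngram: str) -> int:
    """Pack an n-gram of up to 3 characters into one integer.

    Each Unicode code point fits in 21 bits, so 3 characters fit in 63 bits.
    (Hypothetical helper for this sketch only.)
    """
    code = 0
    for ch in ngram:
        code = (code << 21) | ord(ch)
    return code


class ArrayModel:
    """Stores n-gram probabilities in two parallel, sorted, typed arrays."""

    def __init__(self, probabilities: Mapping[str, float]) -> None:
        items = sorted((encode(k), v) for k, v in probabilities.items())
        # Typed arrays hold raw 8-byte values instead of Python objects,
        # which is where the memory saving over a dict comes from.
        self.keys = array("Q", (k for k, _ in items))
        self.values = array("d", (v for _, v in items))

    def get(self, ngram: str, default: float = 0.0) -> float:
        # Binary search is O(log n), whereas a dict lookup is O(1) on
        # average: this is the speed cost mentioned in the comment.
        key = encode(ngram)
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return default


model = ArrayModel({"th": 0.25, "he": 0.5, "é": 0.125})
print(model.get("th"), model.get("zz"))
```

A dictionary answers each lookup with a single hash probe, while the array version needs a binary search, so it saves memory at the cost of query time.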

Replace pycld3 dependency?

FYI: There was a little bug in version 1.1.0 that caused wrong probabilities to be returned for certain ngrams. I've just fixed that. So please use version 1.1.1 now for...

Replace pycld3 dependency?

@osma My library is slower because it is written in pure Python. pycld3 is written in C++ and simplemma uses [`mypyc`](https://github.com/mypyc/mypyc) to compile the Python modules to C extensions. I've...

Replace pycld3 dependency?

> I apologize for the harsh wording.

No worries, @osma. I'm not resentful. :)

> I understand that Lingua's strong point is the high accuracy it achieves. But for an...

Distinguish between different variations of the same language

Hi @BLKSerene, thank you for your request. The library already distinguishes between Bokmål and Nynorsk. As for Simplified and Traditional Chinese, I could not find suitable training corpora yet which...