Antti Puurula

23 comments by Antti Puurula

I notice some of the optimizations from my library (https://github.com/anttttti/Wordbatch) were ported to Kaggler. The new version 1.3 has a similar set of online models (FM, NN) to Kaggler, but...

I was just about to start an issue on this. I'm training models on a really big file, so the data won't fit in memory at once. Streaming and parallelization...
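For readers unfamiliar with the setup described in the comment above, a minimal sketch of streaming a large file in fixed-size minibatches might look like the following. The helper name `iter_batches` and the batch size are assumptions for illustration; they are not taken from any library mentioned in this thread.

```python
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield consecutive lists of up to `batch_size` items from any iterable.

    Only one batch is held in memory at a time, so `lines` can be an open
    file handle for a file that is far larger than RAM.
    """
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with a model exposing an online update method:
# with open("train.tsv") as f:
#     for batch in iter_batches(f, 10000):
#         model.partial_fit(*parse(batch))  # `model` and `parse` are placeholders
```

Each batch could then be handed to any model that supports incremental updates, which is the pattern the comment appears to be asking about.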

I didn't get a patch written; I hacked the code first to see how easily this could be done. I think I got it working for the AdaGradRegressor case, but...

I made a version of FTRL available as part of my package: https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx This supports partial fit and online learning, weighted features, a link function for classification/regression, and...
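As background on what an FTRL model with online updates does, here is a minimal pure-Python sketch of per-coordinate FTRL-Proximal for logistic regression. It is not the Wordbatch implementation linked above: the class name `FTRLProximal`, the method names, and the default hyperparameters are all assumptions chosen for illustration, and it covers only the logistic link.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Illustrative per-coordinate FTRL-Proximal for binary classification."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        # Closed-form weight for coordinate i; L1 keeps small weights at zero.
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(i, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict_one(self, x):
        # x is a sparse example: a dict mapping feature id -> value.
        s = sum(self._weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def partial_fit_one(self, x, y):
        # One online update on a single example with label y in {0, 1}.
        p = self.predict_one(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of log loss for coordinate i
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p
```

Because the state is just the two per-feature accumulators, the model can be updated one example (or one minibatch) at a time, which is what makes partial fit and online learning natural for this family of models.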

Could you make a developer-friendly interface available, along with models trained on an open source such as Wikipedia dumps? There's a use case for off-the-shelf decompounding and morphological splitting tools, but Morfessor...

Not currently. This can be added as a feature easily. You can add something like this as the last line in your text normalization function: text += " " + bigrams(text) This...
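A minimal sketch of what that suggestion could look like is below. The comment does not define `bigrams`, so the version here (whitespace tokens joined by an underscore) is an assumption, as is the lowercasing step in the hypothetical `normalize_text` function.

```python
def bigrams(text):
    """Return the word bigrams of `text` as one space-separated string.

    Assumed behavior: tokens are split on whitespace and each adjacent
    pair is joined with an underscore so it survives later tokenization.
    """
    tokens = text.split()
    return " ".join(a + "_" + b for a, b in zip(tokens, tokens[1:]))


def normalize_text(text):
    # Hypothetical normalization function; lowercasing stands in for
    # whatever cleaning the real function performs.
    text = text.lower()
    # The suggested last line: append the bigrams to the unigram text.
    text += " " + bigrams(text)
    return text
```

With this approach the downstream feature extractor sees bigrams as ordinary tokens, so no change to the extractor itself is needed.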

It's been developed on Linux so far. There are only a couple of dependencies like py-lz4framed, but those might not compile on Mac/Windows. The py-lz4framed library is only used for compression...

Installation instructions will need an update. Just installed on a clean AWS Ubuntu 16.04 instance. Needed to call these before I could do pip install: sudo apt-get update sudo install...

It seems the latest version of macOS (Monterey) doesn't get the gcc-7 links right. Doing this resolves the compilation issues with clang not supporting OpenMP and gcc not being found for compilation:...

Can you give more details? This doesn't reproduce on Ubuntu 16.04, Python 3.6.4 |Anaconda custom (64-bit), Numpy 1.14.2 and WB 1.35. The model trains fine with this setup.