gateplugin-LearningFramework issues

Add support for XGBoost

See https://github.com/dmlc/xgboost. This has a JVM implementation, so it should be possible to use it without a wrapper: https://github.com/dmlc/xgboost/tree/master/jvm-packages

johann-petrak

enhancement

IMPORTANT

Training set caching / corpusrepresentation caching

Add a parameter (maybe just something to be used as an "algorithmParameter") to enable training set caching: whatever corpus representation the chosen algorithm uses, that representation will get saved to...
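A minimal sketch of what such caching could look like, assuming the corpus representation is `Serializable` and that a single cache file per training run is enough. The class `CorpusRepresentationCache` and its methods are hypothetical and not part of the plugin; plain Java serialization stands in for whatever on-disk format the real implementation would use.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical helper: caches a corpus representation in a file so that a
 * later training run can reload it instead of rebuilding it from the corpus.
 */
final class CorpusRepresentationCache {

    private CorpusRepresentationCache() {}

    /** Saves the representation, overwriting any previous cache file. */
    static void save(Serializable representation, Path cacheFile) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(cacheFile)))) {
            out.writeObject(representation);
        }
    }

    /** Loads a cached representation, or returns empty if no cache file exists. */
    static <T extends Serializable> Optional<T> load(Path cacheFile, Class<T> type)
            throws IOException, ClassNotFoundException {
        if (!Files.exists(cacheFile)) {
            return Optional.empty();
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(cacheFile)))) {
            return Optional.of(type.cast(in.readObject()));
        }
    }

    /** Reuses the cached representation if present, otherwise builds and caches it. */
    static <T extends Serializable> T loadOrBuild(Path cacheFile, Class<T> type, Supplier<T> builder)
            throws IOException, ClassNotFoundException {
        Optional<T> cached = load(cacheFile, type);
        if (cached.isPresent()) {
            return cached.get();
        }
        T built = builder.get();
        save(built, cacheFile);
        return built;
    }
}
```

With this shape, the expensive step of building the representation from the corpus only runs once; every later call with the same cache file skips the builder entirely. A real implementation would also need to invalidate the cache when the corpus or the feature specification changes.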

johann-petrak

Add support for other interesting learning frameworks and libraries

1

This is just to keep track of potentially interesting candidates and their advantages and disadvantages.
- https://www.h2o.ai/
- https://github.com/JohnLangford/vowpal_wabbit (!!!)
- FACTORIE: http://factorie.cs.umass.edu/, https://github.com/factorie/factorie
- http://dlib.net/ Pro: usable license,...

johann-petrak

Mallet LDA improvements/changes

6

A bunch of improvements, changes and checks to do, collected into this single issue:
1. Make sure we save the right data for later topic inference:
   * model file: apparently...

johann-petrak

Scaling does not work any more

1

This is probably a result of the refactoring, after which the engine decides which corpus representation to use. But this means the engine has to know how to initialize...

johann-petrak

Rethink feature scaling

The idea of scaling is that no feature should have a bigger influence on the model than the others merely because of its numeric range. Our current approaches may not do this properly and may...
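As a point of reference, here is a minimal sketch of one common approach, per-feature z-score standardization, assuming dense numeric feature vectors. `FeatureStandardizer` is a hypothetical class, not the plugin's current scaling code. The key property is that the means and standard deviations are estimated on the training data only and then reused unchanged for every instance at application time.

```java
/**
 * Hypothetical z-score feature scaler: each feature is shifted by its training
 * mean and divided by its training standard deviation.
 */
final class FeatureStandardizer {
    private final double[] means;
    private final double[] stdDevs;

    private FeatureStandardizer(double[] means, double[] stdDevs) {
        this.means = means;
        this.stdDevs = stdDevs;
    }

    /** Estimates the mean and population standard deviation of every feature column. */
    static FeatureStandardizer fit(double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one training row");
        }
        int n = rows.length;
        int d = rows[0].length;
        double[] mean = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                mean[j] += row[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mean[j] /= n;
        }
        double[] std = new double[d];
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                double diff = row[j] - mean[j];
                std[j] += diff * diff;
            }
        }
        for (int j = 0; j < d; j++) {
            std[j] = Math.sqrt(std[j] / n);
        }
        return new FeatureStandardizer(mean, std);
    }

    /** Returns a scaled copy of one instance's feature vector. */
    double[] transform(double[] row) {
        if (row.length != means.length) {
            throw new IllegalArgumentException("expected " + means.length + " features");
        }
        double[] scaled = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            // A constant feature has zero spread: map it to 0 rather than dividing by zero.
            scaled[j] = stdDevs[j] == 0.0 ? 0.0 : (row[j] - means[j]) / stdDevs[j];
        }
        return scaled;
    }
}
```

For example, after fitting on the rows `{1, 100}` and `{3, 300}`, both features end up on the same scale: the first row becomes `{-1, -1}` and the second `{1, 1}`, even though the raw ranges differ by a factor of 100. Note that z-scoring destroys sparsity, so for sparse feature vectors a scaling that maps 0 to 0 (e.g. dividing by the maximum absolute value) may be preferable.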

johann-petrak

Add a way to randomly shuffle the corpus / data file

1

It can happen that a corpus contains training instances grouped by class, which is very bad for training. In such cases there should be a way to either shuffle the...
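A minimal sketch of the shuffling step, assuming the instances (or the lines of a data file with one instance per line) fit in memory. `InstanceShuffler` and its methods are hypothetical names for illustration. A fixed seed keeps the order reproducible, so two training runs on the same data see the instances in the same shuffled order.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for breaking up class-grouped training data. */
final class InstanceShuffler {

    private InstanceShuffler() {}

    /** Returns a shuffled copy; the input list is left unchanged. */
    static <T> List<T> shuffledCopy(List<T> instances, long seed) {
        List<T> copy = new ArrayList<>(instances);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    /** Shuffles a one-instance-per-line data file; reads the whole file into memory. */
    static void shuffleLines(Path in, Path out, long seed) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, shuffledCopy(lines, seed), StandardCharsets.UTF_8);
    }
}
```

Returning a copy rather than shuffling in place avoids surprising callers that still rely on the original document order, e.g. for evaluation or for mapping predictions back to annotations.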

johann-petrak


Better way to add full class distributions at application time

Write more information to the LDA model directory

LDA with mallet: inspect memory/time on larger corpora
