gateplugin-LearningFramework

A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNN...

Results 45 gateplugin-LearningFramework issues

See https://github.com/dmlc/xgboost. This has a JVM implementation, so it should be possible without a wrapper: https://github.com/dmlc/xgboost/tree/master/jvm-packages

enhancement
IMPORTANT

Add a parameter (maybe just something to be used as an "algorithmParameter") to enable training set caching: whatever corpus representation the chosen algorithm uses, that representation will get saved to...

This is just to keep track of potentially interesting candidates and their advantages and disadvantages.
- https://www.h2o.ai/
- https://github.com/JohnLangford/vowpal_wabbit (!!!)
- FACTORIE http://factorie.cs.umass.edu/ , https://github.com/factorie/factorie
- http://dlib.net/ Pro: usable license,...

A bunch of improvements, changes and checks to do, collected into this single issue:
1. make sure we save the right data for later topic inference:
   * model file: apparently...

This is probably a result of re-factoring so that the engine now decides on which corpus representation to use. But this means the engine has to know how to initialize...

The idea of scaling is to ensure that some features do not have a bigger influence on the model than others. Our current approaches maybe do not do this properly and may...
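
As a rough illustration of the scaling idea in this issue, here is a minimal sketch of one common approach, per-feature standardization (zero mean, unit variance). This is not the plugin's code: the function name `standardize_columns` and the sample data are hypothetical, and the issue does not say which scaling method the plugin uses or should use.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale every feature column to zero mean and unit variance.

    `rows` is a list of equally long lists of numbers (one list per instance).
    Hypothetical helper for illustration only.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; use 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Two features on very different scales (assumed example data).
rows = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
scaled = standardize_columns(rows)
```

After scaling, both columns in this example hold the same values, so neither feature dominates merely because of its original unit.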

It can happen that a corpus contains training instances grouped by class, which is very bad for training. In such cases there should be a way to either shuffle the...
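
The shuffling option mentioned in this issue could look roughly like the sketch below: a reproducible, seeded shuffle of the training instances before they are handed to the learner. The names `shuffled_instances` and `seed` are hypothetical and not part of the plugin; the issue text is truncated, so the alternative it goes on to propose is not covered here.

```python
import random


def shuffled_instances(instances, seed=42):
    """Return a shuffled copy of `instances`; the input list is left unchanged.

    A fixed seed keeps the order reproducible between training runs.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    result = list(instances)
    rng.shuffle(result)
    return result


# Assumed example data: instances grouped by class, as described in the issue.
instances = [("doc%d" % i, "pos") for i in range(5)] + \
            [("doc%d" % i, "neg") for i in range(5, 10)]
mixed = shuffled_instances(instances)
```

Using a dedicated `random.Random` instance avoids touching the global random state, which matters if other components rely on it.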

Currently we add a list with the class distribution / scores and a list with the labels to every instance. Adding the label list to every instance is redundant, since...

Currently only the details file is written; we should add at least a file with the topic words and maybe also a file with topics per document distribution (could be derived...

The LF spends a lot of time and uses a lot of memory after the topic words have been logged to the message pane. Figure out which steps are slow...