gateplugin-LearningFramework
A plugin for the GATE language technology framework for training and using machine learning models. Currently supports Mallet (MaxEnt, NaiveBayes, CRF and others), LibSVM, Scikit-Learn, Weka, and DNN...
Initially: * Train Topic Model: ideally this would also make use of the example pipelines from stringannotation for token filtering by stopwords and corpusstats for token filtering by tfidf, but...
Currently there are some ways of using a PR improperly, either by using the wrong PR or by setting the parameters in a way that is not appropriate for a corpus...
This will need an even simpler "corpus representation" for text (list of tokens) only.
The parameters for train.py have been changed to make use from the command line easier; the engine code needs to be adapted to handle this correctly. May need to adapt the...
It would be good to have some way to run the training PR on a cached training set, only changing the training algorithm or hyperparameters. This should work even for...
This is a bit messy at the moment: make sure we always assign the correct confidence scores to a classification (and if possible, all class labels) if the algorithm returns...
Ideally, we should have both encoding and decoding code in the SeqEncoder (maybe rename to SeqEncoderDecoder) classes, but currently the decoding is in the ModelApplication class. Needs some refactoring and...
This is for speeding up the mavenization and getting rid of some obstacles quickly: the LearningFramework depends on a couple of libraries which would have to be available on Maven...
Either optionally or by default, write both data and metadata gzip-compressed, and make the Python library handle it properly.
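A minimal sketch of how the Python side could deal with this, assuming the compressed files are marked by a `.gz` suffix and that data is stored as one JSON object per line. The function names, file layout, and JSON format here are hypothetical illustrations, not the plugin's actual export format.

```python
import gzip
import json
from pathlib import Path


def open_maybe_gzip(path, mode="rt", encoding="utf-8"):
    """Open a text file, transparently using gzip when the name ends in .gz.

    Assumption: compression is signalled by the file suffix only.
    """
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)


def write_export(data_path, meta_path, rows, meta):
    """Write data (one JSON object per line) and metadata (one JSON document)."""
    with open_maybe_gzip(data_path, "wt") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
    with open_maybe_gzip(meta_path, "wt") as out:
        json.dump(meta, out)


def read_export(data_path, meta_path):
    """Read back what write_export produced, compressed or not."""
    with open_maybe_gzip(meta_path) as inp:
        meta = json.load(inp)
    with open_maybe_gzip(data_path) as inp:
        rows = [json.loads(line) for line in inp if line.strip()]
    return rows, meta
```

Because the same helper is used for reading and writing, callers only choose between `data.json` and `data.json.gz`; the rest of the code stays unchanged.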
We at least must support out of core exporting, ideally would also support OOC training for some engines or algorithms. Maybe for wrapping https://github.com/JohnLangford/vowpal_wabbit and neural networks as well as...
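As a rough illustration of what out-of-core exporting means here, the sketch below writes instances to disk one at a time from an iterator, so the full training set never has to be held in memory. The function name, the JSON-lines format, and the `flush_every` parameter are assumptions made for this example and are not part of the plugin.

```python
import json


def export_out_of_core(instances, out_path, flush_every=1000):
    """Stream instances to a file one by one and return how many were written.

    `instances` may be any iterable, including a generator that produces
    instances lazily (for example, one document at a time).
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for inst in instances:
            out.write(json.dumps(inst) + "\n")
            count += 1
            if count % flush_every == 0:
                out.flush()  # push buffered lines to disk periodically
    return count
```

A consumer that supports incremental training could then read the file line by line in the same streaming fashion.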