gateplugin-LearningFramework

Simplify DNN dense corpus and interaction with python backend

Open johann-petrak opened this issue 6 years ago • 0 comments

  • see also https://github.com/GateNLP/gate-lf-python-data/issues/15
  • Keep the option to have many features but make it easy to have just the simple one-feature approach.
  • Store dense corpus instances as maps in each line with standard keys for label and possibly features
  • Unify with the representation for unlabeled data (e.g. embedding creation or topic models) and for other kinds of supervised/unsupervised tasks, e.g. seq2seq or semantic similarity
  • Important: change the representation of sequences. Instead of one sequence of elements, each with multiple features, store one sequence per feature. This makes it much easier to create batches later.
  • Make it easy to switch between our output and the torchnlp library in the python backend
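
A rough sketch of what the map-per-line storage and the per-feature sequence layout could look like on the python side. The key names `features` and `label`, the helper function names, and the example data are all hypothetical and only illustrate the idea; nothing here reflects an existing format in gate-lf-python-data.

```python
import json


def to_feature_sequences(elements):
    """Turn a sequence of per-element feature maps into one sequence per feature."""
    names = list(elements[0].keys()) if elements else []
    return {name: [el[name] for el in elements] for name in names}


def to_json_line(features, label=None):
    """Serialize one instance as a map on a single line.

    The keys "features" and "label" are assumed names, not an agreed standard.
    Unlabeled instances simply omit the "label" key.
    """
    instance = {"features": features}
    if label is not None:
        instance["label"] = label
    return json.dumps(instance, sort_keys=True)


def pad_batch(sequences, pad):
    """Pad the sequences of a single feature to the length of the longest one."""
    max_len = max(len(s) for s in sequences)
    return [s + [pad] * (max_len - len(s)) for s in sequences]


# Old layout: one sequence of elements, each carrying several features.
elements = [
    {"token": "New", "pos": "NNP"},
    {"token": "York", "pos": "NNP"},
    {"token": "rocks", "pos": "VBZ"},
]

# New layout: one sequence per feature, stored as one map per line.
per_feature = to_feature_sequences(elements)
line = to_json_line(per_feature, label=["B-LOC", "I-LOC", "O"])

# Batching then works feature by feature, without unpacking elements.
token_batch = pad_batch([per_feature["token"], ["Hello"]], pad="<pad>")
```

With one sequence per feature, each feature can be padded and converted to a tensor independently, and the simple one-feature case is just a map with a single entry.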

This should become a project possibly with several subissues.

johann-petrak, Nov 22 '18 18:11