
[Queries] Ability to create a model and cls from multiple input files

Open manrock007 opened this issue 6 years ago • 0 comments

Hi,

To begin with, thank you for the amazing work you've done so far. I have a few questions regarding my usage of colibri-core in my project.

What I am trying to build is a model that learns recurring patterns from a set of input text files. These are log files of a collection of software components.

Each line in a log file is converted to a hash that uniquely represents that line. The training input for one log file is then a single line whose words are these hashes, so its word count equals the line count of the original log file. This is done so that patterns are learnt across lines rather than across words.
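To make the preprocessing concrete, here is a minimal sketch of the idea. It is not the attached code: the helper names `hash_line` and `encode_log`, the use of MD5, and the truncation to 12 hex characters are all assumptions chosen for illustration.

```python
import hashlib


def hash_line(line: str, digest_chars: int = 12) -> str:
    """Map one log line to a short, stable token (hypothetical helper).

    MD5 and the 12-character truncation are illustrative assumptions;
    any stable hash would serve the same purpose.
    """
    normalized = line.strip()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()[:digest_chars]


def encode_log(lines) -> str:
    """Turn a whole log into ONE line of space-separated hash tokens.

    Every input line yields exactly one token, so the word count of the
    result equals the line count of the log.
    """
    return " ".join(hash_line(line) for line in lines)


log = ["service A started", "service B started", "service A started"]
encoded = encode_log(log)
tokens = encoded.split()
```

Identical log lines map to identical tokens, so a repeated sequence of lines shows up as a repeated sequence of words in the encoded text.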

The model is then used to analyse whether the patterns in a given test file match the training data, in order to detect anomalies or unknown patterns. I am using your library for its ability to create variable-length n-grams, skipgrams and flexgrams. My questions are as follows:
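The matching step I have in mind can be sketched in plain Python as follows. This only illustrates the idea with contiguous n-grams and does not use colibri-core or its API; the function names and the `max_len` parameter are hypothetical, and skipgrams and flexgrams are not covered.

```python
def ngrams(tokens, max_len=3):
    """Yield every contiguous n-gram of length 1..max_len as a tuple."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def learn_patterns(sequences, max_len=3):
    """Collect the n-grams seen in any training sequence (one per log file)."""
    known = set()
    for tokens in sequences:
        known.update(ngrams(tokens, max_len))
    return known


def unknown_patterns(tokens, known, max_len=3):
    """Return the n-grams of a test sequence that never occurred in training."""
    return [gram for gram in ngrams(tokens, max_len) if gram not in known]


# Each inner list stands for the hashed lines of one training log file.
training = [["a", "b", "c"], ["a", "b", "d"]]
known = learn_patterns(training)

unseen = unknown_patterns(["a", "b", "e"], known)
# unseen == [("e",), ("b", "e"), ("a", "b", "e")]
```

Here the set of known patterns is accumulated over several input sequences, which is the behaviour I would like to get from a single model and class file.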

  1. How do I create a unified model and class file that contains the patterns learnt from multiple input files?
  2. Do I need to save the class file and model after training on each input file, or can I train on multiple input files and only then call .save / .write?
  3. Is there a way to perform this training on multiple cores (multithreading) while saving the information to a single model?
  4. Alternatively, is it possible to create multiple temporary models in a batch operation and then merge them into a single model file and .cls file?
  5. I also see random crashes sometimes while parsing a file. Re-running the training on the same file sometimes crashes at the same point and sometimes doesn't, which is weird. I'll try to get backtraces for those crashes whenever I reproduce the issue again.

I am willing to contribute any changes made for the above requirements if you could guide me. I have also attached the relevant code that shows my usage of the library.

train_program.py.zip

manrock007 · Jul 20 '17 12:07