mockingbird
Programming Language Classifier
Mockingbird 
Introduction
A Go implementation of Linguist's classifier.
It can be used as a Go package via
import "github.com/lazywei/linguist"
It also provides a command-line interface (CLI) in cli/:
$ cd cli/
$ ./build.sh
$ ./mockingbird --help
Command Line Interface Usage
Preparing LIBSVM format dataset
Collect Rosetta Code
- Clone the RosettaCodeData repository
git clone git@github.com:acmeism/RosettaCodeData.git
- Build the cli executable
cd cli/
./build.sh
- Run collectRosetta on the cloned RosettaCodeData to collect its files into ../samples
./mockingbird collectRosetta path/to/clones/RosettaCodeData ../samples
Build Bag-of-Words and Convert Samples to Libsvm
Build from scratch
./mockingbird convertLibsvm ../samples ../
This saves libsvm.samples and bow.gob to ../. The bow.gob file holds the
parameters for constructing the bag-of-words, so it can be reused in later conversions:
./mockingbird convertLibsvm ../samples ../ --bowPath ../bow.gob
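To make the conversion step concrete, here is a minimal sketch of how token counts could be turned into LIBSVM lines of the form "<label> <index>:<value> ...". The Vocabulary type and both function names are hypothetical and chosen for illustration; mockingbird's actual tokenizer and the layout of bow.gob may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Vocabulary maps each token to a 1-based feature index.
// Hypothetical type for illustration; not mockingbird's bow.gob layout.
type Vocabulary map[string]int

// BuildVocabulary assigns indices to tokens in order of first appearance.
func BuildVocabulary(docs [][]string) Vocabulary {
	vocab := Vocabulary{}
	for _, doc := range docs {
		for _, tok := range doc {
			if _, seen := vocab[tok]; !seen {
				vocab[tok] = len(vocab) + 1
			}
		}
	}
	return vocab
}

// ToLibsvmLine counts the tokens known to vocab and renders one LIBSVM
// line with feature indices in ascending order. Unknown tokens are skipped.
func ToLibsvmLine(label int, tokens []string, vocab Vocabulary) string {
	counts := map[int]int{}
	for _, tok := range tokens {
		if idx, ok := vocab[tok]; ok {
			counts[idx]++
		}
	}
	indices := make([]int, 0, len(counts))
	for idx := range counts {
		indices = append(indices, idx)
	}
	sort.Ints(indices)

	var sb strings.Builder
	fmt.Fprintf(&sb, "%d", label)
	for _, idx := range indices {
		fmt.Fprintf(&sb, " %d:%d", idx, counts[idx])
	}
	return sb.String()
}

func main() {
	docs := [][]string{
		{"func", "main", "func"},
		{"def", "main"},
	}
	vocab := BuildVocabulary(docs)
	for label, doc := range docs {
		fmt.Println(ToLibsvmLine(label, doc, vocab))
	}
}
```

Reusing a saved vocabulary, as --bowPath does, keeps feature indices stable across conversions, which matters because a trained model refers to features by index.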
Train and Predict
Train
For example, train a logistic regression classifier:
./mockingbird train --sample=./test_fixture/test_samples.libsvm --solver 1
This saves a model file to $PWD/model/lr.model, which can be used for
later prediction.
Full usage:
usage: mockingbird train [<flags>]

Train Classifier

Flags:
  --help             Show help (also see --help-long and --help-man).
  --sample="samples.libsvm"
                     Path for samples (in libsvm format)
  --output="model"   Path for saving trained model
  --solver=0         0 = NaiveBayes, 1 = LogisticRegression
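For readers unfamiliar with the libsvm sample format expected by --sample, the following sketch parses a single line into a label and a sparse feature map. The Sample type and ParseLibsvmLine function are hypothetical names, and integer class labels are assumed; this is not mockingbird's own reader.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one labeled example with sparse features.
// Hypothetical type for illustration only.
type Sample struct {
	Label    int
	Features map[int]float64 // feature index -> value
}

// ParseLibsvmLine parses "<label> <index>:<value> ..." into a Sample.
// Integer class labels are assumed.
func ParseLibsvmLine(line string) (Sample, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return Sample{}, fmt.Errorf("empty line")
	}
	label, err := strconv.Atoi(fields[0])
	if err != nil {
		return Sample{}, fmt.Errorf("bad label %q: %v", fields[0], err)
	}
	s := Sample{Label: label, Features: map[int]float64{}}
	for _, f := range fields[1:] {
		parts := strings.SplitN(f, ":", 2)
		if len(parts) != 2 {
			return Sample{}, fmt.Errorf("bad feature %q", f)
		}
		idx, err := strconv.Atoi(parts[0])
		if err != nil {
			return Sample{}, fmt.Errorf("bad index %q: %v", parts[0], err)
		}
		val, err := strconv.ParseFloat(parts[1], 64)
		if err != nil {
			return Sample{}, fmt.Errorf("bad value %q: %v", parts[1], err)
		}
		s.Features[idx] = val
	}
	return s, nil
}

func main() {
	s, err := ParseLibsvmLine("1 2:1 3:4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.Label, s.Features[2], s.Features[3])
}
```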
Predict
For example, make predictions with the previously trained logistic regression classifier:
./mockingbird predict --model=./model/lr.model --data=./test_fixture/test_samples.libsvm --solver=1
Full usage:
usage: mockingbird predict --data=DATA [<flags>]

Predict via trained Classifier

Flags:
  --help        Show help (also see --help-long and --help-man).
  --model="./model/naive_bayes.gob"
                Path for loading saved model
  --data=DATA   Path for testing data (in libsvm format)
  --solver=0    0 = NaiveBayes, 1 = LogisticRegression
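As a rough illustration of what prediction over sparse bag-of-words features involves, the sketch below scores each class with a linear function and picks the highest. Both multinomial logistic regression and naive Bayes (in log space) reduce to an argmax over per-class linear scores. The LinearModel type and its weights are hypothetical and do not reflect mockingbird's model file format.

```go
package main

import "fmt"

// LinearModel holds one sparse weight vector and one bias per class.
// Hypothetical layout for illustration; not mockingbird's model format.
type LinearModel struct {
	Weights []map[int]float64 // Weights[class][featureIndex]
	Bias    []float64
}

// Predict returns the class with the highest score w_c . x + b_c,
// or -1 if the model has no classes.
func (m LinearModel) Predict(x map[int]float64) int {
	best, bestScore := -1, 0.0
	for c := range m.Weights {
		score := m.Bias[c]
		for idx, val := range x {
			// Features missing from the weight map contribute zero.
			score += m.Weights[c][idx] * val
		}
		if best == -1 || score > bestScore {
			best, bestScore = c, score
		}
	}
	return best
}

func main() {
	model := LinearModel{
		Weights: []map[int]float64{
			{1: 1.5, 2: 0.1}, // class 0 favors feature 1
			{2: 0.1, 3: 2.0}, // class 1 favors feature 3
		},
		Bias: []float64{0, 0},
	}
	fmt.Println(model.Predict(map[int]float64{1: 2, 2: 1}))
	fmt.Println(model.Predict(map[int]float64{2: 1, 3: 1}))
}
```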