liblinear
Add Normalisation Option
When train is run with the -n option, every instance is normalized to unit length in the Euclidean norm, and the resulting .model file contains the line "normalization 1" (without -n, it contains "normalization 0"). When predict is later executed with that .model file, it applies the same normalization to the test data. (Old .model files, which lack this line, are therefore incompatible.)
Example: train -s 7 -n -e 1e-6 covtype.libsvm.binary
Some timing comparisons on covtype.libsvm.binary:
train -s 0 -e 1e-6 covtype.libsvm.binary: 0m39.492s, with -n: 0m04.001s
train -s 1 -e 1e-6 covtype.libsvm.binary: 2m47.380s, with -n: 0m10.116s
train -s 2 -e 1e-6 covtype.libsvm.binary: 0m38.217s, with -n: 0m05.072s
train -s 7 -e 1e-6 covtype.libsvm.binary: 3m42.433s, with -n: 0m07.034s
Accuracy comparisons:
train -s 1 -e 1e-6 splice.txt; predict splice.t splice.txt.model out.txt: 84.2299%, with -n: 84.9655%
train -s 1 -e 1e-6 a9a.txt; predict a9a.t a9a.txt.model out.txt: 84.9395%, with -n: 85.0132%
train -s ALPHA -e 1e-6 w1a.txt; predict w1a.t w1a.txt.model out.txt: 96.9221%, with -n: 97.6625%
For ALPHA (solver type) from 0 to 7:
(without -n)
Accuracy = 97.2902% (45991/47272)
Accuracy = 96.8903% (45802/47272)
Accuracy = 96.9221% (45817/47272)
Accuracy = 97.1019% (45902/47272)
Accuracy = 96.8523% (45784/47272)
Accuracy = 97.3959% (46041/47272)
Accuracy = 97.5736% (46125/47272)
Accuracy = 97.2902% (45991/47272)
(with -n)
Accuracy = 97.5144% (46097/47272)
Accuracy = 97.6646% (46168/47272)
Accuracy = 97.6625% (46167/47272)
Accuracy = 97.6455% (46159/47272)
Accuracy = 97.635% (46154/47272)
Accuracy = 97.745% (46206/47272)
Accuracy = 97.3473% (46018/47272)
Accuracy = 97.5144% (46097/47272)
train -s ALPHA -e 1e-6 svmguide1.txt; predict svmguide1.t svmguide1.txt.model out.txt
For ALPHA from 0 to 7:
(without -n)
Accuracy = 79.025% (3161/4000)
Accuracy = 78.95% (3158/4000)
Accuracy = 78.925% (3157/4000)
Accuracy = 59.125% (2365/4000)
Accuracy = 76.625% (3065/4000)
Accuracy = 78.9% (3156/4000)
Accuracy = 79.025% (3161/4000)
Accuracy = 80.125% (3205/4000)
(with -n)
Accuracy = 78.4% (3136/4000)
Accuracy = 78.425% (3137/4000)
Accuracy = 78.425% (3137/4000)
Accuracy = 78.3% (3132/4000)
Accuracy = 78.5% (3140/4000)
Accuracy = 79.025% (3161/4000)
Accuracy = 78.225% (3129/4000)
Accuracy = 78.4% (3136/4000)
In summary, training with -n is substantially faster (often by an order of magnitude on covtype) while accuracy stays comparable: slightly better on splice, a9a, and w1a, slightly worse on svmguide1.