
Add Normalisation Option

Open hsinyuan-huang opened this issue 9 years ago • 0 comments

When the -n option is passed to train, every instance is normalized to unit length in the Euclidean norm. The .model file then contains the line "normalization 1" (or "normalization 0" if -n is not given), so when you run predict with that .model file, it applies the same normalization to the test data. (The old .model file format would therefore be deprecated.)
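To make the idea concrete, here is a minimal Python sketch of the two pieces described above: scaling one sparse instance to unit Euclidean length, and checking a model header for the "normalization" line. It is only an illustration, not the code in the patch. The function names `normalize_instance` and `model_uses_normalization` are hypothetical, leaving all-zero instances unchanged is an assumption, and the header parsing assumes "key value" lines that end at a "w" line.

```python
import math


def normalize_instance(features):
    """Scale a sparse instance {feature_index: value} to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in features.values()))
    if norm == 0.0:
        # Assumption: an all-zero instance cannot be scaled, so return it unchanged.
        return dict(features)
    return {index: value / norm for index, value in features.items()}


def model_uses_normalization(header_lines):
    """Return True if the model header contains the line 'normalization 1'.

    Assumption: the header is a list of 'key value' lines and the weights
    start after a line whose first token is 'w'.
    """
    for line in header_lines:
        parts = line.split()
        if parts and parts[0] == "w":
            break  # end of header, weights follow
        if len(parts) == 2 and parts[0] == "normalization":
            return parts[1] == "1"
    return False  # a header without the line is treated as not normalized


if __name__ == "__main__":
    x = {1: 3.0, 4: 4.0}  # Euclidean norm is 5
    header = ["nr_class 2", "normalization 1", "w"]
    if model_uses_normalization(header):
        x = normalize_instance(x)
    print(x)
```

The point of storing the flag in the model is that the test data goes through the same scaling as the training data, without the user having to remember to pass -n again at prediction time.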

Example: train -s 7 -n -e 1e-6 covtype.libsvm.binary

Training time comparison:

train -s 0 -e 1e-6 covtype.libsvm.binary: 0m39.492s, with -n: 0m04.001s
train -s 1 -e 1e-6 covtype.libsvm.binary: 2m47.380s, with -n: 0m10.116s
train -s 2 -e 1e-6 covtype.libsvm.binary: 0m38.217s, with -n: 0m05.072s
train -s 7 -e 1e-6 covtype.libsvm.binary: 3m42.433s, with -n: 0m07.034s
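The covtype timings above can be turned into speedup factors with a few lines of Python. This is just arithmetic on the numbers reported in this issue. The helper `to_seconds` is a hypothetical name, and it assumes the times are in the "XmY.YYYs" form shown above.

```python
import re


def to_seconds(text):
    """Convert a time such as '2m47.380s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Solver type (-s) -> (time without -n, time with -n), copied from the issue.
timings = {
    0: ("0m39.492s", "0m04.001s"),
    1: ("2m47.380s", "0m10.116s"),
    2: ("0m38.217s", "0m05.072s"),
    7: ("3m42.433s", "0m07.034s"),
}

# Speedup = time without -n divided by time with -n.
speedups = {
    solver: to_seconds(plain) / to_seconds(normalized)
    for solver, (plain, normalized) in timings.items()
}

if __name__ == "__main__":
    for solver, factor in speedups.items():
        print(f"-s {solver}: {factor:.1f}x faster with -n")
```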

Accuracy comparison (train, then predict):

train -s 1 -e 1e-6 splice.txt
predict splice.t splice.txt.model out.txt
Accuracy: 84.2299%, with -n: 84.9655%

train -s 1 -e 1e-6 a9a.txt
predict a9a.t a9a.txt.model out.txt
Accuracy: 84.9395%, with -n: 85.0132%

train -s ALPHA -e 1e-6 w1a.txt
predict w1a.t w1a.txt.model out.txt
Accuracy: 96.9221%, with -n: 97.6625%

ALPHA from 0 to 7:
-s 0: 97.2902% (45991/47272), with -n: 97.5144% (46097/47272)
-s 1: 96.8903% (45802/47272), with -n: 97.6646% (46168/47272)
-s 2: 96.9221% (45817/47272), with -n: 97.6625% (46167/47272)
-s 3: 97.1019% (45902/47272), with -n: 97.6455% (46159/47272)
-s 4: 96.8523% (45784/47272), with -n: 97.635% (46154/47272)
-s 5: 97.3959% (46041/47272), with -n: 97.745% (46206/47272)
-s 6: 97.5736% (46125/47272), with -n: 97.3473% (46018/47272)
-s 7: 97.2902% (45991/47272), with -n: 97.5144% (46097/47272)

train -s ALPHA -e 1e-6 svmguide1.txt
predict svmguide1.t svmguide1.txt.model out.txt

ALPHA from 0 to 7:
-s 0: 79.025% (3161/4000), with -n: 78.4% (3136/4000)
-s 1: 78.95% (3158/4000), with -n: 78.425% (3137/4000)
-s 2: 78.925% (3157/4000), with -n: 78.425% (3137/4000)
-s 3: 59.125% (2365/4000), with -n: 78.3% (3132/4000)
-s 4: 76.625% (3065/4000), with -n: 78.5% (3140/4000)
-s 5: 78.9% (3156/4000), with -n: 79.025% (3161/4000)
-s 6: 79.025% (3161/4000), with -n: 78.225% (3129/4000)
-s 7: 80.125% (3205/4000), with -n: 78.4% (3136/4000)

Overall, training with -n is a lot faster (roughly 7 to 32 times faster on covtype) with similar accuracy.

hsinyuan-huang, Aug 14 '14 01:08