natural Classification weight

We are working on a project where we need to weight different documents being added to a classifier. For example, if I am adding a title and a description, I want the title to carry more weight. Any thoughts? Thank you.

May 05 '15 20:05 jbowden1982

Same issue here, so +1

May 18 '15 07:05 PhilippKrone

Hi, I'm trying to understand the question you have posted here. Are you talking about a document of the form:

doc = {
    title: 'Some Title',
    description: 'We are working on a project where we need to weigh different documents being added to a classifier. For example if I have a title and description I am adding I want the title to carry more weight.'
};

If not, how would the title be distinguished from the description of the document?

May 22 '15 23:05 raiyankamal

You could repeat the title 10 times (or however many times you want, to increase the weight of the title). The classifier just counts things, so increase the count of the things you want to give more weight.
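To make the repetition idea concrete, here is a minimal sketch. `weightedTokens` and its naive tokenizer are hypothetical helpers, not part of natural, and repeating tokens inside one document only helps if the classifier counts occurrences rather than just recording whether a term is present.

```javascript
// Hypothetical helper, not part of the natural API.
// Builds one token list in which the title tokens appear `titleWeight` times.
function weightedTokens(doc, titleWeight) {
  // Naive tokenizer (an assumption): lowercase, split on non-word characters.
  const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);
  const titleTokens = tokenize(doc.title);
  const descriptionTokens = tokenize(doc.description);

  const repeated = [];
  for (let i = 0; i < titleWeight; i++) {
    repeated.push(...titleTokens);
  }
  return repeated.concat(descriptionTokens);
}

const tokens = weightedTokens(
  { title: 'Some Title', description: 'A longer description' },
  3
);
console.log(tokens);
```

The resulting array could then be passed to classifier.addDocument(tokens, label) like any other token list.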

Dec 10 '15 19:12 KyleAMathews

I've found the current weighting unreliable at best... Somehow it's hard to believe that:

classifier.addDocument(['callback', 'hell', 'npm', 'thenable', 'promise'], 'node')
classifier.addDocument(['collections', 'database', 'db', 'mongo', 'mongodb', 'MongoDb', 'ObjectId'], 'database')

classifier.train()
console.log(classifier.classify('What a bunch of users collections'))

Will get classified as node... que? That seems very unlikely to be correct, but I keep getting these awkward results back time and time again...

Even trimming it down:

classifier.addDocument(['callback', 'hell'], 'node')
classifier.addDocument(['collections', 'db'], 'database')
classifier.train()

console.log(classifier.classify('What a bunch of users collections'))

Happily returns node. I must have broken something somewhere? For one, weighting would let me inspect the relative importance of each term and tweak it so that it provides (better|correct) results.

Jan 29 '16 17:01 robjens

I have exactly the same problem. I have noticed that with getClassifications, all results have an identical "value".

In your example, I get the same behaviour (using LogisticRegressionClassifier, but the same happens with BayesClassifier, where the value is 0.6666666666666666):

[ { label: 'node', value: 0.5 },
{ label: 'database', value: 0.5 } ]

The algorithm starts to work only if the input has at least "two patterns", like here:

 classifier.getClassifications('What a bunch of users collections db, hell!')

Result is :

 [ { label: 'database', value: 0.7072773051736339 },
  { label: 'node', value: 0.29272269482636626 } ]

So to "prevent" this case, ~~I check if the value is 0.5 for all results, in which case I try another way... but yeah, it's an ugly hack... it would be great if the devs figured out this case.~~ read next comment!

Jun 04 '17 23:06 sam2x

Wow, @KyleAMathews's solution kinda works.

classifier.addDocument(['callback', 'hell', 'npm', 'thenable', 'promise'], 'node')
classifier.addDocument(['collections', 'database', 'db', 'mongo', 'mongodb', 'MongoDb', 'ObjectId'], 'database')
classifier.addDocument(['collections'], 'database')
classifier.train();
console.log('result:', classifier.getClassifications('What a bunch of users collections'))

Result :

[ { label: 'database', value: 0.75 },
 { label: 'node', value: 0.5 } ]

Note: the weight is taken into account only when you re-add the pattern in a separate document. So the following code won't work:

  classifier.addDocument(['collections', 'collections', ...], 'database')
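A minimal sketch of the re-adding approach, in case it helps. `addWeightedDocument` is a hypothetical helper, not part of natural, and the stub classifier below only records calls so the snippet runs on its own; with natural you would pass a real classifier instead.

```javascript
// Hypothetical helper, not part of the natural API: re-adds the same
// token list `weight` times so each copy counts as a separate document.
function addWeightedDocument(classifier, tokens, label, weight) {
  for (let i = 0; i < weight; i++) {
    classifier.addDocument(tokens, label);
  }
}

// Stand-in classifier (an assumption for illustration): it only records
// what was added, so this example has no dependencies.
const recorded = [];
const stubClassifier = {
  addDocument(tokens, label) {
    recorded.push({ tokens, label });
  }
};

addWeightedDocument(stubClassifier, ['collections'], 'database', 3);
addWeightedDocument(stubClassifier, ['callback', 'hell'], 'node', 1);

console.log(recorded.length); // 4 documents: 3 for database, 1 for node
```

With natural, the same helper should work with e.g. new natural.BayesClassifier() followed by classifier.train(). Keep in mind that every extra copy also adds one more document for that label, so heavy repetition may skew results towards that label in general.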

Jun 04 '17 23:06 sam2x
