
Classification weight

Open jbowden1982 opened this issue 9 years ago • 6 comments

We are working on a project where we need to weight documents differently as they are added to a classifier. For example, if I am adding a title and a description, I want the title to carry more weight. Any thoughts? Thank you.

jbowden1982 avatar May 05 '15 20:05 jbowden1982

Same issue here, so +1

PhilippKrone avatar May 18 '15 07:05 PhilippKrone

Hi, I'm trying to understand the question you have posted here. Are you talking about a document of the form:

doc = {
    title: 'Some Title',
    description: 'We are working on a project where we need to weigh different documents being added to a classifier. For example if I have a title and description I am adding I want the title to carry more weight.'
};

If not, how would the title be distinguished from the description of the document?

raiyankamal avatar May 22 '15 23:05 raiyankamal

You could repeat the title 10 times (or however many times you need to raise its weight). The classifier just counts things, so increase the count of whatever you want to give more weight.

KyleAMathews avatar Dec 10 '15 19:12 KyleAMathews

I've found the current weighting unreliable at best... Somehow it's hard to believe that:

classifier.addDocument(['callback', 'hell', 'npm', 'thenable', 'promise'], 'node')
classifier.addDocument(['collections', 'database', 'db', 'mongo', 'mongodb', 'MongoDb', 'ObjectId'], 'database')

classifier.train()
console.log(classifier.classify('What a bunch of users collections'))

gets classified as node... que? That seems very unlikely to be correct, but I keep getting these awkward results time and time again...

Even trimming it down:

classifier.addDocument(['callback', 'hell'], 'node')
classifier.addDocument(['collections', 'db'], 'database')
classifier.train()

console.log(classifier.classify('What a bunch of users collections'))

Happily returns node. I must have broken something somewhere? For one, weighting would let me inspect the relative importance of each term and tweak it so that it gives better (or correct) results.

robjens avatar Jan 29 '16 17:01 robjens

I have exactly the same problem. I have noticed that with getClassifications, all results come back with an identical "value".

In your example I got the same behaviour (using LogisticRegressionClassifier; BayesClassifier behaves the same, with a value of 0.6666666666666666):

[ { label: 'node', value: 0.5 },
{ label: 'database', value: 0.5 } ]

The algorithm only starts to work when the input contains at least "two patterns", like here:

 classifier.getClassifications('What a bunch of users collections db, hell!')

The result is:

 [ { label: 'database', value: 0.7072773051736339 },
  { label: 'node', value: 0.29272269482636626 } ]

So to "prevent" this case, ~~I check whether the value is 0.5 for all results, in which case I try another way... but yeah, it's an ugly hack... it would be great if the devs figured out this case.~~ read the next comment!

sam2x avatar Jun 04 '17 23:06 sam2x
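For reference, the struck-out check from the comment above could be sketched roughly as follows. `isTie` and its `epsilon` parameter are hypothetical names, not part of natural; the sketch only assumes the `{ label, value }` array shape shown in the quoted outputs. It detects the tie but does nothing to fix it.

```javascript
// Hypothetical helper, not part of natural: reports whether every entry
// in a getClassifications()-style result carries the same score.
function isTie(classifications, epsilon = 1e-9) {
  // A single label (or an empty result) cannot be tied with anything.
  if (classifications.length < 2) return false;
  const first = classifications[0].value;
  return classifications.every((c) => Math.abs(c.value - first) <= epsilon);
}

// Values copied from the outputs quoted in the comment above.
const tied = [
  { label: 'node', value: 0.5 },
  { label: 'database', value: 0.5 },
];
const decided = [
  { label: 'database', value: 0.7072773051736339 },
  { label: 'node', value: 0.29272269482636626 },
];

console.log(isTie(tied));    // true
console.log(isTie(decided)); // false
```

Comparing against the first value instead of a hard-coded 0.5 keeps the check usable for the BayesClassifier case as well, where the shared value was reported as 0.6666666666666666.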

Wow, @KyleAMathews's solution kinda works.

classifier.addDocument(['callback', 'hell', 'npm', 'thenable', 'promise'], 'node')
classifier.addDocument(['collections', 'database', 'db', 'mongo', 'mongodb', 'MongoDb', 'ObjectId'], 'database')
classifier.addDocument(['collections'], 'database')
classifier.train();
console.log('result:', classifier.getClassifications('What a bunch of users collections'))

Result:

[ { label: 'database', value: 0.75 },
 { label: 'node', value: 0.5 } ]

Note: the weight is taken into account only when you re-add the pattern as a separate document. So the following code won't work:

  classifier.addDocument(['collections', 'collections', ...], 'database')

sam2x avatar Jun 04 '17 23:06 sam2x
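Going by the note above, a small wrapper that adds the same document several times is one way to express a weight. This is a minimal sketch: `addWeightedDocument` and `weight` are hypothetical names, not part of natural, and the only thing assumed about the classifier is the `addDocument(tokens, label)` call used throughout this thread. A recording stand-in replaces the real classifier so the snippet runs without natural installed.

```javascript
// Hypothetical helper, not part of natural. `classifier` is any object
// exposing addDocument(tokens, label), as used earlier in this thread.
function addWeightedDocument(classifier, tokens, label, weight = 1) {
  if (!Number.isInteger(weight) || weight < 1) {
    throw new RangeError('weight must be a positive integer');
  }
  // Add the whole document `weight` times as separate documents,
  // rather than repeating tokens inside one document.
  for (let i = 0; i < weight; i += 1) {
    classifier.addDocument(tokens, label);
  }
}

// Stand-in that only records calls (an assumption for illustration).
const recorder = {
  docs: [],
  addDocument(tokens, label) {
    this.docs.push({ tokens, label });
  },
};

addWeightedDocument(recorder, ['collections'], 'database', 3);
addWeightedDocument(recorder, ['callback', 'hell'], 'node');

console.log(recorder.docs.length); // 4
```

For the original title/description question, the same idea would mean adding the title tokens with a higher `weight` than the description tokens; how large the difference should be is something to tune against real data.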