gramophone
tf-idf support
Hi,
How would you advise implementing tf-idf inside gramophone?
Good work, by the way.
It's possible to process the documents with gramophone and then pass the results to natural's tf-idf function. See this gist: https://gist.github.com/bxjx/7001437
I've also added a { flatten: true } option to gramophone (version 0.0.3) that should make this easier.
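To illustrate what scoring whole phrases as tf-idf terms looks like, here is a minimal dependency-free sketch. The `docs` array is a hypothetical stand-in for flattened keyword lists (each phrase repeated once per occurrence), and `tf`, `idf`, and `tfidf` are illustrative helpers using the common log(N / df) form; they are not natural's or gramophone's actual functions, and natural's exact formula may differ.

```javascript
// Hypothetical stand-in for flattened keyword lists:
// each phrase appears once per occurrence in its document.
const docs = [
  ['node', 'node', 'programming language'],
  ['ruby', 'programming language'],
  ['node', 'package manager'],
];

// Term frequency: how many times `term` appears in one document.
function tf(term, doc) {
  return doc.filter((t) => t === term).length;
}

// Inverse document frequency, using the common log(N / df) form.
function idf(term, allDocs) {
  const df = allDocs.filter((doc) => doc.includes(term)).length;
  return df === 0 ? 0 : Math.log(allDocs.length / df);
}

// tf-idf of a single term (word or multi-word phrase) in one document.
function tfidf(term, docIndex, allDocs) {
  return tf(term, allDocs[docIndex]) * idf(term, allDocs);
}

const nodeScore = tfidf('node', 0, docs);
const phraseScore = tfidf('programming language', 0, docs);
console.log('node:', nodeScore);
console.log('programming language:', phraseScore);
```

Because terms are compared as whole strings, a multi-word phrase is scored as a single unit rather than word by word.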
Let me know how you go! You may run into issues using stemming. If you do, post back and I can probably sort them out. The tf-idf code from natural is pretty straightforward and I could either pull it into gramophone or add a pull request to make it more friendly for using alternative tokenizers like gramophone.
Thanks for the props! This library is fairly niche and it's nice to know someone else might benefit.
I have tested your example with natural 0.1.24 (tf-idf was broken in natural 0.1.23). It works fine.
However, the function tfidf.listTerms is broken:
tfidf.listTerms(0 /* document index */).forEach(function(item, index) {
  console.log(item.term + ': ' + item.tfidf);
});
This prints "node programming language: NaN" instead of the tf-idf measure.
To work around this I had to modify natural's listTerms function this way:
terms.push({term: term, tfidf: this.tfidf([term], d)})
instead of
terms.push({term: term, tfidf: this.tfidf(term, d)})
I am not sure how this could be fixed cleanly.
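A possible explanation for the NaN (an assumption, not verified against natural's source) is that a string argument gets tokenized into single words, which are not keys in a document indexed by whole phrases, while an array argument is used as-is. The sketch below reproduces that mechanism with a hypothetical `score` function and a fixed placeholder idf; it is not natural's code.

```javascript
// Document stored as phrase -> count, as when whole phrases are added as terms.
const doc = { 'node programming language': 2, 'package manager': 1 };

// Fixed placeholder so the example isolates the term-frequency lookup.
const idfValue = 1.5;

// Hypothetical scorer: a string argument is split into words,
// an array argument is used unchanged.
function score(terms, document) {
  const list = Array.isArray(terms) ? terms : terms.toLowerCase().split(/\s+/);
  // A missing key yields undefined, and undefined * number is NaN.
  return list.reduce((sum, term) => sum + document[term] * idfValue, 0);
}

// The individual words are not keys in `doc`, so the lookup fails.
const fromString = score('node programming language', doc);

// The whole phrase is a key in `doc`, so the lookup succeeds.
const fromArray = score(['node programming language'], doc);

console.log(fromString, fromArray);
```

If this is indeed the cause, wrapping the term in an array, as in the workaround above, keeps the phrase intact for the lookup.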