nlp_primitives

Add tfidf primitive

Open gsheni opened this issue 4 years ago • 0 comments

tf–idf (also written TF*IDF or TFIDF), short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.
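For reference, a minimal sketch of the statistic in plain Python. The weighting is an assumption: raw relative frequency for tf and unsmoothed `log(N / df)` for idf, which is only one of several common variants. The `tfidf` helper and the toy corpus are illustrative, not part of nlp_primitives.

```python
import math
from collections import Counter


def tfidf(term, document, corpus):
    """Score `term` in `document` (a token list) against `corpus` (a list of token lists)."""
    if not document:
        return 0.0
    # Term frequency: share of the document's tokens that are `term`.
    tf = Counter(document)[term] / len(document)
    # Document frequency: number of corpus documents that contain `term`.
    df = sum(1 for doc in corpus if term in doc)
    # Inverse document frequency, unsmoothed; a term absent from the corpus scores 0.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
print(tfidf("the", corpus[0], corpus))  # 0.0, because "the" occurs in every document
print(tfidf("mat", corpus[0], corpus))  # positive, because "mat" occurs in only one
```

A word that appears everywhere carries no weight, while a word concentrated in one document scores highest there.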

  • https://en.wikipedia.org/wiki/Tf%E2%80%93idf
  • Found here:
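# Imports assume the pre-1.0 featuretools variable types that this sketch uses.
from featuretools.primitives.base import TransformPrimitive
from featuretools.variable_types import NaturalLanguage, Numeric


class Tfidf(TransformPrimitive):
    name = 'tfidf'
    input_types = [NaturalLanguage]
    return_type = Numeric  # featuretools reads `return_type`, not `return_dtype`
    commutative = True

    @property
    def number_output_features(self):
        # TODO: not implemented yet; presumably one feature per vocabulary term
        pass
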
  • Allow the user to pass in the corpus
  • We should name the primitive similar to LSA (https://github.com/alteryx/nlp_primitives/pull/161)
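One possible shape for the "pass in the corpus" idea, sketched without any featuretools dependency. Everything here is hypothetical: `TfidfSketch`, its `corpus` argument, the whitespace tokenizer, and the choice to fall back to the input column when no corpus is given are assumptions for illustration, not the nlp_primitives API. It only mirrors the `get_function` pattern that primitives use.

```python
import math
from collections import Counter


class TfidfSketch:
    """Hypothetical stand-in for the proposed primitive; not the nlp_primitives API."""

    name = "tfidf"

    def __init__(self, corpus=None):
        # Optional user-supplied corpus: an iterable of strings.
        self.corpus = list(corpus) if corpus is not None else None

    @staticmethod
    def _tokenize(text):
        # Naive lowercase whitespace tokenizer, for illustration only.
        return text.lower().split()

    def get_function(self):
        def tfidf(column):
            column = list(column)
            # Fall back to the input column when no corpus was passed in.
            source = self.corpus if self.corpus is not None else column
            docs = [self._tokenize(text) for text in source]
            vocab = sorted({tok for doc in docs for tok in doc})
            # Document frequency: in how many corpus documents each token occurs.
            df = Counter(tok for doc in docs for tok in set(doc))
            idf = {tok: math.log(len(docs) / df[tok]) for tok in vocab}
            rows = []
            for text in column:
                tokens = self._tokenize(text)
                counts = Counter(tokens)
                # One score per vocabulary term; out-of-vocabulary tokens are ignored.
                rows.append(
                    [counts[tok] / len(tokens) * idf[tok] if tokens else 0.0 for tok in vocab]
                )
            return vocab, rows

        return tfidf


primitive = TfidfSketch(corpus=["the cat sat", "the dog sat", "the cat ran"])
vocab, rows = primitive.get_function()(["the cat", "a dog"])
print(vocab)  # ['cat', 'dog', 'ran', 'sat', 'the']
print(rows)
```

In this sketch the number of output values per row equals the vocabulary size, so a real `number_output_features` would likely have to depend on the corpus, or on a cap on the vocabulary.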

gsheni, Nov 09 '21 20:11