nlp_primitives
Natural Language Processing primitives for Featuretools
Supporting Python 3.12 will require removing the tensorflow upper-bound restrictions in #270, as only the latest versions of tensorflow and tensorflow-hub support Python 3.12. However, these new tensorflow...
Adds a primitive for natural language logical types that uses the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings) to calculate embeddings features. The model to use is configurable, but `text-embedding-ada-002` is used by default....
- Add a primitive for natural language logical types that uses the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings) to calculate embeddings features.
I don't think we need to create a list in this comprehension: https://github.com/alteryx/nlp_primitives/blob/11837a50de79fd05e067f58075100f39f639e563/nlp_primitives/stopword_count.py#L42 Instead, I think we can just iterate over `words` and keep a `count` variable. For very large...
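A minimal sketch of the suggestion above, counting stopwords with a running `count` instead of building an intermediate list. The function name `count_stopwords` and the sample stopword set are hypothetical and are not taken from `stopword_count.py`.

```python
def count_stopwords(words, stopwords):
    """Count how many items in `words` are stopwords.

    Iterates once and keeps a running total, so no intermediate
    list is allocated regardless of how long `words` is.
    """
    count = 0
    for word in words:
        if word in stopwords:
            count += 1
    return count


# Hypothetical stopword set, for illustration only.
SAMPLE_STOPWORDS = {"the", "a", "is"}

sample_words = "the cat is on a mat".split()
sample_count = count_stopwords(sample_words, SAMPLE_STOPWORDS)
```

Using a `set` for the stopwords keeps each membership check at constant time on average, so the loop stays linear in the number of words.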
- We can leverage the NLTK function: https://www.nltk.org/api/nltk.metrics.distance.html
> tf–idf, TF*IDF, or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a...
As a user, I wish there was an NLP primitive that computed the frequency of hashtags given a list of texts, as in https://stackoverflow.com/questions/49865756/extract-and-count-hashtags-from-a-dataframe/49865854#49865854. #139 is similar, but computes the...
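One possible shape for the hashtag request, sketched with only the standard library. The name `hashtag_frequency`, the `#\w+` pattern, and the choice to lowercase tags are assumptions made for illustration; this is not an existing primitive in `nlp_primitives`.

```python
import re
from collections import Counter

# Assumed definition of a hashtag: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def hashtag_frequency(texts):
    """Return a Counter mapping each lowercased hashtag to its total count."""
    counts = Counter()
    for text in texts:
        # Lowercasing is an assumption: it treats #NLP and #nlp as the same tag.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


sample_texts = ["I love #NLP and #python", "#nlp is fun", "no tags here"]
sample_counts = hashtag_frequency(sample_texts)
```

A real primitive would likely return one numeric feature per row rather than a single aggregate `Counter`; the aggregate form is used here only to keep the sketch short.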
As a user, I wish NLP Primitives had the ability to handle unicode text. Currently, Unicode text is not correctly handled by regexes in `nlp_primitives`. For example, `Àbc` is not...
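To illustrate the kind of problem described above, the sketch below compares an ASCII-only character class with Python's Unicode-aware `\w`. Both patterns are hypothetical stand-ins; the regexes actually used in `nlp_primitives` may differ.

```python
import re

# Hypothetical ASCII-only pattern: letters outside a-z / A-Z split the token.
ASCII_WORD_RE = re.compile(r"[a-zA-Z]+")

# For str patterns in Python 3, \w is Unicode-aware by default.
UNICODE_WORD_RE = re.compile(r"\w+")


def tokenize(text, pattern):
    """Return every non-overlapping match of `pattern` in `text`."""
    return pattern.findall(text)


# "\u00c0" is the precomposed character À (LATIN CAPITAL LETTER A WITH GRAVE).
sample_text = "\u00c0bc def"
ascii_tokens = tokenize(sample_text, ASCII_WORD_RE)
unicode_tokens = tokenize(sample_text, UNICODE_WORD_RE)
```

With the ASCII-only class the leading `À` is dropped and the first token becomes `bc`, while the `\w+` pattern keeps `Àbc` whole. Text in decomposed form (a base letter followed by a combining accent) would need normalization, for example with `unicodedata.normalize("NFC", text)`, before `\w` treats it the same way.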
We should spend a little time looking into the NLP capabilities of pytorch and identify what could be included as primitives in this library. https://pytorch.org/text/stable/index.html