nlp_primitives issues

Add Python 3.12 support

Supporting Python 3.12 will require removing the tensorflow upper-bound restrictions in #270, as only the latest versions of tensorflow and tensorflow-hub support Python 3.12. However, these new tensorflow...

thehomebrewnerd

Update to remove upper bound restriction on tensorflow and tensorflow-hub in pyproject.toml

thehomebrewnerd

Add OpenAI Embeddings Primitive

Adds a primitive for natural language logical types that uses the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings) to calculate embeddings features. The model to use is configurable, but `text-embedding-ada-002` is used by default....

jlouns

Add OpenAI Embeddings Primitive

- Add a primitive for natural language logical types that uses the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings) to calculate embeddings features.

gsheni

Potentially unneeded memory alloc in `StopwordCount`

I don't think we need to create a list in this comprehension: https://github.com/alteryx/nlp_primitives/blob/11837a50de79fd05e067f58075100f39f639e563/nlp_primitives/stopword_count.py#L42 Instead, I think we can just iterate over `words` and keep a `count` variable. For very large...

sbadithe
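The suggestion above can be sketched as follows. This is only an illustration of the idea, not the code in `stopword_count.py`: the function name `count_stopwords` and its parameters are hypothetical, and the input is assumed to be an already tokenized iterable of words.

```python
def count_stopwords(words, stopwords):
    """Count how many tokens in `words` are stopwords.

    Hypothetical helper, not the library's implementation. The generator
    expression is consumed lazily by sum(), so no intermediate list of
    matching words is allocated.
    """
    stopword_set = set(stopwords)  # set membership is O(1) on average
    return sum(1 for word in words if word in stopword_set)
```

Because `sum` pulls one item at a time from the generator, peak memory stays constant regardless of how many tokens match, which is where the saving for very large inputs would come from.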

Add Levenshtein Distance primitive

- We can leverage the NLTK function: https://www.nltk.org/api/nltk.metrics.distance.html

gsheni
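For reference, the quantity such a primitive would compute can be sketched in plain Python. The linked NLTK module exposes `edit_distance`, which is presumably the function meant; the version below is a dependency-free stand-in using the standard two-row dynamic program, and the name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b`.

    Hypothetical stand-in for a library call, not nlp_primitives code.
    """
    # Keep the shorter string in the inner loop to minimize row length.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]
```

A real primitive would likely delegate to NLTK rather than reimplement this; the sketch only makes the expected output concrete.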

Add tfidf primitive

> tf–idf, TF*IDF, or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a...

gsheni
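As a rough illustration of the statistic quoted above, here is a minimal sketch that assumes whitespace tokenization and the plain, unsmoothed definition tf × log(N / df). The function name `tfidf` is hypothetical, and established libraries often use smoothed or normalized variants, so their numbers would differ.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: score} dict per document.

    Assumptions (not from the issue): lowercase whitespace tokens,
    tf = count / document length, idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores
```

With this definition, a term that occurs in every document scores zero, which matches the intuition that it does not distinguish any one document from the rest.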

HashtagFrequency Primitive

As a user, I wish there was an NLP primitive that computed the frequency of hashtags given a list of texts, as in https://stackoverflow.com/questions/49865756/extract-and-count-hashtags-from-a-dataframe/49865854#49865854. #139 is similar, but computes the...

sbadithe
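One way the requested behavior might look, as a sketch only: the pattern, the case folding, and the name `hashtag_frequency` are assumptions, since the issue does not specify how hashtags should be matched or normalized.

```python
import re
from collections import Counter

# Assumed definition of a hashtag: '#' followed by one or more word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def hashtag_frequency(texts):
    """Count hashtag occurrences across a list of texts.

    Hypothetical helper. Tags are lowercased so that '#Python' and
    '#python' are counted together; that choice is an assumption.
    """
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(text))
    return counts
```

For example, `hashtag_frequency(["Love #Python and #nlp", "#python rocks"])` counts `python` twice and `nlp` once.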

Support Unicode

As a user, I wish NLP Primitives had the ability to handle Unicode text. Currently, Unicode text is not correctly handled by regexes in `nlp_primitives`. For example, `Àbc` is not...

sbadithe

enhancement
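The kind of mismatch described in the Unicode issue can be reproduced with a small sketch. The excerpt does not say which regexes in `nlp_primitives` are affected, so both patterns below are hypothetical: one ASCII-only character class, and one Unicode-aware alternative built on Python's `\w`, which matches Unicode letters for `str` patterns by default.

```python
import re
import unicodedata

# Hypothetical ASCII-only word pattern: misses accented letters.
ASCII_WORD = re.compile(r"[A-Za-z]+")

# Unicode-aware alternative: word characters minus digits and underscore.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def ascii_words(text):
    """Extract words using the ASCII-only pattern."""
    return ASCII_WORD.findall(text)


def unicode_words(text):
    """Extract words using the Unicode-aware pattern.

    NFC normalization composes sequences such as 'A' + combining grave
    accent into a single code point before matching.
    """
    normalized = unicodedata.normalize("NFC", text)
    return UNICODE_WORD.findall(normalized)
```

On the issue's example string, the ASCII-only pattern drops the leading `À` and yields `bc`, while the Unicode-aware pattern keeps `Àbc` as one word.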

Study pytorch NLP capabilities and implement primitives

We should spend a little time looking into the NLP capabilities of pytorch and identify what could be included as primitives in this library. https://pytorch.org/text/stable/index.html

thehomebrewnerd
