Create a token/embedding preprocessing pipeline using tf-transform
Issue: We currently depend on pretrained vocabularies, like GloVe embeddings, that:
- Carry odd biases (although once we backprop into the embeddings, their initial bias matters much less),
- Must stay consistent with the tokenizer we use,
- Don't necessarily cover the same words as our actual text.
Proposed solution: Use https://github.com/tensorflow/transform (tf.Transform) to develop text preprocessing pipelines, e.g. to select only tokens that occur sufficiently frequently in our own data, and to create either random or smarter word embeddings for them. A sketch of what the pipeline side could look like is below.
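A minimal sketch of the tf.Transform side, assuming a raw feature named `text`, naive whitespace tokenization, and an illustrative frequency threshold; the feature name, the threshold, and the `text_vocab` filename are all hypothetical, not anything this repo defines:

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
  """Tokenizes raw text and maps tokens to integer ids using a
  vocabulary computed over the full dataset in the analyze pass."""
  tokens = tf.strings.split(inputs['text'])  # naive whitespace tokenizer
  token_ids = tft.compute_and_apply_vocabulary(
      tokens,
      frequency_threshold=5,     # keep only sufficiently frequent tokens
      num_oov_buckets=1,         # rare/unseen tokens share one OOV id
      vocab_filename='text_vocab')
  return {'token_ids': token_ids}
```

Because the vocabulary is computed from our own training data with the same tokenizer the model uses, the tokenizer-consistency and coverage problems above go away by construction.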
FYI: not sure if it helps, but here is a basic tf.Transform example: https://github.com/tensorflow/transform/blob/master/examples/sentiment_example.py
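On the model side, one hedged sketch of the "random embeddings" option: size a freshly initialized embedding table from the vocabulary file that tf.Transform wrote (`working_dir` and the `text_vocab` filename just match the assumptions in the sketch above):

```python
import tensorflow as tf
import tensorflow_transform as tft

def build_embedding_table(working_dir, embedding_dim=128):
  """Randomly initialized embeddings sized to the learned vocabulary."""
  tft_output = tft.TFTransformOutput(working_dir)
  vocab_size = tft_output.vocabulary_size_by_name('text_vocab')
  # +1 row for the single OOV bucket used in the preprocessing sketch.
  return tf.Variable(
      tf.random.truncated_normal([vocab_size + 1, embedding_dim], stddev=0.1),
      name='token_embeddings')
```

The ids produced by `preprocessing_fn` can then be looked up with `tf.nn.embedding_lookup`; a "smarter" initialization (e.g. seeding rows from GloVe where a token exists there) could replace the random init without changing the pipeline.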