EmbeddingBag for ragged indices
Describe the feature and the current behavior/state. Currently, EmbeddingBag and the underlying op work only with dense tensors. However, many modern NLP ops/tasks require RaggedTensors, and converting them to dense tensors causes a performance penalty.
For example, take the FastText ngram method. We have a batch of words [BATCH]. We split them into ngrams [BATCH, NGRAMS], then look the ngrams up in a vocabulary to obtain indices [BATCH, NGRAM_INDICES]. Finally, we obtain embeddings for those indices and reduce them with sum/mean/etc. Here both the ngrams and the indices are ragged tensors, so it would be great if we could use EmbeddingBag for the last two operations (embed + reduce).
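The pipeline above can be sketched with today's ragged-tensor APIs (a minimal toy example, not an EmbeddingBag implementation; the vocabulary, embedding values, and bigram width are illustrative assumptions). EmbeddingBag would fuse the last two steps so the intermediate [BATCH, NGRAMS, DIM] ragged tensor is never materialized:

```python
import tensorflow as tf

# Toy vocabulary of character bigrams plus one OOV bucket (illustrative only).
vocab = ["he", "el", "ll", "lo", "wo", "or", "rl", "ld"]
table = tf.lookup.StaticVocabularyTable(
    tf.lookup.KeyValueTensorInitializer(
        vocab, tf.range(len(vocab), dtype=tf.int64)),
    num_oov_buckets=1)
embeddings = tf.random.normal([len(vocab) + 1, 4])  # [vocab + OOV, dim]

words = tf.constant(["hello", "world"])             # [BATCH]
chars = tf.strings.unicode_split(words, "UTF-8")    # ragged [BATCH, CHARS]
ngrams = tf.strings.ngrams(chars, 2, separator="")  # ragged [BATCH, NGRAMS]
indices = tf.ragged.map_flat_values(table.lookup, ngrams)  # ragged [BATCH, NGRAM_INDICES]

# embed + reduce: the two steps EmbeddingBag would fuse into one op.
vectors = tf.gather(embeddings, indices)   # ragged [BATCH, NGRAMS, 4]
pooled = tf.reduce_mean(vectors, axis=1)   # dense [BATCH, 4]
```

The intermediate `vectors` tensor is what makes the unfused version expensive for large batches and embedding dims.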
Relevant information
- Are you willing to contribute it (yes/no): no
- Are you willing to maintain it going forward? (yes/no): no
- Is there a relevant academic paper? (if so, where): no
- Does the relevant academic paper exceed 50 citations? (yes/no): no
- Is there already an implementation in another framework? (if so, where): don't know
- Was it part of tf.contrib? (if so, where): no
Which API type would this fall under (layer, metric, optimizer, etc.)? Layer & op.
Who will benefit from this feature? It will extend the layer's usage to a large number of NLP tasks (most of which use RaggedTensors).
We discussed sparsity a bit at:
https://github.com/tensorflow/addons/pull/2352#issuecomment-763042721
/cc @tanguycdls @aartbik
Hello! +1 for this issue. We migrated from Torch to TensorFlow and we're also missing the EmbeddingBag op https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html that takes a ragged input instead of a dense one. We started with safe_embedding_lookup_sparse, but since we were only using the sum aggregator we recently moved to a sparse-dense matmul to reduce RAM consumption: the bottleneck is the conversion from the ragged format to the indicator COO format. We could share our implementation once it's finished and cleaned up!
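A minimal sketch of the sum-via-matmul idea described above (this is not the commenter's actual code; vocabulary size, embedding values, and index data are toy assumptions): the ragged indices are converted to a [BATCH, VOCAB] indicator matrix in COO format, and a single sparse-dense matmul then performs the lookup and the sum reduction at once.

```python
import tensorflow as tf

vocab_size, dim = 8, 4
embeddings = tf.random.normal([vocab_size, dim])
# Toy ragged indices: two bags of different lengths.
indices = tf.ragged.constant([[0, 1, 2], [3, 4, 5, 6]], dtype=tf.int64)

# Ragged -> COO: (row, column) pair for every flat index.
# This conversion is the bottleneck mentioned above.
rows = indices.value_rowids()                      # bag id of each flat index
coo = tf.stack([rows, indices.flat_values], axis=1)
indicator = tf.sparse.SparseTensor(
    indices=coo,
    values=tf.ones_like(indices.flat_values, dtype=tf.float32),
    dense_shape=[2, vocab_size])
indicator = tf.sparse.reorder(indicator)           # ensure canonical ordering

# One matmul = embedding lookup + sum reduction per bag.
pooled = tf.sparse.sparse_dense_matmul(indicator, embeddings)  # [BATCH, dim]
```

Only the sum combiner falls out this directly; mean would additionally require dividing each row by its bag length.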
@bhack @shkarupa-alex are you still interested in this? If so, we can open a PR and discuss whether it should be a new op or an improvement over the EmbeddingBag merged here: https://github.com/tensorflow/addons/pull/2352
As many other reusable NLP components are starting to land in https://github.com/keras-team/keras-nlp/pull/10, you could open a ticket there to check whether they are interested.
@bhack @shkarupa-alex are you still interested in this?
Yes, still interested
TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision: TensorFlow Addons Wind Down
Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA: Keras, Keras-CV, Keras-NLP