
EmbeddingBag for ragged indices

shkarupa-alex opened this issue on Aug 19 '21 · 5 comments

Describe the feature and the current behavior/state. Currently, EmbeddingBag and its underlying op work only with dense tensors, but many modern NLP ops/tasks require RaggedTensors. Converting them to dense tensors degrades performance.

For example, take FastText's n-gram method. We have a batch of words [BATCH]. We split each word into n-grams [BATCH, NGRAMS], then look the n-grams up in a vocabulary to obtain indices [BATCH, NGRAM_INDICES]. Finally, we want to fetch the embeddings and reduce them with sum/mean/etc. Here both the n-grams and the indices are ragged tensors, so it would be great if EmbeddingBag could handle the last two operations (embed + reduce), as in the sketch below.
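A minimal sketch of the two-step pattern as it stands today, assuming TF 2.x; the names (embeddings, ngram_indices) are illustrative, not a proposed API. A ragged-aware EmbeddingBag would fuse the two steps into one op:

```python
import tensorflow as tf

# Illustrative setup; assumes TF 2.x.
vocab_size, dim = 1000, 16
embeddings = tf.Variable(tf.random.normal([vocab_size, dim]))

# Ragged n-gram indices: one row per word, a variable number of n-grams each.
ngram_indices = tf.ragged.constant([[1, 5, 7], [2], [9, 3]])

# Today this takes two steps: a ragged lookup, then a reduction over the
# ragged dimension. EmbeddingBag could do both without densifying.
looked_up = tf.ragged.map_flat_values(
    tf.nn.embedding_lookup, embeddings, ngram_indices)  # [BATCH, (NGRAMS), DIM]
word_vectors = tf.reduce_mean(looked_up, axis=1)         # [BATCH, DIM]
```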

Relevant information

  • Are you willing to contribute it (yes/no): no
  • Are you willing to maintain it going forward? (yes/no): no
  • Is there a relevant academic paper? (if so, where): no
  • Does the relevant academic paper exceed 50 citations? (yes/no): no
  • Is there already an implementation in another framework? (if so, where): don't know
  • Was it part of tf.contrib? (if so, where): no

Which API type would this fall under (layer, metric, optimizer, etc.)? Layer & op.

Who will benefit from this feature? It will extend the layer's usage to a large number of NLP tasks (most of which use RaggedTensors).

shkarupa-alex commented on Aug 19 '21

We have discussed sparsity a bit at:

https://github.com/tensorflow/addons/pull/2352#issuecomment-763042721

/cc @tanguycdls @aartbik

bhack commented on Aug 19 '21

Hello! +1 for this issue. We migrated from Torch to TensorFlow and we are also missing the EmbeddingBag op (https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html), which takes a ragged input instead of a dense one. We started with safe_embedding_lookup_sparse, but since we only use the sum aggregator, we recently moved to a sparse-dense matmul instead to reduce RAM consumption; the bottleneck is the conversion from the ragged format to the indicator COO format. We could share our implementation once it's finished and cleaned up! A rough sketch of the idea is below.
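A minimal sketch of that sum-via-matmul trick, assuming TF 2.x; all names here are illustrative, not the implementation referenced above:

```python
import tensorflow as tf

# Illustrative setup; assumes TF 2.x and a 'sum' aggregator only.
vocab_size, dim = 1000, 16
embeddings = tf.Variable(tf.random.normal([vocab_size, dim]))
ragged_ids = tf.ragged.constant([[1, 5, 7], [2], [9, 3]], dtype=tf.int64)

# Convert the ragged ids to an indicator matrix in COO form, shape
# [BATCH, vocab_size]; this conversion is the bottleneck mentioned above.
rows = ragged_ids.value_rowids()  # the row each id belongs to
coo_indices = tf.stack([rows, ragged_ids.flat_values], axis=1)
indicator = tf.sparse.SparseTensor(
    indices=coo_indices,
    values=tf.ones_like(ragged_ids.flat_values, dtype=tf.float32),
    dense_shape=[ragged_ids.nrows(), vocab_size])
indicator = tf.sparse.reorder(indicator)  # matmul expects canonical ordering

# One sparse-dense matmul yields the sum-aggregated embeddings; a repeated
# id in a row contributes once per occurrence, matching a 'sum' combiner.
summed = tf.sparse.sparse_dense_matmul(indicator, embeddings)  # [BATCH, DIM]
```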

tanguycdls commented on Aug 23 '21

@bhack @shkarupa-alex, are you still interested in this? If so, we can open a PR and discuss whether it should be a new op or an improvement over the EmbeddingBag merged here: https://github.com/tensorflow/addons/pull/2352

tanguycdls commented on Jan 21 '22

As many other reusable NLP components are starting to land in https://github.com/keras-team/keras-nlp/pull/10, you could open a ticket there to check whether they are interested.

bhack commented on Jan 21 '22

@bhack @shkarupa-alex, are you still interested in this?

Yes, still interested

shkarupa-alex commented on Jan 31 '22

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision: TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA: Keras, Keras-CV, Keras-NLP.

seanpmorgan commented on Mar 01 '23