WIP: Add tutorials about ragged tensors.

Open csukuangfj opened this issue 3 years ago • 5 comments

csukuangfj avatar Sep 11 '21 09:09 csukuangfj

A preview can be found at

https://csukuangfj.github.io/k2/python_tutorials/ragged/basics.html#

csukuangfj avatar Sep 11 '21 09:09 csukuangfj

Looks cool!

danpovey avatar Sep 11 '21 14:09 danpovey

@csukuangfj Thanks for this tutorial! Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.

GNroy avatar Sep 13 '21 17:09 GNroy

TensorFlow has sparse matrices and ragged tensors, see

  • https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor
  • https://www.tensorflow.org/api_docs/python/tf/RaggedTensor

PyTorch also has sparse matrices and nested tensors, see

  • https://github.com/pytorch/nestedtensor
  • https://pytorch.org/docs/stable/sparse.html and https://github.com/Quansight-Labs/rfcs/tree/pearu/rfc0005/RFC0003-sparse-roadmap

We use the same terminology as tf.RaggedTensor (row splits, row ids, etc.), though ragged tensors in k2 were designed independently by @danpovey. We were only told later that TensorFlow uses the same ideas.
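To make the terminology concrete, here is a minimal pure-Python sketch of how row splits and row ids describe a list of lists. It does not use the k2 or TensorFlow APIs; the helper names `to_row_splits` and `to_row_ids` are made up for this illustration.

```python
from itertools import accumulate


def to_row_splits(rows):
    """Flatten a list of lists into (row_splits, values).

    row_splits[i] is the offset in `values` where row i starts, and
    row_splits[-1] is the total number of elements.
    """
    row_splits = [0] + list(accumulate(len(r) for r in rows))
    values = [x for r in rows for x in r]
    return row_splits, values


def to_row_ids(row_splits):
    """Expand row_splits into one row index per element of `values`."""
    row_ids = []
    for row, (start, end) in enumerate(zip(row_splits, row_splits[1:])):
        row_ids.extend([row] * (end - start))
    return row_ids


rows = [[10, 20], [], [30, 40, 50]]  # note the empty middle row
row_splits, values = to_row_splits(rows)
row_ids = to_row_ids(row_splits)

print(row_splits)  # [0, 2, 2, 5]
print(values)      # [10, 20, 30, 40, 50]
print(row_ids)     # [0, 0, 2, 2, 2]
```

Row splits and row ids carry the same information; row splits make it cheap to find where a row starts, while row ids make it cheap to find which row an element belongs to.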


A ragged tensor with 2 axes looks similar to a sparse matrix in CSR format, but they are different.

From https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format) , a sparse matrix in CSR format has the following components:

  • ROW_INDEX
  • COL_INDEX
  • V

ROW_INDEX is called row_splits in k2, and V is called values. That's why I said a ragged tensor in k2 shares some similarities with a sparse matrix.

However, ragged tensors have no COL_INDEX. We do not view a ragged tensor as a ragged matrix: for a ragged tensor with 2 axes, what we care about is the number of elements in each row, and we don't assign a column index to the entries in a row.
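The difference can be seen with a small pure-Python sketch. The helper `dense_to_csr` is hypothetical and written only for this comparison; it is not part of k2 or PyTorch. Two different matrices can share the same ROW_INDEX and V, and only COL_INDEX tells them apart, so the pair (row_splits, values) on its own does not describe a matrix.

```python
def dense_to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR (ROW_INDEX, COL_INDEX, V)."""
    row_index, col_index, v = [0], [], []
    for row in matrix:
        for col, x in enumerate(row):
            if x != 0:
                col_index.append(col)
                v.append(x)
        # Offset in V where the next row starts.
        row_index.append(len(v))
    return row_index, col_index, v


a = [[0, 5, 0],
     [0, 0, 0],
     [7, 0, 8]]
b = [[5, 0, 0],
     [0, 0, 0],
     [0, 7, 8]]

row_index_a, col_index_a, v_a = dense_to_csr(a)
row_index_b, col_index_b, v_b = dense_to_csr(b)

# Same ROW_INDEX and V: both correspond to the ragged layout [[5], [], [7, 8]].
print(row_index_a, v_a)  # [0, 1, 1, 3] [5, 7, 8]
print(row_index_b, v_b)  # [0, 1, 1, 3] [5, 7, 8]

# Only COL_INDEX distinguishes the two matrices.
print(col_index_a)  # [1, 0, 2]
print(col_index_b)  # [0, 1, 2]
```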

PyTorch's sparse tensors use the COO format by default. Either way, they are still matrices whose entries have row and column indexes.


Also, ragged tensors in k2 are not designed for linear algebra operations, i.e., there are no matrix-vector or matrix-matrix multiplications. Instead, they are designed for efficiently manipulating irregular data structures on GPU.
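As a rough illustration of the kind of manipulation meant here, the sketch below sums each row of a 2-axis ragged structure using row ids. It is plain sequential Python with a hypothetical helper name, `sum_per_row`; it is not k2's API and says nothing about how k2 implements such operations on GPU.

```python
def sum_per_row(row_ids, values, num_rows):
    """Sum the values of each row; rows without elements sum to 0."""
    sums = [0] * num_rows
    # Each element carries its own row id, so every element can be
    # processed independently of where its row starts or ends.
    for row, x in zip(row_ids, values):
        sums[row] += x
    return sums


# Ragged layout [[10, 20], [], [30, 40, 50]] in flattened form.
values = [10, 20, 30, 40, 50]
row_ids = [0, 0, 2, 2, 2]

print(sum_per_row(row_ids, values, num_rows=3))  # [30, 0, 120]
```

No padding to a rectangular shape is needed, and the empty row is handled without a special case.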

csukuangfj avatar Sep 14 '21 04:09 csukuangfj

Many thanks for the clarification!

A humble suggestion: you might consider including this information in the tutorial because I am hardly the last person to ask questions like this.

GNroy avatar Sep 14 '21 07:09 GNroy