k2
WIP: Add tutorials about ragged tensors.
A preview can be found at
https://csukuangfj.github.io/k2/python_tutorials/ragged/basics.html#
Looks cool!
On Sat, Sep 11, 2021 at 5:56 PM Fangjun Kuang @.***> wrote:
A preview can be found at
https://csukuangfj.github.io/k2/python_tutorials/ragged/basics.html#
@csukuangfj Thanks for this tutorial! Could you please clarify how ragged tensors relate to, say, PyTorch sparse matrices? They look quite similar.
TensorFlow has sparse matrices and ragged tensors, see
- https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor
- https://www.tensorflow.org/api_docs/python/tf/RaggedTensor
PyTorch also has sparse matrices and nested tensors, see
- https://github.com/pytorch/nestedtensor
- https://pytorch.org/docs/stable/sparse.html and https://github.com/Quansight-Labs/rfcs/tree/pearu/rfc0005/RFC0003-sparse-roadmap
We use the same terminology as `tf.RaggedTensor` (row splits, row ids, etc.), though ragged tensors in k2 were designed by @danpovey independently. We were later told that TensorFlow was using the same ideas.
A ragged tensor with 2 axes looks similar to a sparse matrix in CSR format, but they are different.
From https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format) , a sparse matrix in CSR format has the following components:
- `ROW_INDEX`
- `COL_INDEX`
- `V`
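To make the three components concrete, here is a minimal pure-Python sketch that builds them from a small dense matrix. The helper name `dense_to_csr` and the example matrix are illustrative assumptions, not part of k2 or any library.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of lists) to CSR arrays.

    Hypothetical helper for illustration only; not a k2 or PyTorch API.
    """
    row_index = [0]  # row_index[i] is where row i starts in col_index / v
    col_index = []   # column of each stored (nonzero) entry
    v = []           # the stored (nonzero) entries themselves
    for row in matrix:
        for col, entry in enumerate(row):
            if entry != 0:
                col_index.append(col)
                v.append(entry)
        row_index.append(len(v))
    return row_index, col_index, v


dense = [
    [5, 0, 0, 0],
    [0, 8, 0, 0],
    [0, 0, 3, 0],
    [0, 6, 0, 0],
]
ROW_INDEX, COL_INDEX, V = dense_to_csr(dense)
print(ROW_INDEX)  # [0, 1, 2, 3, 4]
print(COL_INDEX)  # [0, 1, 2, 1]
print(V)          # [5, 8, 3, 6]
```

Every stored entry carries a column position, so the original matrix can be rebuilt exactly from the three arrays.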
The `ROW_INDEX` is called `row_splits` in k2 and `V` is called `values` in k2. That's why I said a ragged tensor in k2 shares some similarities with sparse matrices.
However, there is no `COL_INDEX` in ragged tensors. We are not viewing a ragged tensor as a ragged matrix.
For a ragged tensor with 2 axes, what we care about is the number of elements in each row; we don't assign a column index to the entries in a row.
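As a rough illustration of that point, the sketch below stores a list of variable-length rows using only row splits and values, and derives row ids from the row splits. It is plain Python rather than the k2 API; the helper names `to_row_splits` and `row_ids_from_row_splits` are made up for this example.

```python
def to_row_splits(rows):
    """Flatten a list of lists into (row_splits, values).

    Hypothetical helper for illustration only; not the k2 API.
    """
    row_splits = [0]
    values = []
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))
    return row_splits, values


def row_ids_from_row_splits(row_splits):
    """For each element of values, the index of the row it belongs to."""
    row_ids = []
    for row in range(len(row_splits) - 1):
        count = row_splits[row + 1] - row_splits[row]
        row_ids.extend([row] * count)
    return row_ids


rows = [[10, 20], [], [30, 40, 50]]
row_splits, values = to_row_splits(rows)
row_ids = row_ids_from_row_splits(row_splits)
print(row_splits)  # [0, 2, 2, 5]
print(values)      # [10, 20, 30, 40, 50]
print(row_ids)     # [0, 0, 2, 2, 2]
```

Nothing here records a column position: an element is identified only by its row and its order within that row, and empty rows are represented by two equal consecutive row splits.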
PyTorch's sparse matrices use COO format by default. Either way, they are still matrices with row indexes and column indexes.
Also, ragged tensors in k2 are not designed for linear algebra operations, i.e., there are no matrix-vector or matrix-matrix multiplications. Instead, they are designed for efficiently manipulating irregular data structures on GPU.
Many thanks for the clarification!
A humble suggestion: you might consider including this information in the tutorial because I am hardly the last person to ask questions like this.