[Feature Request] Array of dicts of tensors structure
Storing arrays of dicts of tensors in "columnar format" can be more compact in some circumstances: for example, it is copy-on-write safe in a multiprocessing context, since the whole structure is stored as a very small number of tensors whose count does not depend on the dataset size: https://gist.github.com/vadimkantorov/86c3a46bf25bed3ad45d043ae86fff57
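To make the idea concrete, here is a minimal sketch of the columnar layout (hypothetical helper names, not the gist's exact code): a list of dicts of variable-length 1D tensors is flattened into one flat data tensor plus one offsets tensor per key, so the whole dataset is a handful of tensors regardless of how many items it holds.

import torch

def to_columnar(items):
    # items: list of dicts mapping key -> 1D tensor (lengths may vary per item)
    columnar = {}
    for k in items[0].keys():
        lengths = torch.tensor([len(item[k]) for item in items])
        columnar[k] = {
            "data": torch.cat([item[k] for item in items]),            # one flat tensor
            "offsets": torch.cat([torch.zeros(1, dtype=torch.long),    # item boundaries
                                  lengths.cumsum(0)]),
        }
    return columnar

def get_item(columnar, i):
    # reconstruct the i-th dict by slicing each key's flat tensor (slices are views, no copy)
    return {k: v["data"][v["offsets"][i] : v["offsets"][i + 1]]
            for k, v in columnar.items()}

items = [{"x": torch.arange(3)}, {"x": torch.arange(5)}]
cols = to_columnar(items)
assert torch.equal(get_item(cols, 1)["x"], torch.arange(5))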
Thanks for this, @vadimkantorov! How do you see this interacting with TensorDict? Should arrays of dicts be a possible data type stored by TensorDict? Do you have a typical use case in mind?
I don't know much about the TensorDict project. I just wanted to share a use case I had for dicts of tensors: representing a dataset in a way that avoids copy-on-write problems: https://github.com/pytorch/pytorch/issues/13246
I represented the array of dicts of tensors as a columnar dict of tensors, where each key maps to a tensor that concatenates all per-item tensors for that key.
One way it could integrate with TensorDict: provide a constructor/util function plus an indexing/__getitem__ method (or util) that slices all keys in the TensorDict and returns a new, per-item TensorDict (see the sketch after this paragraph). These could be just recipes in the docs, or util functions plus tests verifying that no copy-on-write/memory expansion happens and that such a structure is safely shared in multiprocessing/dataloading without any copies.
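As a hedged illustration of that integration: for items of uniform shape, TensorDict can already serve as the columnar container by storing the whole dataset as one TensorDict with a leading batch dimension, so that integer indexing returns a per-item TensorDict of views without copying (variable-length fields would still need an offsets scheme like the gist's). The dataset below is illustrative:

import torch
from tensordict import TensorDict

# the whole "dataset" is just two tensors, independent of the number of items
dataset = TensorDict(
    {"image": torch.zeros(100, 3, 32, 32),
     "label": torch.zeros(100, dtype=torch.long)},
    batch_size=[100],
)

item = dataset[3]   # per-item TensorDict; each field is a view into the shared storage
assert item["image"].shape == torch.Size([3, 32, 32])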
Also, a similar use case is collecting partial results in a validation loop. Usually one would store them in a list of dicts of tensors and then analyze them somehow. If such a structure were implemented in an extendable way (as proposed in https://github.com/pytorch/pytorch/issues/64359), it could be useful.
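For instance, a hedged sketch of that collection pattern, relying on the torch.stack support demonstrated below in this thread (the loop and field names are made up):

import torch
from tensordict import TensorDict

steps = []
for _ in range(10):                       # stand-in for a validation dataloader
    pred = torch.randn(4)                 # hypothetical per-sample outputs
    target = torch.randint(0, 2, (4,))
    steps.append(TensorDict({"pred": pred, "target": target}, batch_size=[4]))

results = torch.stack(steps, 0)           # one stacked TensorDict, batch_size [10, 4]
print(results["pred"].shape)              # torch.Size([10, 4])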
That is something we have, I think.
Here's an example:
>>> import torch
>>> from tensordict import TensorDict
>>> tensordict1 = TensorDict({"a": torch.zeros(1, 1)}, [1])
>>> tensordict2 = TensorDict({"a": torch.ones(1, 1)}, [1])
>>> tensordict = torch.stack([tensordict1, tensordict2], 0)
>>>
>>> tensordict
LazyStackedTensorDict(
    fields={
        a: Tensor(torch.Size([2, 1, 1]), dtype=torch.float32)},
    batch_size=torch.Size([2, 1]),
    device=None,
    is_shared=False)
>>>
>>> tensordict[0] is tensordict1
True
>>> tensordict["a"]
tensor([[[0.]],

        [[1.]]])
The LazyStackedTensorDict does not currently support appending, but we might consider adding that.
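In the meantime, a possible workaround (a sketch reusing tensordict1/tensordict2 from the example above): since torch.stack on TensorDicts is lazy, as the LazyStackedTensorDict repr shows, one can keep the Python list around and simply re-stack after appending, without copying data.

tds = [tensordict1, tensordict2]
tds.append(TensorDict({"a": torch.full((1, 1), 2.0)}, [1]))
tensordict = torch.stack(tds, 0)   # batch_size becomes torch.Size([3, 1])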