Merlin
[QST] The batch generation in MovieLens example produces batches in an unexpected way
❓ Questions & Help
Details
I was running the Getting Started With MovieLens example for PyTorch. When I created the dataloader and generated a single batch, the output differed from the expected one:
I got:
({'userId': tensor([ 8528, 39453, 50328, ..., 59406, 59579, 12128], device='cuda:0'),
'movieId': tensor([1175, 387, 12, ..., 23, 934, 1738], device='cuda:0'),
'genres__values': tensor([5, 6, 5, ..., 9, 5, 3], device='cuda:0'),
'genres__offsets': tensor([ 0, 2, 4, ..., 88830, 88833, 88835], device='cuda:0',
dtype=torch.int32)},
tensor([0., 0., 1., ..., 1., 1., 1.], device='cuda:0'))
Expected:
({'genres': (tensor([1, 2, 6, ..., 8, 1, 4], device='cuda:0'),
tensor([[ 0],
[ 1],
[ 3],
...,
[88555],
[88556],
[88557]], device='cuda:0', dtype=torch.int32)),
'userId': tensor([[1691],
[1001],
[ 967],
...,
[ 848],
[1847],
[5456]], device='cuda:0'),
'movieId': tensor([[ 332],
[ 154],
[ 245],
...,
[3095],
[1062],
[3705]], device='cuda:0')},
tensor([1., 1., 0., ..., 1., 1., 0.], device='cuda:0'))
Docker: nvcr.io/nvidia/merlin/merlin-pytorch:nightly
Notebook: 03-Training-with-PyTorch.ipynb
I read in the documentation that a multicoded object is represented by two tensors (values and nnzs). In my case, the two tensors are stored under separate keys (genres__values and genres__offsets) rather than as a tuple under a single key.
How can I get the batches in the required format?
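For reference, the regrouping I have in mind could be sketched as below. This is only a minimal illustration, not a Merlin API: the helper names regroup_ragged and row_lengths are hypothetical, the toy batch uses plain Python lists instead of CUDA tensors, and it assumes the "__values" / "__offsets" key suffixes seen in the output above. It also assumes the offsets are cumulative with one more entry than there are rows, which matches the last offset in my output but is not something I have confirmed in the documentation.

```python
# Hypothetical helpers (not part of Merlin): regroup the flat
# "<name>__values" / "<name>__offsets" keys into one tuple per feature.
VALUES_SUFFIX = "__values"
OFFSETS_SUFFIX = "__offsets"


def regroup_ragged(features):
    """Return a new dict where each ragged feature is a (values, offsets) tuple."""
    grouped = {}
    for key, value in features.items():
        if key.endswith(VALUES_SUFFIX):
            name = key[: -len(VALUES_SUFFIX)]
            offsets_key = name + OFFSETS_SUFFIX
            if offsets_key in features:
                # Pair the values with their offsets under the bare feature name.
                grouped[name] = (value, features[offsets_key])
            else:
                grouped[key] = value
        elif key.endswith(OFFSETS_SUFFIX):
            name = key[: -len(OFFSETS_SUFFIX)]
            if name + VALUES_SUFFIX in features:
                # Already emitted together with the matching values key.
                continue
            grouped[key] = value
        else:
            # Scalar columns such as userId and movieId pass through unchanged.
            grouped[key] = value
    return grouped


def row_lengths(offsets):
    """Per-row element counts (the "nnzs") derived from cumulative offsets."""
    return [end - start for start, end in zip(offsets[:-1], offsets[1:])]


# Toy batch with made-up values, only to show the shape of the result.
toy_batch = {
    "userId": [8528, 39453],
    "movieId": [1175, 387],
    "genres__values": [5, 6, 5, 9],
    "genres__offsets": [0, 2, 4],
}
grouped = regroup_ragged(toy_batch)
genres_values, genres_offsets = grouped["genres"]
genres_nnzs = row_lengths(genres_offsets)
```

The helper only rearranges dictionary keys, so the tensors themselves are left untouched. The expected output also shows userId and movieId as column vectors; if that shape is required, a PyTorch call such as tensor.unsqueeze(-1) would add the trailing dimension, but I have not checked whether the example model needs it.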