[QST] The batch generation in MovieLens example produces batches in an unexpected way

Open zainkhan-afk opened this issue 7 months ago • 2 comments

❓ Questions & Help

Details

I was running the Getting Started With MovieLens example for PyTorch. When I created the dataloader and generated a single batch, I got different output from what was expected:

I got:

({'userId': tensor([ 8528, 39453, 50328,  ..., 59406, 59579, 12128], device='cuda:0'),
  'movieId': tensor([1175,  387,   12,  ...,   23,  934, 1738], device='cuda:0'),
  'genres__values': tensor([5, 6, 5,  ..., 9, 5, 3], device='cuda:0'),
  'genres__offsets': tensor([    0,     2,     4,  ..., 88830, 88833, 88835], device='cuda:0',
         dtype=torch.int32)},
 tensor([0., 0., 1.,  ..., 1., 1., 1.], device='cuda:0'))

Expected:

({'genres': (tensor([1, 2, 6,  ..., 8, 1, 4], device='cuda:0'),
   tensor([[    0],
           [    1],
           [    3],
           ...,
           [88555],
           [88556],
           [88557]], device='cuda:0', dtype=torch.int32)),
  'userId': tensor([[1691],
          [1001],
          [ 967],
          ...,
          [ 848],
          [1847],
          [5456]], device='cuda:0'),
  'movieId': tensor([[ 332],
          [ 154],
          [ 245],
          ...,
          [3095],
          [1062],
          [3705]], device='cuda:0')},
 tensor([1., 1., 0.,  ..., 1., 1., 0.], device='cuda:0'))

Docker: nvcr.io/nvidia/merlin/merlin-pytorch:nightly

Notebook: 03-Training-with-PyTorch.ipynb

I read in the documentation that a multi-hot column is represented by two tensors (values and nnzs). In my case the two tensors are stored under two separate keys (genres__values and genres__offsets) rather than as a tuple under a single key (genres).

How can I get the batches in the required format?
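
For reference, the difference between the two layouts can be sketched as a small post-processing step. This is only a minimal sketch, not the Merlin dataloader's API: `regroup_ragged` is a hypothetical helper, the `__values` / `__offsets` suffixes are taken from the output above, and dropping the last offset assumes the expected layout keeps only each row's start offset. Plain lists stand in for tensors so the example runs without a GPU.

```python
def regroup_ragged(features, values_suffix="__values", offsets_suffix="__offsets",
                   drop_last_offset=True):
    """Merge '<col>__values' / '<col>__offsets' entries into '<col>': (values, offsets).

    Hypothetical helper. Works on anything that supports slicing (lists, tensors).
    """
    out = {}
    for key, val in features.items():
        if key.endswith(values_suffix):
            col = key[: -len(values_suffix)]
            offsets = features[col + offsets_suffix]
            if drop_last_offset:
                # Assumption: the tuple layout stores row starts only, so the
                # trailing offset (the total number of values) is dropped.
                offsets = offsets[:-1]
            out[col] = (val, offsets)
        elif key.endswith(offsets_suffix):
            # Already consumed together with the matching values entry.
            continue
        else:
            out[key] = val
    return out


# Toy batch with two rows: the first has genres [5, 6], the second has [5].
batch = {
    "userId": [8528, 39453],
    "movieId": [1175, 387],
    "genres__values": [5, 6, 5],
    "genres__offsets": [0, 2, 3],
}
regrouped = regroup_ragged(batch)
```

With real tensors, the single-valued columns would additionally need something like `tensor.view(-1, 1)` to match the column-vector shape shown under "Expected"; the sketch leaves shapes untouched.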

Jul 02 '24 11:07 zainkhan-afk
