torchrec

[Bug Find] sparse feature values type mismatch for Inference library

Open yinpeiqi opened this issue 1 year ago • 0 comments

In /examples/inference/dlrm_client.py, the sparse feature values are serialized as int64:

    id_list_features = predictor_pb2.SparseFeatures(
        num_features=args.num_id_list_features,
        values=to_bytes(batch.sparse_features.values()),
        lengths=to_bytes(batch.sparse_features.lengths()),
    )

However, in torchrec/inference/src/Batching.cpp (lines 171 and 208), the combineSparse function treats the sparse feature values as int32:

    auto values = at::empty({totalLength}, options.dtype(at::kInt));
    ...
    len = featureLengths[j][i] * sizeof(int32_t);
    valuesCursor[j].pull(valuesRange.data(), len);
    valuesRange.advance(len);

This reinterprets the int64 bytes as int32. Printing the values at this point gives:

    ...
      5012
         0
      9017
         0
     72546
         0
     63898
         0
     61197
         0
     31162
         0
      2567
         0
     89318
         0
     79668
         0
    ...

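So only half of the input values are consumed: each int64 is read as two int32 values (the ID followed by a 0), and because the byte length is computed with sizeof(int32_t), the second half of the buffer is never read.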
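The effect can be reproduced outside torchrec with a small standalone sketch. It assumes a little-endian host and that the client sends the raw int64 buffer; `pack_int64` and `read_as_int32` are hypothetical helpers written for this illustration, not torchrec functions.

```python
import struct


def pack_int64(values):
    """Serialize integers as little-endian int64 (stand-in for the client's raw int64 bytes)."""
    return struct.pack(f"<{len(values)}q", *values)


def read_as_int32(buf, num_values):
    """Pull num_values * sizeof(int32_t) bytes and decode them as little-endian int32.

    This mirrors the length computation quoted above; it is not the torchrec code itself.
    """
    nbytes = num_values * struct.calcsize("<i")  # 4 bytes per value instead of 8
    return list(struct.unpack(f"<{num_values}i", buf[:nbytes]))


ids = [5012, 9017, 72546, 63898]
payload = pack_int64(ids)                   # 4 values * 8 bytes = 32 bytes
decoded = read_as_int32(payload, len(ids))  # only the first 16 bytes are read
print(decoded)  # [5012, 0, 9017, 0]
```

The last two IDs never appear in the decoded output, which matches the pattern in the printout above: every ID is followed by a 0, and only the first half of the IDs is seen.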

yinpeiqi, Dec 15 '23 05:12