models icon indicating copy to clipboard operation
models copied to clipboard

[Task] Use `embedding_sizes.cardinality` instead of `int_domain` to compute embedding cardinality

Open oliverholworthy opened this issue 3 years ago • 0 comments

Description

We have been using the int_domain.max property of the ColumnSchema to compute embedding cardinalities.

This task proposes to replace the usage of int_domain.max to infer the cardinality with the properties.embedding_sizes.cardinality from the ColumnSchema.

This was motivated by reviewing the new EmbeddingTable usage and trying to create examples with pre-trained weights. We identified an off-by-one error in the int_domain.max property. To be updated in: https://github.com/NVIDIA-Merlin/NVTabular/pull/1641

Current Usage

In the categorical_cardinalities function:

https://github.com/NVIDIA-Merlin/models/blob/62145d678b90c75edb242e41dd0cc199e66ba800/merlin/models/utils/schema_utils.py#L123-L131

Which is used by:

And the EmbeddingTable:

https://github.com/NVIDIA-Merlin/models/blob/c63daf65efc653cd84cc32ba64c8ce8f110af220/merlin/models/tf/inputs/embedding.py#L82-L84

oliverholworthy avatar Aug 10 '22 12:08 oliverholworthy