Ben Frederickson
@PerkzZheng sorry we missed you on this one - can you test on the latest version, and if this is still an issue we'll dig in?
We also had a previous issue talking about this here https://github.com/NVIDIA-Merlin/NVTabular/issues/250
@bschifferer, as a short-term workaround - can you try this instead to explicitly set the dtype in the LambdaOp? ```python col_cat_int8 = col_cat_int8 >> nvt.ops.Categorify() >> nvt.ops.LambdaOp(lambda x: x.astype('int8'), dtype="int8")...
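Because this workaround casts Categorify's integer ids down to int8, it only stays lossless while every encoded id fits in the signed 8-bit range (-128 to 127). An id above that would typically wrap around rather than raise an error. Below is a minimal, library-free sketch of that range check. It assumes the encoded ids are available as plain Python ints, and the helper name `fits_in_int8` is hypothetical, not an NVTabular API:

```python
from typing import Iterable

# Range of a signed 8-bit integer, i.e. what astype('int8') targets.
INT8_MIN, INT8_MAX = -(2 ** 7), 2 ** 7 - 1


def fits_in_int8(encoded_ids: Iterable[int]) -> bool:
    """Return True if every encoded category id can be stored as int8
    without overflowing (hypothetical helper, not part of NVTabular)."""
    return all(INT8_MIN <= v <= INT8_MAX for v in encoded_ids)


# A column with 100 distinct ids fits; one containing id 200 does not.
print(fits_in_int8(range(100)))   # True
print(fits_in_int8([0, 5, 200]))  # False
```

If the check fails, a wider type such as int16 would be the safer target for the cast.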
We've talked to people that have set this up successfully, but need some documentation updates to indicate how to set this up - and what the limitations are in terms...
For estimating this using parquet metadata, here is a hacky proof of concept showing that we can detect high cardinality columns by looking at the parquet dictionary size in bytes:...
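The proof of concept itself is truncated above. The thresholding idea can be sketched roughly as follows, under stated assumptions. In a Parquet column chunk the dictionary page is written before the first data page, so the gap between a chunk's dictionary page offset and data page offset approximates the dictionary size in bytes. pyarrow's column chunk metadata exposes `dictionary_page_offset` and `data_page_offset`. To keep the sketch self-contained, it takes those values as plain tuples instead of reading a real file. The tuple layout, the function names, and the threshold are all hypothetical:

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical stand-in for a few fields of one Parquet column chunk:
# (column_name, dictionary_page_offset or None, data_page_offset).
ColumnChunk = Tuple[str, Optional[int], int]


def dictionary_bytes(chunks: Iterable[ColumnChunk]) -> Dict[str, int]:
    """Estimate each column's dictionary page size, summed over row groups."""
    totals: Dict[str, int] = {}
    for name, dict_offset, data_offset in chunks:
        if dict_offset is None:
            # No dictionary page in this chunk, so nothing to measure.
            continue
        # The dictionary page sits just before the first data page.
        totals[name] = totals.get(name, 0) + (data_offset - dict_offset)
    return totals


def high_cardinality_columns(
    chunks: Iterable[ColumnChunk], threshold_bytes: int
) -> List[str]:
    """Return (sorted) columns whose estimated dictionary size exceeds the threshold."""
    sizes = dictionary_bytes(chunks)
    return sorted(name for name, size in sizes.items() if size > threshold_bytes)


# Two row groups' worth of made-up offsets for illustration only.
chunks = [
    ("user_id", 4, 900_004),
    ("user_id", 2_000_000, 2_850_000),
    ("country", 950_000, 950_400),
    ("price", None, 3_000_000),
]
print(dictionary_bytes(chunks))                   # {'user_id': 1750000, 'country': 400}
print(high_cardinality_columns(chunks, 100_000))  # ['user_id']
```

One caveat: many Parquet writers stop dictionary-encoding a column once its dictionary grows past a size limit. A missing dictionary page therefore does not necessarily mean the column is low cardinality.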
**Comment by [benfred](https://github.com/benfred)** _Monday May 11, 2020 at 16:13 GMT_

----

+1 to both of these
Can you rebase your branch off of the main branch? I'm seeing a bunch of commits that are unrelated to this PR.
I just tried replicating this on the `nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.05` and on the `nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.06` containers, and both example snippets worked for me without seeing the same error =(. Do you have a...
I tried out the install script here https://gitlab-master.nvidia.com/eordentlich/criteo-demo-local/-/blob/main/azure-vm/conda-env-setup.sh on my own local dev machine, as well as on top of the `merlin-tensorflow:22.06` container - and still couldn't reproduce =(.
@eordentlich - thanks for the tip about the container, I've managed to replicate this now