NVTabular
Allow disabling strict checking of Workflow operator outputs vs schema
This allows disabling strict checking of operator outputs against operator output schemas, which may allow some Workflows to run in cases where they otherwise wouldn't. The `TypeError` raised by the strict check was introduced to catch cases where operators were fibbing about their own output, but we haven't been able to resolve all of those cases yet, so this provides an escape hatch that may help in some of them. Setting `strict=False` is an acknowledgement that downstream code (e.g. in Merlin Models) may not run as expected in all cases (for example, due to dtype discrepancies), but I suspect many cases will still be okay, so this gives another option to try.
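For illustration, here's a minimal, self-contained sketch of the escape-hatch pattern described above. The names (`validate_output`, the dict-based schema) are hypothetical and are not NVTabular's actual API:

```python
import warnings

def validate_output(produced_dtypes, schema_dtypes, strict=True):
    """Compare the dtypes an operator actually produced against the dtypes
    its output schema declared (hypothetical helper, not NVTabular's API)."""
    for column, expected in schema_dtypes.items():
        actual = produced_dtypes.get(column)
        if actual != expected:
            msg = (
                f"Column {column!r}: schema declares {expected}, "
                f"but the operator produced {actual}"
            )
            if strict:
                # Strict mode: fail loudly on the discrepancy
                raise TypeError(msg)
            # Non-strict mode: record the discrepancy but keep running
            warnings.warn(msg)

schema = {"measurement_std": "float32"}    # what the operator claims to output
produced = {"measurement_std": "float64"}  # what it actually produced

try:
    validate_output(produced, schema, strict=True)
except TypeError as exc:
    print(f"strict=True raised: {exc}")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate_output(produced, schema, strict=False)
print(f"strict=False produced {len(caught)} warning(s) and kept going")
```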
Related to #1580 (though not an actual fix for the issue reported therein)
I appreciate the help. Thank you.
Click to view CI Results
GitHub pull request #1589 of commit 874ce3c4013a44b7acd02b847db6e1096ab4bbd2, no merge conflicts. Running as SYSTEM Setting status of 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4520/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 874ce3c4013a44b7acd02b847db6e1096ab4bbd2^{commit} # timeout=10 Checking out Revision 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 # timeout=10 Commit message: "Allow disabling strict checking of Workflow operator outputs vs schema" > git rev-list --no-walk cb0d6226fff0e32e2c0ff3772f84f9259410f663 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins11716359310313667117.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ........................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 6 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 142 warnings tests/unit/loader/test_torch_dataloader.py: 91 warnings tests/unit/ops/test_categorify.py: 70 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 3 warnings tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 34 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ========== 1420 passed, 2 skipped, 2343 warnings in 705.42s (0:11:45) ========== Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins15053457041468847894.sh
Fundamentally, this change is awesome 🙂 I really like this way of thinking, where we say "okay, feel free to use the underlying machinery at your own risk: we provide you all this functionality, but if you want to step off the beaten track, feel free to do so". It deals very elegantly with the fact that NVTabular is built on top of these other libraries.
So it would be great for this change to be merged!
But #1580 is also likely to get merged at some point (which I think is important too: it significantly expands the set of aggregations we support and casts float64 down to float32, which makes a lot of sense, since it's easy to eat up a lot of memory with these aggregations really quickly). If #1580 gets merged, the unit test here will stop raising an error and will fail.
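As a quick aside on the memory point: the savings from casting aggregation results from float64 down to float32 are easy to quantify with NumPy (illustrative only; the array names here are made up):

```python
import numpy as np

# A made-up stand-in for a column of one million aggregation results
n_rows = 1_000_000
agg_float64 = np.zeros(n_rows, dtype=np.float64)
agg_float32 = agg_float64.astype(np.float32)

# float32 halves the memory footprint of the aggregated column
print(agg_float64.nbytes)  # 8000000
print(agg_float32.nbytes)  # 4000000
```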
ATM I don't see a good way to test the functionality in this PR (unless we find some other operator or aggregation that we currently don't support and don't plan to support in the near future, or play a trick on the operator and amend the schema mid-flight). In fact, the same goes for #1580: I think the functionality there is good; it's just the additional test that is problematic.
Maybe even with #1580 merged this could be tested by doing a list aggregation of float32 values on the CPU? 🤔 That's a genuine scenario I'm aware of that wouldn't run without `strict=False`.
Click to view CI Results
GitHub pull request #1589 of commit 7faf728e4d01f69d4c721fccf35c05329799cd8e, no merge conflicts. Running as SYSTEM Setting status of 7faf728e4d01f69d4c721fccf35c05329799cd8e to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4528/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 7faf728e4d01f69d4c721fccf35c05329799cd8e^{commit} # timeout=10 Checking out Revision 7faf728e4d01f69d4c721fccf35c05329799cd8e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7faf728e4d01f69d4c721fccf35c05329799cd8e # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 645c08ed6887ab7e6d43a1de3407654e5cb16bc3 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins12907291940630881924.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ........................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ========== 1420 passed, 2 skipped, 2343 warnings in 703.33s (0:11:43) ========== Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins11124367831153039112.sh
@radekosmulski Thanks for the comments, I think you're entirely on target there! Re: PR etiquette etc., I'm really digging your thorough and well-thought-out comments. The only thing I'd mention is that for small changes, you can create suggestions directly from the comment box using the button with the "piece of paper plus/minus" icon, which lets you propose code changes the author can incorporate with a single click. I used it here to make the change you proposed by creating a suggestion on my own PR, but using it directly would skip a step. 😺
Click to view CI Results
GitHub pull request #1589 of commit 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5, no merge conflicts. Running as SYSTEM Setting status of 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4530/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5^{commit} # timeout=10 Checking out Revision 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 # timeout=10 Commit message: "Update tests/unit/workflow/test_workflow.py" > git rev-list --no-walk 621f4d729adae51108254981590a6728c077ca95 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4694174338352369481.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ..................................................F [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ___________________________ test_multihot_empty_rows ___________________________
    def test_multihot_empty_rows():
        multi_hot = tf.feature_column.categorical_column_with_identity("multihot", 5)
        multi_hot_embedding = tf.feature_column.embedding_column(multi_hot, 8, combiner="sum")

        embedding_layer = layers.DenseFeatures([multi_hot_embedding])
        inputs = {
            "multihot": (
                tf.keras.Input(name="multihot__values", shape=(1,), dtype=tf.int64),
                tf.keras.Input(name="multihot__nnzs", shape=(1,), dtype=tf.int64),
            )
        }
        output = embedding_layer(inputs)
        model = tf.keras.Model(inputs=inputs, outputs=output)
        model.compile("sgd", "binary_crossentropy")

        multi_hot_values = np.array([0, 2, 1, 4, 1, 3, 1])
        multi_hot_nnzs = np.array([1, 0, 2, 4, 0])
        x = {"multihot": (multi_hot_values[:, None], multi_hot_nnzs[:, None])}

        multi_hot_embedding_table = embedding_layer.embedding_tables["multihot"].numpy()
        multi_hot_embedding_rows = _compute_expected_multi_hot(
            multi_hot_embedding_table, multi_hot_values, multi_hot_nnzs, "sum"
        )
        y_hat = model(x).numpy()
np.testing.assert_allclose(y_hat, multi_hot_embedding_rows, rtol=1e-06)
E       AssertionError:
E       Not equal to tolerance rtol=1e-06, atol=0
E
E       Mismatched elements: 1 / 40 (2.5%)
E       Max absolute difference: 1.4901161e-08
E       Max relative difference: 1.2405193e-06
E        x: array([[ 0.313274,  0.480283,  0.080427,  0.726971,  0.251228, -0.131764,
E                  0.102429,  0.482547],
E               [ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,...
E        y: array([[ 0.313274,  0.480283,  0.080427,  0.726971,  0.251228, -0.131764,
E                  0.102429,  0.482547],
E               [ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,...

tests/unit/framework_utils/test_tf_layers.py:321: AssertionError
______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)

        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()

        # Disabling strict mode should allow the workflow to run
        result_ddf = workflow.fit_transform(dataset, strict=False)
        result = result_ddf.compute()
assert result
tests/unit/workflow/test_workflow.py:691:
self =   cat  measurement_std
       0   a         0.070711
       1   b              NaN

    @final
    def __nonzero__(self):
>       raise ValueError(
            f"The truth value of a {type(self).__name__} is ambiguous. "
            "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
        )
E       ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
../../../.local/lib/python3.8/site-packages/pandas/core/generic.py:1537: ValueError
----------------------------- Captured stderr call -----------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7fef867bb5e0>
Traceback (most recent call last):
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 554, in _transform_partition
    raise TypeError(
TypeError: Dtype discrepancy detected for column measurement_std: operator Groupby reported dtype float32 but returned dtype float64.
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-e976f4c4-a80b-49c3-8e38-fee83692
args: (  cat  timestamp  measurement
       0   a          1          0.1
       1   a          2          0.2
       2   b          1          0.5)
kwargs: {}
Exception: "TypeError('Dtype discrepancy detected for column measurement_std: operator Groupby reported dtype float32 but returned dtype float64.')"
--------------------------- Captured stderr teardown ---------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
  /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`.
    warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
    warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False]
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead.
    warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
  /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
    warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/framework_utils/test_tf_layers.py::test_multihot_empty_rows FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 2 failed, 1418 passed, 2 skipped, 2343 warnings in 707.65s (0:11:47) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins14757039273063764355.sh
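Editor's note on the `test_workflow_strict_mode_disabled` failure above: it combines two separate, well-known pandas behaviors. A minimal sketch reproducing both with plain pandas (no NVTabular required; column names mirror the test, and the exact aggregation dtype can vary by pandas/cudf version):

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b"], "measurement": [0.1, 0.2, 0.5]})
df["measurement"] = df["measurement"].astype("float32")

# Issue 1: std-style aggregations typically compute in float64, so the
# result dtype can differ from the float32 input -- the discrepancy the
# strict check reports as "reported dtype float32 but returned dtype float64".
out = df.groupby("cat")["measurement"].std().reset_index()
print(out["measurement"].dtype)

# Issue 2: `assert result` on a DataFrame raises "truth value ... is
# ambiguous". An explicit check such as `not out.empty` avoids this,
# which matches the follow-up commit "Make the assertion more explicit".
ambiguous = False
try:
    assert out
except ValueError:
    ambiguous = True
assert not out.empty
```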
Click to view CI Results
GitHub pull request #1589 of commit 793a2f61723591752073ae837d53163393294247, no merge conflicts. Running as SYSTEM Setting status of 793a2f61723591752073ae837d53163393294247 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4531/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 793a2f61723591752073ae837d53163393294247^{commit} # timeout=10 Checking out Revision 793a2f61723591752073ae837d53163393294247 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 793a2f61723591752073ae837d53163393294247 # timeout=10 Commit message: "Make the assertion more explicit" > git rev-list --no-walk 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins8127438609129841754.sh 
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1421 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ..............................FF [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ............... [ 46%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................
[ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 59%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
........................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=================================== FAILURES ===================================
_________________________ test_groupby_model[pytorch] __________________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0') output_model = 'pytorch'
    @pytest.mark.skipif(TRITON_SERVER_PATH is None, reason="Requires tritonserver on the path")
    @pytest.mark.parametrize("output_model", ["tensorflow", "pytorch"])
    def test_groupby_model(tmpdir, output_model):
        size = 20
        df = make_df(
            {
                "id": np.random.choice([0, 1], size=size),
                "ts": np.linspace(0.0, 10.0, num=size),
                "x": np.arange(size),
                "y": np.linspace(0.0, 10.0, num=size),
            }
        )

        groupby_features = ColumnSelector(["id", "ts", "x", "y"]) >> ops.Groupby(
            groupby_cols=["id"],
            sort_cols=["ts"],
            aggs={
                "x": ["sum"],
                "y": ["first"],
            },
            name_sep="-",
        )
        workflow = nvt.Workflow(groupby_features)
_verify_workflow_on_tritonserver(
            tmpdir, workflow, df, "groupby", output_model, cats=["id", "y-first"], conts=["x-sum"]
        )
tests/unit/test_triton_inference.py:379:
tests/unit/test_triton_inference.py:112: in _verify_workflow_on_tritonserver
    response = client.infer(model_name, inputs, outputs=outputs)
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1295: in infer
    raise_error_grpc(rpc_error)
rpc_error = <_InactiveRpcError of RPC that terminated with: status = StatusCode.INTERNAL details = "Failed to process the reques...he request(s) for model instance 'groupby', message: Failed to fetch the error in response batch.","grpc_status":13}"
def raise_error_grpc(rpc_error):
raise get_error_grpc(rpc_error) from None
E tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'groupby', message: Failed to fetch the error in response batch.
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/init.py:62: InferenceServerException ----------------------------- Captured stdout call ----------------------------- Signal (2) received. ----------------------------- Captured stderr call ----------------------------- I0617 17:42:23.982985 26724 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow I0617 17:42:23.983100 26724 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8 I0617 17:42:23.983107 26724 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8 I0617 17:42:23.983113 26724 tensorflow.cc:2216] backend configuration: {"cmdline":{"version":"2"}} I0617 17:42:24.180156 26724 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fa0a6000000' with size 268435456 I0617 17:42:24.181132 26724 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864 I0617 17:42:24.184717 26724 model_repository_manager.cc:997] loading: groupby:1 I0617 17:42:24.292229 26724 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: groupby (GPU device 0) I0617 17:42:26.782464 26724 model_repository_manager.cc:1152] successfully loaded 'groupby' version 1 I0617 17:42:26.782635 26724 server.cc:524] +------------------+------+ | Repository Agent | Path | +------------------+------+ +------------------+------+
I0617 17:42:26.782756 26724 server.cc:551] +------------+-----------------------------------------------------------------+-----------------------------+ | Backend | Path | Config | +------------+-----------------------------------------------------------------+-----------------------------+ | tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} | | python | /opt/tritonserver/backends/python/libtriton_python.so | {} | +------------+-----------------------------------------------------------------+-----------------------------+
I0617 17:42:26.782813 26724 server.cc:594] +---------+---------+--------+ | Model | Version | Status | +---------+---------+--------+ | groupby | 1 | READY | +---------+---------+--------+
I0617 17:42:26.832667 26724 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB I0617 17:42:26.835524 26724 tritonserver.cc:1962] +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Option | Value | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | server_id | triton | | server_version | 2.20.0 | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace | | model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0 | | model_control_mode | MODE_NONE | | strict_model_config | 1 | | rate_limit | OFF | | pinned_memory_pool_byte_size | 268435456 | | cuda_memory_pool_byte_size{0} | 67108864 | | response_cache_byte_size | 0 | | min_supported_compute_capability | 6.0 | | strict_readiness | 1 | | exit_timeout | 30 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0617 17:42:26.836594 26724 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001 I0617 17:42:26.836798 26724 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000 I0617 17:42:26.877737 26724 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002 W0617 17:42:27.854774 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 W0617 17:42:28.854960 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 0617 17:42:29.721719 26740 pb_stub.cc:419] Failed to process the request(s) for model 'groupby', message: AttributeError: 'NoneType' object has no attribute 'as_numpy'
At:
  /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/inference/triton/__init__.py(76): _convert_tensor
  /tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0/groupby/1/model.py(105):
/tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0/groupby/1/model.py(104): execute I0617 17:42:29.722352 26724 server.cc:252] Waiting for in-flight requests to complete. I0617 17:42:29.722387 26724 model_repository_manager.cc:1029] unloading: groupby:1 I0617 17:42:29.722500 26724 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests W0617 17:42:29.873437 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:30.722598 26724 server.cc:267] Timeout 29: Found 1 live models and 0 in-flight non-inference requests I0617 17:42:31.075354 26724 model_repository_manager.cc:1135] successfully unloaded 'groupby' version 1 I0617 17:42:31.722724 26724 server.cc:267] Timeout 28: Found 0 live models and 0 in-flight non-inference requests ______________________ test_seq_etl_tf_model[tensorflow] _______________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-10/test_seq_etl_tf_model_tensorfl0') output_model = 'tensorflow'
    @pytest.mark.skipif(TRITON_SERVER_PATH is None, reason="Requires tritonserver on the path")
    @pytest.mark.parametrize("output_model", ["tensorflow"])
    def test_seq_etl_tf_model(tmpdir, output_model):
        size = 100
        max_length = 10
        df = make_df(
            {
                "id": np.random.choice([0, 1], size=size),
                "item_id": np.random.randint(1, 10, size),
                "ts": np.linspace(0.0, 10.0, num=size).astype(np.float32),
                "y": np.linspace(0.0, 10.0, num=size).astype(np.float32),
            }
        )

        groupby_features = ColumnSelector(["id", "item_id", "ts", "y"]) >> ops.Groupby(
            groupby_cols=["id"],
            sort_cols=["ts"],
            aggs={
                "item_id": ["list"],
                "y": ["list"],
            },
            name_sep="-",
        )
        feats_list = groupby_features["item_id-list", "y-list"]
        feats_trim = feats_list >> ops.ListSlice(0, max_length, pad=True)
        selected_features = groupby_features["id"] + feats_trim

        workflow = nvt.Workflow(selected_features)
        sparse_max = {"item_id-list": max_length, "y-list": max_length}
_verify_workflow_on_tritonserver(
            tmpdir,
            workflow,
            df,
            "groupby",
            output_model,
            sparse_max,
            cats=["id", "item_id-list"],
            conts=["y-list"],
        )
tests/unit/test_triton_inference.py:415:
tests/unit/test_triton_inference.py:112: in _verify_workflow_on_tritonserver
    response = client.infer(model_name, inputs, outputs=outputs)
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1295: in infer
    raise_error_grpc(rpc_error)
rpc_error = <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "Request for unknown model...rface/call.cc","file_line":1069,"grpc_message":"Request for unknown model: 'groupby' is not found","grpc_status":14}"
def raise_error_grpc(rpc_error):
raise get_error_grpc(rpc_error) from None
E tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Request for unknown model: 'groupby' is not found
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/init.py:62: InferenceServerException ----------------------------- Captured stdout call ----------------------------- Signal (2) received. ----------------------------- Captured stderr call ----------------------------- I0617 17:42:33.384864 26921 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow I0617 17:42:33.384979 26921 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8 I0617 17:42:33.384986 26921 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8 I0617 17:42:33.384992 26921 tensorflow.cc:2216] backend configuration: {"cmdline":{"version":"2"}} I0617 17:42:33.556122 26921 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc5a6000000' with size 268435456 I0617 17:42:33.557005 26921 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864 I0617 17:42:33.559403 26921 model_repository_manager.cc:997] loading: groupby:1 I0617 17:42:33.667036 26921 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: groupby (GPU device 0) I0617 17:42:36.075578 26921 model_repository_manager.cc:1152] successfully loaded 'groupby' version 1 I0617 17:42:36.075753 26921 server.cc:524] +------------------+------+ | Repository Agent | Path | +------------------+------+ +------------------+------+
I0617 17:42:36.075840 26921 server.cc:551] +------------+-----------------------------------------------------------------+-----------------------------+ | Backend | Path | Config | +------------+-----------------------------------------------------------------+-----------------------------+ | tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} | | python | /opt/tritonserver/backends/python/libtriton_python.so | {} | +------------+-----------------------------------------------------------------+-----------------------------+
I0617 17:42:36.075899 26921 server.cc:594] +---------+---------+--------+ | Model | Version | Status | +---------+---------+--------+ | groupby | 1 | READY | +---------+---------+--------+
I0617 17:42:36.123404 26921 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB I0617 17:42:36.125094 26921 tritonserver.cc:1962] +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Option | Value | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | server_id | triton | | server_version | 2.20.0 | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace | | model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-10/test_seq_etl_tf_model_tensorfl0 | | model_control_mode | MODE_NONE | | strict_model_config | 1 | | rate_limit | OFF | | pinned_memory_pool_byte_size | 268435456 | | cuda_memory_pool_byte_size{0} | 67108864 | | response_cache_byte_size | 0 | | min_supported_compute_capability | 6.0 | | strict_readiness | 1 | | exit_timeout | 30 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0617 17:42:36.126239 26921 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001 I0617 17:42:36.126694 26921 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000 I0617 17:42:36.166258 26921 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002 I0617 17:42:36.171585 26921 server.cc:252] Waiting for in-flight requests to complete. I0617 17:42:36.171615 26921 model_repository_manager.cc:1029] unloading: groupby:1 I0617 17:42:36.171701 26921 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests W0617 17:42:37.150860 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:37.171799 26921 server.cc:267] Timeout 29: Found 1 live models and 0 in-flight non-inference requests /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py:373: UserWarning: Loading workflow generated with nvtabular version 1.2.1+4.g793a2f617 - but we are running nvtabular 1.1.1. This might cause issues warnings.warn( I0617 17:42:37.745222 26921 model_repository_manager.cc:1135] successfully unloaded 'groupby' version 1 W0617 17:42:38.151061 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:38.171919 26921 server.cc:267] Timeout 28: Found 0 live models and 0 in-flight non-inference requests W0617 17:42:39.176430 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`.
    warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
    warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/test_triton_inference.py::test_groupby_model[pytorch] - tri... FAILED tests/unit/test_triton_inference.py::test_seq_etl_tf_model[tensorflow] ===== 2 failed, 1418 passed, 2 skipped, 2343 warnings in 699.97s (0:11:39) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins14316655692237482784.sh
Thank you very much @karlhigley, great to hear! 🙂 I also really appreciate bouncing ideas back and forth and discussing this as we worked on it together 🙂
Good to know about the ability to add changes that can immediately be committed from the web GUI! Could I also please ask you one more question? On #1580 I squashed my commits when I thought I was done, but I saw this disrupts the earlier discussion a bit (some earlier code comments disappear, etc., which I don't think happens if you just keep pushing). Ah, and now I see the web GUI gives you an option to squash and merge at the end?
So if I'm reading this right, the workflow would be to just keep piling commits on top of each other and then squash and merge via the web GUI at the end?
Thank you very much! 🙂
Click to view CI Results
GitHub pull request #1589 of commit c0bddb0b543a7eb280084f7047f20bc34d240090, no merge conflicts. Running as SYSTEM Setting status of c0bddb0b543a7eb280084f7047f20bc34d240090 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4535/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse c0bddb0b543a7eb280084f7047f20bc34d240090^{commit} # timeout=10 Checking out Revision c0bddb0b543a7eb280084f7047f20bc34d240090 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f c0bddb0b543a7eb280084f7047f20bc34d240090 # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 11a2b68f8ffeb544eb0a335a2d07463a9ab9aa49 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins112138820098634450.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1423 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 17%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ................. [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()
E       Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed --------------------------- Captured stderr teardown --------------------------- /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 6 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 142 warnings tests/unit/loader/test_torch_dataloader.py: 91 warnings tests/unit/ops/test_categorify.py: 70 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 3 warnings tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 34 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False] /usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future. warnings.warn(
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1421 passed, 2 skipped, 2344 warnings in 706.28s (0:11:46) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins3760586536830041619.sh
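Stepping back from the logs for a moment, here is a minimal, self-contained sketch of the behavior the failing `test_workflow_strict_mode_disabled` test is probing. This is plain Python for illustration only; `check_output_dtypes` and the column names are made up and are not NVTabular's actual API. The idea: in strict mode, a mismatch between an operator's declared output schema and the dtypes it actually produced raises a TypeError, while `strict=False` downgrades the mismatch to a warning.

```python
import warnings

def check_output_dtypes(schema, output, strict=True):
    """Compare produced dtypes (output) against declared dtypes (schema).

    Hypothetical helper, not NVTabular's implementation: raises TypeError
    on a mismatch in strict mode, otherwise emits a UserWarning.
    """
    for col, expected in schema.items():
        actual = output.get(col)
        if actual is not None and actual != expected:
            msg = f"Column '{col}': schema declares {expected}, operator produced {actual}"
            if strict:
                raise TypeError(msg)
            warnings.warn(msg)

# Example mismatch: a std-aggregation upcasting a float32 input to float64,
# as in the Groupby test above.
declared = {"measurement_std": "float32"}
produced = {"measurement_std": "float64"}

try:
    check_output_dtypes(declared, produced, strict=True)
except TypeError as exc:
    print("strict=True raised:", exc)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_output_dtypes(declared, produced, strict=False)
print("strict=False warned:", len(caught), "warning(s)")
```

Under this framing, `strict=False` is exactly the escape hatch the PR description talks about: the dtype discrepancy still exists and downstream code may still trip over it, but the Workflow itself keeps running.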
@radekosmulski I am also normally inclined to clean up my commits, but since all PRs get squashed before merging, I've mostly stopped. Piling commits on top of each other seems to be the preferred workflow.
Thank you for your answer @karlhigley! Good to know! 🙂
Click to view CI Results
GitHub pull request #1589 of commit e2803d443fd0c0ba4b46c1f6179557189f9241bf, no merge conflicts. Running as SYSTEM Setting status of e2803d443fd0c0ba4b46c1f6179557189f9241bf to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4562/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse e2803d443fd0c0ba4b46c1f6179557189f9241bf^{commit} # timeout=10 Checking out Revision e2803d443fd0c0ba4b46c1f6179557189f9241bf (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e2803d443fd0c0ba4b46c1f6179557189f9241bf # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 2f90216a24146c67c9efe73841f57f8a3d9670b0 # timeout=10 [nvtabular_tests] $ /bin/bash 
/tmp/jenkins17256470304880615127.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1425 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 17%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 41%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()
E       Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed --------------------------- Captured stderr teardown --------------------------- /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/workflow/test_workflow.py: 78 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results. warnings.warn(
tests/unit/test_dask_nvt.py: 1 warning tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 5 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 66 warnings tests/unit/loader/test_torch_dataloader.py: 67 warnings tests/unit/ops/test_categorify.py: 69 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 1 warning tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 27 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 1 warning tests/unit/test_tools.py: 17 warnings tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 54 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1423 passed, 2 skipped, 697 warnings in 692.93s (0:11:32) ====== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins16050828853899905004.sh
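The dtype discrepancy behind the failing test above comes from a standard-deviation Groupby aggregation. The same operation can be sketched with plain pandas; the `measurement_std` output name here is illustrative, not NVTabular's actual naming, and whether the float32 input dtype survives the aggregation depends on the dataframe backend (which is exactly what the strict check flags):

```python
import pandas as pd

# Reproduce the failing test's input: a float32 measurement column,
# grouped by a categorical key and aggregated with "std".
df = pd.DataFrame({"cat": ["a", "a", "b"], "measurement": [0.1, 0.2, 0.5]})
df["measurement"] = df["measurement"].astype("float32")

# Rough pandas analogue of ops.Groupby("cat", aggs=["std"]).
out = df.groupby("cat", as_index=False).agg(
    measurement_std=("measurement", "std")
)

print(out)
# The output dtype may be float32 or float64 depending on the backend;
# a mismatch with the declared schema is what trips strict mode.
print(out["measurement_std"].dtype)
```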
@karlhigley, please link this to an initiative.
Click to view CI Results
GitHub pull request #1589 of commit a8f2c8d271a492fb014a89228b67ecab1e093088, no merge conflicts. Running as SYSTEM !!! PR mergeability status has changed !!! PR now has merge conflicts! Setting status of a8f2c8d271a492fb014a89228b67ecab1e093088 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4635/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse a8f2c8d271a492fb014a89228b67ecab1e093088^{commit} # timeout=10 Checking out Revision a8f2c8d271a492fb014a89228b67ecab1e093088 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a8f2c8d271a492fb014a89228b67ecab1e093088 # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk a74290155fced269fd77fc726a919d1645bf8cc6 # 
timeout=10 First time build. Skipping changelog. [nvtabular_tests] $ /bin/bash /tmp/jenkins6135180235167715312.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1431 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] .... [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ..................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... 
[ 49%] tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 63%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df(
            {"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]}
        )
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
>           result.compute()
E           Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed
--------------------------- Captured stderr teardown ---------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-Shuffle.PER_WORKER-True-device-0-parquet-0.1] /usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first self.make_current()
tests/unit/test_dask_nvt.py: 1 warning tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 5 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 66 warnings tests/unit/loader/test_torch_dataloader.py: 67 warnings tests/unit/ops/test_categorify.py: 69 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 1 warning tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 27 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 1 warning tests/unit/test_tools.py: 17 warnings tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 54 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1429 passed, 2 skipped, 618 warnings in 693.68s (0:11:33) ====== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins17789346183407376221.sh
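For reference, the mechanism this PR makes optional can be sketched in isolation: compare an operator's actual output dtypes against its declared output schema, and raise a `TypeError` only when strict checking is enabled. This is a minimal illustrative sketch — the `check_output_dtypes` name and the dict-based schema are hypothetical, not NVTabular's actual API:

```python
import numpy as np
import pandas as pd

def check_output_dtypes(df: pd.DataFrame, schema: dict, strict: bool = True) -> list:
    """Return a list of (column, expected, actual) dtype mismatches.

    When ``strict`` is True, raise TypeError on the first mismatch
    instead of collecting it, mirroring the escape hatch in this PR.
    """
    mismatches = []
    for column, expected in schema.items():
        actual = df[column].dtype
        if actual != np.dtype(expected):
            if strict:
                raise TypeError(
                    f"Column '{column}' has dtype {actual}, "
                    f"but the operator's schema declares {expected}"
                )
            mismatches.append((column, expected, actual))
    return mismatches

# The operator's schema claimed float32, but the backend produced float64.
df = pd.DataFrame({"measurement_std": pd.Series([0.1, 0.2], dtype="float64")})
schema = {"measurement_std": "float32"}

# strict=False surfaces the discrepancy without aborting the workflow.
print(check_output_dtypes(df, schema, strict=False))
```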
This PR is obsolete after the extraction of executors into Core.