NVTabular
Allow disabling strict checking of Workflow operator outputs vs schema
This allows disabling strict checking of operator outputs against operator output schemas, which may allow some Workflows to run in cases where they otherwise wouldn't. The `TypeError` raised by the strict check was introduced to catch cases where operators were fibbing about their own output, but we haven't been able to resolve all of those cases yet, so this provides an escape hatch that may help in some of them. Setting `strict=False` is an acknowledgement that downstream code (e.g. in Merlin Models) may not run as expected in all cases (for example, due to dtype discrepancies), but I suspect many cases will still be okay, so this gives another option to try.
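For illustration, here's a minimal, self-contained sketch of the escape-hatch pattern described above. The names (`validate_output`, the dict-based schema) are hypothetical and are not NVTabular's actual API:

```python
import warnings

def validate_output(produced_dtypes, schema_dtypes, strict=True):
    """Compare the dtypes an operator actually produced against the dtypes
    its output schema declared (hypothetical helper, not NVTabular's API)."""
    for column, expected in schema_dtypes.items():
        actual = produced_dtypes.get(column)
        if actual != expected:
            msg = (
                f"Column {column!r}: schema declares {expected}, "
                f"but the operator produced {actual}"
            )
            if strict:
                # Strict mode: fail loudly on the discrepancy
                raise TypeError(msg)
            # Non-strict mode: record the discrepancy but keep running
            warnings.warn(msg)

schema = {"measurement_std": "float32"}    # what the operator claims to output
produced = {"measurement_std": "float64"}  # what it actually produced

try:
    validate_output(produced, schema, strict=True)
except TypeError as exc:
    print(f"strict=True raised: {exc}")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    validate_output(produced, schema, strict=False)
print(f"strict=False produced {len(caught)} warning(s) and kept going")
```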
Related to #1580 (though not an actual fix for the issue reported therein)
I appreciate the help. Thank you.
Click to view CI Results
GitHub pull request #1589 of commit 874ce3c4013a44b7acd02b847db6e1096ab4bbd2, no merge conflicts. Running as SYSTEM Setting status of 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4520/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 874ce3c4013a44b7acd02b847db6e1096ab4bbd2^{commit} # timeout=10 Checking out Revision 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 874ce3c4013a44b7acd02b847db6e1096ab4bbd2 # timeout=10 Commit message: "Allow disabling strict checking of Workflow operator outputs vs schema" > git rev-list --no-walk cb0d6226fff0e32e2c0ff3772f84f9259410f663 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins11716359310313667117.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ........................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 6 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 142 warnings tests/unit/loader/test_torch_dataloader.py: 91 warnings tests/unit/ops/test_categorify.py: 70 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 3 warnings tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 34 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ========== 1420 passed, 2 skipped, 2343 warnings in 705.42s (0:11:45) ========== Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins15053457041468847894.sh
Fundamentally, this change is awesome 🙂 I really like this way of thinking, where we say "okay, feel free to use the underlying machinery at your own risk: we provide you all this functionality, but if you want to step off the beaten track, feel free to do so". It deals very elegantly with the fact that NVTabular is built on top of these other libraries.
So it would be great for this change to be merged!
But #1580 is also likely to get merged at some point (which I think is important too: it significantly expands the set of aggregations we support and casts float64 down to float32, which makes a lot of sense, since it's easy to eat up a lot of memory with these aggregations really quickly). If #1580 gets merged, the unit test here will stop raising an error and will fail.
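As a quick aside on the memory point: the savings from casting aggregation results from float64 down to float32 are easy to quantify with NumPy (illustrative only; the array names here are made up):

```python
import numpy as np

# A made-up stand-in for a column of one million aggregation results
n_rows = 1_000_000
agg_float64 = np.zeros(n_rows, dtype=np.float64)
agg_float32 = agg_float64.astype(np.float32)

# float32 halves the memory footprint of the aggregated column
print(agg_float64.nbytes)  # 8000000
print(agg_float32.nbytes)  # 4000000
```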
ATM I don't see a good way to test the functionality in this PR (unless we find some other operator or aggregation that we currently don't support and don't plan to support in the near future, or play a trick on the operator and amend the schema mid-flight). In fact, the same goes for #1580: I think the functionality there is good; it's just the additional test that is problematic.
Maybe even with #1580 merged this could be tested by doing a list aggregation of float32 values on the CPU? 🤔 That's a genuine scenario I'm aware of that wouldn't run without `strict=False`.
Click to view CI Results
GitHub pull request #1589 of commit 7faf728e4d01f69d4c721fccf35c05329799cd8e, no merge conflicts. Running as SYSTEM Setting status of 7faf728e4d01f69d4c721fccf35c05329799cd8e to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4528/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 7faf728e4d01f69d4c721fccf35c05329799cd8e^{commit} # timeout=10 Checking out Revision 7faf728e4d01f69d4c721fccf35c05329799cd8e (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 7faf728e4d01f69d4c721fccf35c05329799cd8e # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 645c08ed6887ab7e6d43a1de3407654e5cb16bc3 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins12907291940630881924.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ........................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ========== 1420 passed, 2 skipped, 2343 warnings in 703.33s (0:11:43) ========== Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins11124367831153039112.sh
@radekosmulski Thanks for the comments, I think you're entirely on target there! Re: PR etiquette etc., I'm really digging your thorough and well-thought-out comments. The only thing I'd mention is that for small changes, you can create suggestions directly from the comment box using the button with the "piece of paper plus/minus" icon, which lets you propose code changes the author can incorporate with a single click. I used it here to make the change you proposed by creating a suggestion on my own PR, but using it directly would skip a step. 😺
Click to view CI Results
GitHub pull request #1589 of commit 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5, no merge conflicts. Running as SYSTEM Setting status of 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4530/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5^{commit} # timeout=10 Checking out Revision 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 # timeout=10 Commit message: "Update tests/unit/workflow/test_workflow.py" > git rev-list --no-walk 621f4d729adae51108254981590a6728c077ca95 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins4694174338352369481.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1421 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ..................................................F [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 59%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ___________________________ test_multihot_empty_rows ___________________________
    def test_multihot_empty_rows():
        multi_hot = tf.feature_column.categorical_column_with_identity("multihot", 5)
        multi_hot_embedding = tf.feature_column.embedding_column(multi_hot, 8, combiner="sum")

        embedding_layer = layers.DenseFeatures([multi_hot_embedding])
        inputs = {
            "multihot": (
                tf.keras.Input(name="multihot__values", shape=(1,), dtype=tf.int64),
                tf.keras.Input(name="multihot__nnzs", shape=(1,), dtype=tf.int64),
            )
        }
        output = embedding_layer(inputs)
        model = tf.keras.Model(inputs=inputs, outputs=output)
        model.compile("sgd", "binary_crossentropy")

        multi_hot_values = np.array([0, 2, 1, 4, 1, 3, 1])
        multi_hot_nnzs = np.array([1, 0, 2, 4, 0])
        x = {"multihot": (multi_hot_values[:, None], multi_hot_nnzs[:, None])}

        multi_hot_embedding_table = embedding_layer.embedding_tables["multihot"].numpy()
        multi_hot_embedding_rows = _compute_expected_multi_hot(
            multi_hot_embedding_table, multi_hot_values, multi_hot_nnzs, "sum"
        )
        y_hat = model(x).numpy()
np.testing.assert_allclose(y_hat, multi_hot_embedding_rows, rtol=1e-06)
E       AssertionError:
E       Not equal to tolerance rtol=1e-06, atol=0
E
E       Mismatched elements: 1 / 40 (2.5%)
E       Max absolute difference: 1.4901161e-08
E       Max relative difference: 1.2405193e-06
E        x: array([[ 0.313274,  0.480283,  0.080427,  0.726971,  0.251228, -0.131764,
E                  0.102429,  0.482547],
E               [ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,...
E        y: array([[ 0.313274,  0.480283,  0.080427,  0.726971,  0.251228, -0.131764,
E                  0.102429,  0.482547],
E               [ 0.      ,  0.      ,  0.      ,  0.      ,  0.      ,  0.      ,...

tests/unit/framework_utils/test_tf_layers.py:321: AssertionError
______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)

        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()

        # Disabling strict mode should allow the workflow to run
        result_ddf = workflow.fit_transform(dataset, strict=False)
        result = result_ddf.compute()
assert result
tests/unit/workflow/test_workflow.py:691:
self =   cat  measurement_std
       0   a         0.070711
       1   b              NaN

    @final
    def __nonzero__(self):
>       raise ValueError(
            f"The truth value of a {type(self).__name__} is ambiguous. "
            "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
        )
E       ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
../../../.local/lib/python3.8/site-packages/pandas/core/generic.py:1537: ValueError
----------------------------- Captured stderr call -----------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7fef867bb5e0>
Traceback (most recent call last):
  File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 554, in _transform_partition
    raise TypeError(
TypeError: Dtype discrepancy detected for column measurement_std: operator Groupby reported dtype float32 but returned dtype float64.
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-e976f4c4-a80b-49c3-8e38-fee83692
args: (  cat  timestamp  measurement
       0   a          1          0.1
       1   a          2          0.2
       2   b          1          0.5)
kwargs: {}
Exception: "TypeError('Dtype discrepancy detected for column measurement_std: operator Groupby reported dtype float32 but returned dtype float64.')"
--------------------------- Captured stderr teardown ---------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
  /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`.
    warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
    warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False]
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead.
    warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
  /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
    warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/framework_utils/test_tf_layers.py::test_multihot_empty_rows FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 2 failed, 1418 passed, 2 skipped, 2343 warnings in 707.65s (0:11:47) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins14757039273063764355.sh
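Editor's note on the `test_workflow_strict_mode_disabled` failure above: it combines two separate, well-known pandas behaviors. A minimal sketch reproducing both with plain pandas (no NVTabular required; column names mirror the test, and the exact aggregation dtype can vary by pandas/cudf version):

```python
import pandas as pd

df = pd.DataFrame({"cat": ["a", "a", "b"], "measurement": [0.1, 0.2, 0.5]})
df["measurement"] = df["measurement"].astype("float32")

# Issue 1: std-style aggregations typically compute in float64, so the
# result dtype can differ from the float32 input -- the discrepancy the
# strict check reports as "reported dtype float32 but returned dtype float64".
out = df.groupby("cat")["measurement"].std().reset_index()
print(out["measurement"].dtype)

# Issue 2: `assert result` on a DataFrame raises "truth value ... is
# ambiguous". An explicit check such as `not out.empty` avoids this,
# which matches the follow-up commit "Make the assertion more explicit".
ambiguous = False
try:
    assert out
except ValueError:
    ambiguous = True
assert not out.empty
```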
Click to view CI Results
GitHub pull request #1589 of commit 793a2f61723591752073ae837d53163393294247, no merge conflicts. Running as SYSTEM Setting status of 793a2f61723591752073ae837d53163393294247 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4531/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse 793a2f61723591752073ae837d53163393294247^{commit} # timeout=10 Checking out Revision 793a2f61723591752073ae837d53163393294247 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 793a2f61723591752073ae837d53163393294247 # timeout=10 Commit message: "Make the assertion more explicit" > git rev-list --no-walk 78dacfbdc0c0636ff419c3a0182ba13ec4dc37a5 # timeout=10 [nvtabular_tests] $ /bin/bash /tmp/jenkins8127438609129841754.sh 
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1421 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ..............................FF [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ............... [ 46%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................
[ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 59%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
........................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=================================== FAILURES ===================================
_________________________ test_groupby_model[pytorch] __________________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0') output_model = 'pytorch'
    @pytest.mark.skipif(TRITON_SERVER_PATH is None, reason="Requires tritonserver on the path")
    @pytest.mark.parametrize("output_model", ["tensorflow", "pytorch"])
    def test_groupby_model(tmpdir, output_model):
        size = 20
        df = make_df(
            {
                "id": np.random.choice([0, 1], size=size),
                "ts": np.linspace(0.0, 10.0, num=size),
                "x": np.arange(size),
                "y": np.linspace(0.0, 10.0, num=size),
            }
        )

        groupby_features = ColumnSelector(["id", "ts", "x", "y"]) >> ops.Groupby(
            groupby_cols=["id"],
            sort_cols=["ts"],
            aggs={
                "x": ["sum"],
                "y": ["first"],
            },
            name_sep="-",
        )
        workflow = nvt.Workflow(groupby_features)
_verify_workflow_on_tritonserver(
            tmpdir, workflow, df, "groupby", output_model, cats=["id", "y-first"], conts=["x-sum"]
        )
tests/unit/test_triton_inference.py:379:
tests/unit/test_triton_inference.py:112: in _verify_workflow_on_tritonserver
    response = client.infer(model_name, inputs, outputs=outputs)
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1295: in infer
    raise_error_grpc(rpc_error)
rpc_error = <_InactiveRpcError of RPC that terminated with: status = StatusCode.INTERNAL details = "Failed to process the reques...he request(s) for model instance 'groupby', message: Failed to fetch the error in response batch.","grpc_status":13}"
def raise_error_grpc(rpc_error):
raise get_error_grpc(rpc_error) from None
E tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] Failed to process the request(s) for model instance 'groupby', message: Failed to fetch the error in response batch.
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/init.py:62: InferenceServerException ----------------------------- Captured stdout call ----------------------------- Signal (2) received. ----------------------------- Captured stderr call ----------------------------- I0617 17:42:23.982985 26724 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow I0617 17:42:23.983100 26724 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8 I0617 17:42:23.983107 26724 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8 I0617 17:42:23.983113 26724 tensorflow.cc:2216] backend configuration: {"cmdline":{"version":"2"}} I0617 17:42:24.180156 26724 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fa0a6000000' with size 268435456 I0617 17:42:24.181132 26724 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864 I0617 17:42:24.184717 26724 model_repository_manager.cc:997] loading: groupby:1 I0617 17:42:24.292229 26724 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: groupby (GPU device 0) I0617 17:42:26.782464 26724 model_repository_manager.cc:1152] successfully loaded 'groupby' version 1 I0617 17:42:26.782635 26724 server.cc:524] +------------------+------+ | Repository Agent | Path | +------------------+------+ +------------------+------+
I0617 17:42:26.782756 26724 server.cc:551] +------------+-----------------------------------------------------------------+-----------------------------+ | Backend | Path | Config | +------------+-----------------------------------------------------------------+-----------------------------+ | tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} | | python | /opt/tritonserver/backends/python/libtriton_python.so | {} | +------------+-----------------------------------------------------------------+-----------------------------+
I0617 17:42:26.782813 26724 server.cc:594] +---------+---------+--------+ | Model | Version | Status | +---------+---------+--------+ | groupby | 1 | READY | +---------+---------+--------+
I0617 17:42:26.832667 26724 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB I0617 17:42:26.835524 26724 tritonserver.cc:1962] +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Option | Value | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | server_id | triton | | server_version | 2.20.0 | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace | | model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0 | | model_control_mode | MODE_NONE | | strict_model_config | 1 | | rate_limit | OFF | | pinned_memory_pool_byte_size | 268435456 | | cuda_memory_pool_byte_size{0} | 67108864 | | response_cache_byte_size | 0 | | min_supported_compute_capability | 6.0 | | strict_readiness | 1 | | exit_timeout | 30 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0617 17:42:26.836594 26724 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001 I0617 17:42:26.836798 26724 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000 I0617 17:42:26.877737 26724 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002 W0617 17:42:27.854774 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 W0617 17:42:28.854960 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 0617 17:42:29.721719 26740 pb_stub.cc:419] Failed to process the request(s) for model 'groupby', message: AttributeError: 'NoneType' object has no attribute 'as_numpy'
At:
  /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/inference/triton/__init__.py(76): _convert_tensor
  /tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0/groupby/1/model.py(105):
/tmp/pytest-of-jenkins/pytest-10/test_groupby_model_pytorch_0/groupby/1/model.py(104): execute I0617 17:42:29.722352 26724 server.cc:252] Waiting for in-flight requests to complete. I0617 17:42:29.722387 26724 model_repository_manager.cc:1029] unloading: groupby:1 I0617 17:42:29.722500 26724 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests W0617 17:42:29.873437 26724 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:30.722598 26724 server.cc:267] Timeout 29: Found 1 live models and 0 in-flight non-inference requests I0617 17:42:31.075354 26724 model_repository_manager.cc:1135] successfully unloaded 'groupby' version 1 I0617 17:42:31.722724 26724 server.cc:267] Timeout 28: Found 0 live models and 0 in-flight non-inference requests ______________________ test_seq_etl_tf_model[tensorflow] _______________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-10/test_seq_etl_tf_model_tensorfl0') output_model = 'tensorflow'
    @pytest.mark.skipif(TRITON_SERVER_PATH is None, reason="Requires tritonserver on the path")
    @pytest.mark.parametrize("output_model", ["tensorflow"])
    def test_seq_etl_tf_model(tmpdir, output_model):
        size = 100
        max_length = 10
        df = make_df(
            {
                "id": np.random.choice([0, 1], size=size),
                "item_id": np.random.randint(1, 10, size),
                "ts": np.linspace(0.0, 10.0, num=size).astype(np.float32),
                "y": np.linspace(0.0, 10.0, num=size).astype(np.float32),
            }
        )

        groupby_features = ColumnSelector(["id", "item_id", "ts", "y"]) >> ops.Groupby(
            groupby_cols=["id"],
            sort_cols=["ts"],
            aggs={
                "item_id": ["list"],
                "y": ["list"],
            },
            name_sep="-",
        )
        feats_list = groupby_features["item_id-list", "y-list"]
        feats_trim = feats_list >> ops.ListSlice(0, max_length, pad=True)
        selected_features = groupby_features["id"] + feats_trim

        workflow = nvt.Workflow(selected_features)
        sparse_max = {"item_id-list": max_length, "y-list": max_length}
_verify_workflow_on_tritonserver(
            tmpdir,
            workflow,
            df,
            "groupby",
            output_model,
            sparse_max,
            cats=["id", "item_id-list"],
            conts=["y-list"],
        )
tests/unit/test_triton_inference.py:415:
tests/unit/test_triton_inference.py:112: in _verify_workflow_on_tritonserver
    response = client.infer(model_name, inputs, outputs=outputs)
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/__init__.py:1295: in infer
    raise_error_grpc(rpc_error)
rpc_error = <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNAVAILABLE details = "Request for unknown model...rface/call.cc","file_line":1069,"grpc_message":"Request for unknown model: 'groupby' is not found","grpc_status":14}"
def raise_error_grpc(rpc_error):
raise get_error_grpc(rpc_error) from None
E tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] Request for unknown model: 'groupby' is not found
/usr/local/lib/python3.8/dist-packages/tritonclient/grpc/init.py:62: InferenceServerException ----------------------------- Captured stdout call ----------------------------- Signal (2) received. ----------------------------- Captured stderr call ----------------------------- I0617 17:42:33.384864 26921 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow I0617 17:42:33.384979 26921 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8 I0617 17:42:33.384986 26921 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8 I0617 17:42:33.384992 26921 tensorflow.cc:2216] backend configuration: {"cmdline":{"version":"2"}} I0617 17:42:33.556122 26921 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fc5a6000000' with size 268435456 I0617 17:42:33.557005 26921 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864 I0617 17:42:33.559403 26921 model_repository_manager.cc:997] loading: groupby:1 I0617 17:42:33.667036 26921 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: groupby (GPU device 0) I0617 17:42:36.075578 26921 model_repository_manager.cc:1152] successfully loaded 'groupby' version 1 I0617 17:42:36.075753 26921 server.cc:524] +------------------+------+ | Repository Agent | Path | +------------------+------+ +------------------+------+
I0617 17:42:36.075840 26921 server.cc:551] +------------+-----------------------------------------------------------------+-----------------------------+ | Backend | Path | Config | +------------+-----------------------------------------------------------------+-----------------------------+ | tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} | | python | /opt/tritonserver/backends/python/libtriton_python.so | {} | +------------+-----------------------------------------------------------------+-----------------------------+
I0617 17:42:36.075899 26921 server.cc:594] +---------+---------+--------+ | Model | Version | Status | +---------+---------+--------+ | groupby | 1 | READY | +---------+---------+--------+
I0617 17:42:36.123404 26921 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB I0617 17:42:36.125094 26921 tritonserver.cc:1962] +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Option | Value | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | server_id | triton | | server_version | 2.20.0 | | server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace | | model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-10/test_seq_etl_tf_model_tensorfl0 | | model_control_mode | MODE_NONE | | strict_model_config | 1 | | rate_limit | OFF | | pinned_memory_pool_byte_size | 268435456 | | cuda_memory_pool_byte_size{0} | 67108864 | | response_cache_byte_size | 0 | | min_supported_compute_capability | 6.0 | | strict_readiness | 1 | | exit_timeout | 30 | +----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0617 17:42:36.126239 26921 grpc_server.cc:4421] Started GRPCInferenceService at 0.0.0.0:8001 I0617 17:42:36.126694 26921 http_server.cc:3113] Started HTTPService at 0.0.0.0:8000 I0617 17:42:36.166258 26921 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002 I0617 17:42:36.171585 26921 server.cc:252] Waiting for in-flight requests to complete. I0617 17:42:36.171615 26921 model_repository_manager.cc:1029] unloading: groupby:1 I0617 17:42:36.171701 26921 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests W0617 17:42:37.150860 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:37.171799 26921 server.cc:267] Timeout 29: Found 1 live models and 0 in-flight non-inference requests /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py:373: UserWarning: Loading workflow generated with nvtabular version 1.2.1+4.g793a2f617 - but we are running nvtabular 1.1.1. This might cause issues warnings.warn( I0617 17:42:37.745222 26921 model_repository_manager.cc:1135] successfully unloaded 'groupby' version 1 W0617 17:42:38.151061 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 I0617 17:42:38.171919 26921 server.cc:267] Timeout 28: Found 0 live models and 0 in-flight non-inference requests W0617 17:42:39.176430 26921 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0 =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`.
    warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
  /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
    warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/test_triton_inference.py::test_groupby_model[pytorch] - tri... FAILED tests/unit/test_triton_inference.py::test_seq_etl_tf_model[tensorflow] ===== 2 failed, 1418 passed, 2 skipped, 2343 warnings in 699.97s (0:11:39) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins14316655692237482784.sh
Thank you very much @karlhigley, great to hear! 🙂 I also really appreciate bouncing ideas back and forth and discussing this as we worked on it together 🙂
Good to know about the ability to add changes that can immediately be committed from the web GUI! Could I also please ask you one more question? On #1580 I squashed my commits when I thought I was done, but I saw this disrupts the earlier discussion a bit (some earlier code comments disappear, etc., which I don't think happens if you just keep pushing). Ah, and now I see the web GUI gives you an option to squash and merge at the end?
So if I'm reading this right, the workflow would be to just keep piling commits on top of each other and then squash and merge via the web GUI at the end?
Thank you very much! 🙂
Click to view CI Results
GitHub pull request #1589 of commit c0bddb0b543a7eb280084f7047f20bc34d240090, no merge conflicts. Running as SYSTEM Setting status of c0bddb0b543a7eb280084f7047f20bc34d240090 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4535/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse c0bddb0b543a7eb280084f7047f20bc34d240090^{commit} # timeout=10 Checking out Revision c0bddb0b543a7eb280084f7047f20bc34d240090 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f c0bddb0b543a7eb280084f7047f20bc34d240090 # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 11a2b68f8ffeb544eb0a335a2d07463a9ab9aa49 # timeout=10 First time build. Skipping changelog. 
[nvtabular_tests] $ /bin/bash /tmp/jenkins112138820098634450.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1423 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 17%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ................. [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()
E       Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed --------------------------- Captured stderr teardown --------------------------- /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 6 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 142 warnings tests/unit/loader/test_torch_dataloader.py: 91 warnings tests/unit/ops/test_categorify.py: 70 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 3 warnings tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 34 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 18 warnings tests/unit/test_tools.py: 1213 warnings tests/unit/loader/test_tf_dataloader.py: 20 warnings tests/unit/loader/test_torch_dataloader.py: 432 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False] /usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future. warnings.warn(
tests/unit/ops/test_ops.py::test_difference_lag[False] /usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using `to_cupy` instead. warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1421 passed, 2 skipped, 2344 warnings in 706.28s (0:11:46) ===== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins3760586536830041619.sh
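Stepping back from the logs for a moment, here is a minimal, self-contained sketch of the behavior the failing `test_workflow_strict_mode_disabled` test is probing. This is plain Python for illustration only; `check_output_dtypes` and the column names are made up and are not NVTabular's actual API. The idea: in strict mode, a mismatch between an operator's declared output schema and the dtypes it actually produced raises a TypeError, while `strict=False` downgrades the mismatch to a warning.

```python
import warnings

def check_output_dtypes(schema, output, strict=True):
    """Compare produced dtypes (output) against declared dtypes (schema).

    Hypothetical helper, not NVTabular's implementation: raises TypeError
    on a mismatch in strict mode, otherwise emits a UserWarning.
    """
    for col, expected in schema.items():
        actual = output.get(col)
        if actual is not None and actual != expected:
            msg = f"Column '{col}': schema declares {expected}, operator produced {actual}"
            if strict:
                raise TypeError(msg)
            warnings.warn(msg)

# Example mismatch: a std-aggregation upcasting a float32 input to float64,
# as in the Groupby test above.
declared = {"measurement_std": "float32"}
produced = {"measurement_std": "float64"}

try:
    check_output_dtypes(declared, produced, strict=True)
except TypeError as exc:
    print("strict=True raised:", exc)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_output_dtypes(declared, produced, strict=False)
print("strict=False warned:", len(caught), "warning(s)")
```

Under this framing, `strict=False` is exactly the escape hatch the PR description talks about: the dtype discrepancy still exists and downstream code may still trip over it, but the Workflow itself keeps running.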
@radekosmulski I am also normally inclined to clean up my commits, but since all PRs get squashed before merging, I've mostly stopped. Piling commits on top of each other seems to be the preferred workflow.
Thank you for your answer @karlhigley! Good to know! 🙂
Click to view CI Results
GitHub pull request #1589 of commit e2803d443fd0c0ba4b46c1f6179557189f9241bf, no merge conflicts. Running as SYSTEM Setting status of e2803d443fd0c0ba4b46c1f6179557189f9241bf to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4562/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse e2803d443fd0c0ba4b46c1f6179557189f9241bf^{commit} # timeout=10 Checking out Revision e2803d443fd0c0ba4b46c1f6179557189f9241bf (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f e2803d443fd0c0ba4b46c1f6179557189f9241bf # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk 2f90216a24146c67c9efe73841f57f8a3d9670b0 # timeout=10 [nvtabular_tests] $ /bin/bash 
/tmp/jenkins17256470304880615127.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1425 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 17%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 41%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ 
[ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df({"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]})
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
            result.compute()
E       Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed --------------------------- Captured stderr teardown --------------------------- /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn( =============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings tests/unit/workflow/test_workflow.py: 78 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results. warnings.warn(
tests/unit/test_dask_nvt.py: 1 warning tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 5 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 66 warnings tests/unit/loader/test_torch_dataloader.py: 67 warnings tests/unit/ops/test_categorify.py: 69 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 1 warning tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 27 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 1 warning tests/unit/test_tools.py: 17 warnings tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 54 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1423 passed, 2 skipped, 697 warnings in 692.93s (0:11:32) ====== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins16050828853899905004.sh
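The dtype discrepancy behind the failing test above comes from a standard-deviation Groupby aggregation. The same operation can be sketched with plain pandas; the `measurement_std` output name here is illustrative, not NVTabular's actual naming, and whether the float32 input dtype survives the aggregation depends on the dataframe backend (which is exactly what the strict check flags):

```python
import pandas as pd

# Reproduce the failing test's input: a float32 measurement column,
# grouped by a categorical key and aggregated with "std".
df = pd.DataFrame({"cat": ["a", "a", "b"], "measurement": [0.1, 0.2, 0.5]})
df["measurement"] = df["measurement"].astype("float32")

# Rough pandas analogue of ops.Groupby("cat", aggs=["std"]).
out = df.groupby("cat", as_index=False).agg(
    measurement_std=("measurement", "std")
)

print(out)
# The output dtype may be float32 or float64 depending on the backend;
# a mismatch with the declared schema is what trips strict mode.
print(out["measurement_std"].dtype)
```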
@karlhigley, please link this to an initiative.
Click to view CI Results
GitHub pull request #1589 of commit a8f2c8d271a492fb014a89228b67ecab1e093088, no merge conflicts. Running as SYSTEM !!! PR mergeability status has changed !!! PR now has merge conflicts! Setting status of a8f2c8d271a492fb014a89228b67ecab1e093088 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4635/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1589/*:refs/remotes/origin/pr/1589/* # timeout=10 > git rev-parse a8f2c8d271a492fb014a89228b67ecab1e093088^{commit} # timeout=10 Checking out Revision a8f2c8d271a492fb014a89228b67ecab1e093088 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f a8f2c8d271a492fb014a89228b67ecab1e093088 # timeout=10 Commit message: "Merge branch 'main' into feature/non-strict-mode" > git rev-list --no-walk a74290155fced269fd77fc726a919d1645bf8cc6 # 
timeout=10 First time build. Skipping changelog. [nvtabular_tests] $ /bin/bash /tmp/jenkins6135180235167715312.sh ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 1431 items / 1 skippedtests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] .... [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ..................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... 
[ 49%] tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 63%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] ..........................................................F [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]
=================================== FAILURES =================================== ______________________ test_workflow_strict_mode_disabled ______________________
    def test_workflow_strict_mode_disabled():
        df = make_df(
            {"cat": ["a", "a", "b"], "timestamp": [1, 2, 1], "measurement": [0.1, 0.2, 0.5]}
        )
        df["measurement"] = df["measurement"].astype("float32")
        grouped = ["measurement", "cat"] >> ops.Groupby("cat", aggs=["std"])
        workflow = Workflow(grouped)
        dataset = Dataset(df, cpu=True)
        result = workflow.fit_transform(dataset, strict=True)
        # Strict mode should catch the dtype discrepancy
        # between the schema and the output (float32 vs float64)
        with pytest.raises(TypeError):
>           result.compute()
E           Failed: DID NOT RAISE <class 'TypeError'>
tests/unit/workflow/test_workflow.py:685: Failed
--------------------------- Captured stderr teardown ---------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)
nvtabular/loader/__init__.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The `nvtabular.loader` module has moved to `merlin.models.loader`. Support for importing from `nvtabular.loader` is deprecated, and will be removed in a future version. Please update your imports to refer to `merlin.models.loader`. warnings.warn(
tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-Shuffle.PER_WORKER-True-device-0-parquet-0.1] /usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first self.make_current()
tests/unit/test_dask_nvt.py: 1 warning tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 5 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 66 warnings tests/unit/loader/test_torch_dataloader.py: 67 warnings tests/unit/ops/test_categorify.py: 69 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 1 warning tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 27 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(
tests/unit/test_notebooks.py: 1 warning tests/unit/test_tools.py: 17 warnings tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 54 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)
tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html =========================== short test summary info ============================ FAILED tests/unit/workflow/test_workflow.py::test_workflow_strict_mode_disabled ===== 1 failed, 1429 passed, 2 skipped, 618 warnings in 693.68s (0:11:33) ====== Build step 'Execute shell' marked build as failure Performing Post build task... Match found for : : True Logical operation result is TRUE Running script : #!/bin/bash cd /var/jenkins_home/ CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" [nvtabular_tests] $ /bin/bash /tmp/jenkins17789346183407376221.sh
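For reference, the mechanism this PR makes optional can be sketched in isolation: compare an operator's actual output dtypes against its declared output schema, and raise a `TypeError` only when strict checking is enabled. This is a minimal illustrative sketch — the `check_output_dtypes` name and the dict-based schema are hypothetical, not NVTabular's actual API:

```python
import numpy as np
import pandas as pd

def check_output_dtypes(df: pd.DataFrame, schema: dict, strict: bool = True) -> list:
    """Return a list of (column, expected, actual) dtype mismatches.

    When ``strict`` is True, raise TypeError on the first mismatch
    instead of collecting it, mirroring the escape hatch in this PR.
    """
    mismatches = []
    for column, expected in schema.items():
        actual = df[column].dtype
        if actual != np.dtype(expected):
            if strict:
                raise TypeError(
                    f"Column '{column}' has dtype {actual}, "
                    f"but the operator's schema declares {expected}"
                )
            mismatches.append((column, expected, actual))
    return mismatches

# The operator's schema claimed float32, but the backend produced float64.
df = pd.DataFrame({"measurement_std": pd.Series([0.1, 0.2], dtype="float64")})
schema = {"measurement_std": "float32"}

# strict=False surfaces the discrepancy without aborting the workflow.
print(check_output_dtypes(df, schema, strict=False))
```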
This PR is obsolete after the extraction of executors into Core.