
Graphs were correctly visualized when the script was run again.

Open kuwarkapur opened this issue 3 years ago • 7 comments

The graphs of the categorical features, of the combined categorical features (i.e. ('userId', 'movieId')), and of the numerical feature (i.e. rating) were visualized, and the difference can be seen in the uploaded script.
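
For reference, a minimal sketch of how graphs like these can be defined and rendered with NVTabular. The MovieLens-style columns `userId`, `movieId`, and `rating`, and the specific ops (`Categorify(encode_type="combo")`, `Normalize`) are assumptions for illustration, not necessarily what the uploaded script uses:

```python
# Minimal sketch, assuming MovieLens-style columns; the PR's script may differ.
import nvtabular as nvt
from nvtabular import ops

# Categorical columns, each encoded independently.
cat_features = ["userId", "movieId"] >> ops.Categorify()

# Combined categorical feature: the ('userId', 'movieId') pair encoded as one column.
combined_features = [["userId", "movieId"]] >> ops.Categorify(encode_type="combo")

# Numerical feature.
cont_features = ["rating"] >> ops.Normalize()

# Join the branches into a single operator graph.
output = cat_features + combined_features + cont_features
workflow = nvt.Workflow(output)

# In a Jupyter notebook this renders the operator graph via graphviz,
# which is where the difference between the branches is visible.
output.graph
```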

kuwarkapur · May 15 '22 09:05

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Click to view CI Results
GitHub pull request #1547 of commit cb228501b8f1f079e943cc2bade29eb94acdee1c, no merge conflicts.
Running as SYSTEM
Setting status of cb228501b8f1f079e943cc2bade29eb94acdee1c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4469/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse cb228501b8f1f079e943cc2bade29eb94acdee1c^{commit} # timeout=10
Checking out Revision cb228501b8f1f079e943cc2bade29eb94acdee1c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cb228501b8f1f079e943cc2bade29eb94acdee1c # timeout=10
Commit message: "Add files via upload"
 > git rev-list --no-walk 8b43ecde40769ce5105a733c799b9a055b994093 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6299493311054922507.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
  Downloading setuptools-62.2.0-py3-none-any.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 12.4 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 61.0.0
    Uninstalling setuptools-61.0.0:
      Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.2.0
WARNING: You are using pip version 22.0.4; however, version 22.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.0.4)
Collecting pip
  Downloading pip-22.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 13.7 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.2.0)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.4
    Uninstalling pip-22.0.4:
      Successfully uninstalled pip-22.0.4
  WARNING: The scripts pip, pip3, pip3.10 and pip3.8 are installed in '/var/jenkins_home/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.6.2 requires spacy=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (21.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.2.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: absl-py=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.3.0+1.g3c62869-py3-none-any.whl size=133336 sha256=4292ff6c26a37f59536cfe056293c3c28fdc2d02a953cd0337eff344b571d916
  Stored in directory: /tmp/pip-ephem-wheel-cache-1av9ttz8/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+1.g3c62869 which is incompatible.
Successfully installed merlin-core-0.3.0+1.g3c62869
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
[The rest of this pip output was garbled in the page capture; it resolves the documentation and notebook dependencies (myst-nb, sphinx, myst-parser, sphinx-external-toc, ipywidgets, ipython, nbconvert, nbformat, jupyter-cache, nbdime, and their transitive requirements), all of which were already satisfied.]
creating build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+4.gcb228501 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples
INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples
INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain
INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb
INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb
INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb
INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo
INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb
INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb
INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%] tests/unit/test_notebooks.py ...... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ............... [ 46%] tests/unit/ops/test_hash_bucket.py ......................... [ 48%] tests/unit/ops/test_join.py ............................................ [ 51%] ........................................................................ [ 56%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] .......................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
  /usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if pa_version and LooseVersion(pa_version) < LooseVersion("2.0"):

../../../../../usr/lib/python3.8/site-packages/dask_cudf/core.py:32
  /usr/lib/python3.8/site-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)

../../../../../usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: 34 warnings
  /usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
    warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 6 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 142 warnings tests/unit/loader/test_torch_dataloader.py: 91 warnings tests/unit/ops/test_categorify.py: 70 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 3 warnings tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 34 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings /core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /core/merlin/core/utils.py:433: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet] tests/unit/ops/test_ops.py::test_data_stats[True-parquet] tests/unit/ops/test_ops.py::test_data_stats[False-parquet] /usr/lib/python3.8/site-packages/cudf/core/series.py:923: FutureWarning: Series.set_index is deprecated and will be removed in the future warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /core/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(

tests/unit/ops/test_categorify.py::test_categorify_max_size[6] tests/unit/ops/test_categorify.py::test_categorify_max_size[max_emb_size1] tests/unit/ops/test_categorify.py::test_categorify_max_size_null_iloc_check /usr/lib/python3.8/site-packages/cudf/core/frame.py:3077: FutureWarning: keep_index is deprecated and will be removed in the future. warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
  /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

  See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
    self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops.py::test_difference_lag[False]
  /usr/lib/python3.8/site-packages/cudf/core/dataframe.py:3041: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
    warnings.warn(
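A small illustration of the replacement this FutureWarning suggests; the toy DataFrame is made up, and it assumes a cuDF release where DataFrame.to_cupy is available (which the warning itself implies):

    import cudf

    df = cudf.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

    # mat = df.as_gpu_matrix()   # deprecated, per the warning above
    mat = df.to_cupy()           # same values, returned as a CuPy ndarray
    print(mat.shape)             # (3, 2)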

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
  /core/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
    warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
  /core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
    warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
  /core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing

nvtabular/__init__.py 22 0 0 0 100%
nvtabular/dispatch.py 3 3 0 0 0% 18-23
nvtabular/framework_utils/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 89 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 22 1 45% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 12 0 19% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 2 18 2 92% 50, 91
nvtabular/framework_utils/torch/models.py 45 1 30 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 5 34 5 91% 51->53, 64, 71->76, 75, 118-120
nvtabular/graph.py 3 3 0 0 0% 18-23
nvtabular/inference/__init__.py 2 0 0 0 100%
nvtabular/inference/triton/__init__.py 36 12 14 1 58% 42-49, 68, 72, 76-82
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/ensemble.py 266 148 82 7 46% 90-94, 157-196, 240-288, 305-309, 381-389, 418-434, 486-496, 548-588, 594-610, 614-681, 711, 733, 739-758, 764-788, 795
nvtabular/inference/triton/model/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/model/model_pt.py 101 101 42 0 0% 27-220
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/workflow_model.py 55 55 24 0 0% 27-128
nvtabular/inference/workflow/__init__.py 0 0 0 0 100%
nvtabular/inference/workflow/base.py 113 113 62 0 0% 27-209
nvtabular/inference/workflow/hugectr.py 37 37 16 0 0% 27-87
nvtabular/inference/workflow/pytorch.py 10 10 6 0 0% 27-46
nvtabular/inference/workflow/tensorflow.py 32 32 10 0 0% 26-68
nvtabular/io.py 3 3 0 0 0% 18-23
nvtabular/loader/__init__.py 2 0 0 0 100%
nvtabular/loader/backend.py 371 17 154 12 94% 27-28, 142, 158-159, 299->301, 311-315, 362-363, 402->406, 403->402, 478, 482-483, 512, 588-589, 624, 632
nvtabular/loader/tensorflow.py 179 25 60 9 86% 38-39, 74, 85-89, 101, 115, 124, 329, 357, 368, 383-385, 414-416, 426-434, 437-440
nvtabular/loader/tf_utils.py 57 10 22 6 80% 31->34, 34->36, 41->43, 45, 46->67, 52-53, 61-63, 69-73
nvtabular/loader/torch.py 87 14 26 3 80% 28-30, 33-39, 114, 158-159, 164
nvtabular/ops/__init__.py 26 0 0 0 100%
nvtabular/ops/add_metadata.py 34 0 14 0 100%
nvtabular/ops/bucketize.py 40 9 20 3 73% 52-54, 58->exit, 61-64, 83-86
nvtabular/ops/categorify.py 660 70 348 48 86% 251, 253, 271, 275, 283, 291, 293, 320, 341-342, 391->395, 399-406, 452, 460, 483-484, 561-566, 637, 733, 750, 795, 873-874, 889-893, 894->858, 912, 920, 927->exit, 951, 954->957, 1006->1004, 1068, 1073, 1107->1111, 1113->1053, 1119-1122, 1134, 1138, 1142, 1149, 1154-1157, 1235, 1237, 1307->1330, 1313->1330, 1331-1336, 1381, 1394->1397, 1401->1406, 1405, 1411, 1414, 1422-1432
nvtabular/ops/clip.py 18 2 8 3 81% 44, 52->54, 55
nvtabular/ops/column_similarity.py 121 26 38 5 74% 19-20, 29-30, 81->exit, 111, 206-207, 216-218, 226-242, 259->262, 263, 273
nvtabular/ops/data_stats.py 56 1 24 3 95% 107->109, 111, 113->103
nvtabular/ops/difference_lag.py 43 0 14 1 98% 73->75
nvtabular/ops/drop_low_cardinality.py 18 0 10 1 96% 85->84
nvtabular/ops/dropna.py 9 0 2 0 100%
nvtabular/ops/fill.py 76 5 30 1 92% 63-67, 109
nvtabular/ops/filter.py 20 1 8 1 93% 49
nvtabular/ops/groupby.py 135 4 88 5 96% 74, 86, 96->98, 141, 233
nvtabular/ops/hash_bucket.py 43 1 22 2 95% 79, 118->124
nvtabular/ops/hashed_cross.py 38 3 17 3 89% 52, 64, 92
nvtabular/ops/join_external.py 96 8 34 7 88% 20-21, 114, 116, 118, 150->152, 205-206, 216->227, 221
nvtabular/ops/join_groupby.py 128 5 57 6 94% 113, 120, 129, 136->135, 178->175, 181->175, 260-261
nvtabular/ops/lambdaop.py 62 6 22 6 86% 60, 64, 82, 95, 100, 109
nvtabular/ops/list_slice.py 89 29 42 0 64% 21-22, 146-160, 168-190
nvtabular/ops/logop.py 21 0 6 0 100%
nvtabular/ops/moments.py 69 0 24 0 100%
nvtabular/ops/normalize.py 93 4 22 1 94% 89, 139-140, 167
nvtabular/ops/operator.py 11 1 2 0 92% 52
nvtabular/ops/reduce_dtype_size.py 49 0 20 2 97% 68->77, 74->77
nvtabular/ops/rename.py 29 3 14 3 86% 46, 71-73
nvtabular/ops/stat_operator.py 8 0 2 0 100%
nvtabular/ops/target_encoding.py 182 9 76 5 93% 168->172, 176->185, 274, 283-284, 297-303, 396->399
nvtabular/ops/value_counts.py 34 0 6 1 98% 40->38
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 271 25 90 7 91% 26-27, 31-32, 129-132, 142-146, 148, 170-171, 322, 332, 358, 361-370
nvtabular/tools/dataset_inspector.py 52 8 24 2 79% 33-40, 51
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 3 0 0 0 100%
nvtabular/worker.py 3 3 0 0 0% 18-23
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 7 0 4 0 100%
nvtabular/workflow/workflow.py 219 17 94 12 91% 28-29, 52, 85, 206, 212->226, 239-241, 373, 388-389, 431, 508, 538, 548-550, 563

TOTAL 5211 1129 2095 203 76%
Coverage XML written to file coverage.xml

Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/lib/python3.8/site-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': cannot import name 'ParamSpec' from 'typing_extensions' (/var/jenkins_home/.local/lib/python3.8/site-packages/typing_extensions.py)
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: not working correctly in ci environment
========== 1419 passed, 2 skipped, 665 warnings in 1168.52s (0:19:28) ==========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5905419844544771701.sh

nvidia-merlin-bot avatar May 15 '22 09:05 nvidia-merlin-bot

@benfred, can you review this pull request?

kuwarkapur avatar May 15 '22 10:05 kuwarkapur

Click to view CI Results
GitHub pull request #1547 of commit 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1, no merge conflicts.
Running as SYSTEM
Setting status of 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4473/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1^{commit} # timeout=10
Checking out Revision 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 4d416c743185014d21ac3f337084bb49482e5daf # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1922903398759176915.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
  Downloading setuptools-62.3.1-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 39.2 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 61.0.0
    Uninstalling setuptools-61.0.0:
      Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.3.1
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.3.1)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Found existing installation: nvtabular 1.0.0+10.g4df99eb4
Uninstalling nvtabular-1.0.0+10.g4df99eb4:
  Successfully uninstalled nvtabular-1.0.0+10.g4df99eb4
Found existing installation: merlin-core 0+untagged.78.gc43c798
Uninstalling merlin-core-0+untagged.78.gc43c798:
  Successfully uninstalled merlin-core-0+untagged.78.gc43c798
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-install-76qa40o7/merlin-core_c19c73fda7b34f28a5358bbbbb9ea8a3
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-install-76qa40o7/merlin-core_c19c73fda7b34f28a5358bbbbb9ea8a3
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98dd76b5646cb9f0cf1ff089f20f5afeaba37217
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: distributed>=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (21.3)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: llvmlite=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: absl-py=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.3.0+2.g98dd76b-py3-none-any.whl size=133339 sha256=c8daca3ddfc9cc58fdfedf350e077b35f5fa6602b9734ef708d9974570e61b33
  Stored in directory: /tmp/pip-ephem-wheel-cache-qtnt5mlm/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+2.g98dd76b which is incompatible.
Successfully installed merlin-core-0.3.0+2.g98dd76b
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
[The remaining pip output for the docs and notebook-testing requirements is garbled in this capture; the surviving fragments are all "Requirement already satisfied" lines for myst-nb, myst-parser, sphinx, sphinx-external-toc, jupyter-cache, nbconvert, nbformat, nbdime, ipywidgets, ipython, jupyter-server, and their dependencies.]
creating build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+7.g4a31dd03 is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb Building docs make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs' /usr/lib/python3/dist-packages/requests/init.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version! warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported " INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb 
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb INFO:traitlets:Support files will be in INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb make: Leaving directory 
'/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
 [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py .................F

=================================== FAILURES ===================================
___________________________ test_full_df[None-1000] ____________________________

num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_1000_0')
distro = None

@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
    json_sample["num_rows"] = num_rows
    cats = list(json_sample["cats"].keys())
    cols = datagen._get_cols_from_schema(json_sample, distros=distro)

    df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.00001)
    df_files = df_gen.full_df_create(num_rows, cols, entries=True, output=tmpdir)
    test_size = 0
    full_df = make_df()
    for fi in df_files:
        df = Dataset(fi).to_ddf().compute()
        test_size = test_size + df.shape[0]
        full_df = concat([full_df, df])
    assert test_size == num_rows
    conts_rep = cols["conts"]
    cats_rep = cols["cats"]
    labels_rep = cols["labels"]
    assert df.shape[1] == len(conts_rep) + len(cats_rep) + len(labels_rep)
    for idx, cat in enumerate(cats[1:]):
        dist = cats_rep[idx + 1].distro or df_gen.dist
        if HAS_GPU:
            if not is_string_dtype(full_df[cat]._column):
                sts, ps = dist.verify(full_df[cat].to_pandas())
                assert all(s > 0.9 for s in sts)
        else:
            if not is_string_dtype(full_df[cat]):
                sts, ps = dist.verify(full_df[cat])
                assert all(s > 0.9 for s in sts)
        # these are not mh series
        assert full_df[cat].nunique() == cats_rep[0].cardinality
      assert full_df[cat].str.len().min() == cats_rep[0].min_entry_size

E       assert 2 == 1
E        +  where 2 = <bound method Frame.min of 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32>()
E        +    where <bound method Frame.min of 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32> = 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32.min
E        +      where 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32 = <bound method StringMethods.len of <cudf.core.column.string.StringMethods object at 0x7f3af5924220>>()
E        +        where <bound method StringMethods.len of <cudf.core.column.string.StringMethods object at 0x7f3af5924220>> = <cudf.core.column.string.StringMethods object at 0x7f3af5924220>.len
E        +          where <cudf.core.column.string.StringMethods object at 0x7f3af5924220> = 0 GXbtF\n1 GXbtF\n2 ECTv\n3 jOME3\n4 c9e\n ... \n995 dCO14\n996 bo\n997 GXbtF\n998 bo\n999 c6Ud\nName: cat_5, Length: 1000, dtype: object.str
E        +  and   1 = <nvtabular.tools.data_gen.CatCol object at 0x7f3af08e95b0>.min_entry_size

tests/unit/test_tools.py:161: AssertionError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
  /usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if pa_version and LooseVersion(pa_version) < LooseVersion("2.0"):

../../../../../usr/lib/python3.8/site-packages/dask_cudf/core.py:32
  /usr/lib/python3.8/site-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)

../../../../../usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: 34 warnings
  /usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
    warnings.warn(

tests/unit/test_dask_nvt.py::test_cats_and_groupby_stats[False-0.01]
tests/unit/test_dask_nvt.py::test_cats_and_groupby_stats[False-0.01]
tests/unit/test_tf4rec.py::test_tf4rec
tests/unit/test_tools.py::test_full_df[None-1000]
  /usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
    warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
  /core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
    warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
  /core/merlin/core/utils.py:433: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 ----------- Name Stmts Miss Branch BrPart Cover Missing

nvtabular/init.py 22 0 0 0 100% nvtabular/dispatch.py 3 3 0 0 0% 18-23 nvtabular/framework_utils/init.py 2 0 0 0 100% nvtabular/framework_utils/tensorflow/init.py 1 0 0 0 100% nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 125 90 0 4% 28-32, 69-286 nvtabular/framework_utils/tensorflow/layers/init.py 4 0 0 0 100% nvtabular/framework_utils/tensorflow/layers/embedding.py 153 86 89 10 40% 31-32, 39, 51-60, 68-69, 73-75, 79-93, 119-124, 177, 179, 193, 205, 217-218, 227, 231-239, 249-265, 307-311, 314-344, 347-360, 363-364, 367 nvtabular/framework_utils/tensorflow/layers/interaction.py 47 39 22 0 14% 48-52, 55-71, 74-103, 106-110, 113 nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 12 0 19% 37-38, 41-60, 71-84, 87 nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111 nvtabular/framework_utils/torch/init.py 0 0 0 0 100% nvtabular/framework_utils/torch/layers/init.py 2 0 0 0 100% nvtabular/framework_utils/torch/layers/embeddings.py 32 3 18 3 88% 39, 50, 91 nvtabular/framework_utils/torch/models.py 45 1 30 4 93% 57->61, 87->89, 93->96, 103 nvtabular/framework_utils/torch/utils.py 75 9 34 7 85% 51->53, 64, 71->76, 75, 109, 118-120, 129-131 nvtabular/graph.py 3 3 0 0 0% 18-23 nvtabular/inference/init.py 2 0 0 0 100% nvtabular/inference/triton/init.py 36 12 14 1 58% 42-49, 68, 72, 76-82 nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103 nvtabular/inference/triton/data_conversions.py 87 73 58 0 10% 32-33, 52-84, 88-94, 98-105, 109-115, 119-136, 140-150 nvtabular/inference/triton/ensemble.py 266 155 82 10 42% 90-94, 157-196, 240-288, 305-309, 381-389, 415, 418-434, 438-442, 455-456, 486-496, 548-588, 594-610, 614-681, 711, 733, 739-758, 764-788, 795 nvtabular/inference/triton/model/init.py 0 0 0 0 100% nvtabular/inference/triton/model/model_pt.py 101 101 42 0 0% 27-220 nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100% nvtabular/inference/triton/workflow_model.py 55 55 24 0 0% 27-128 nvtabular/inference/workflow/init.py 0 0 0 0 100% nvtabular/inference/workflow/base.py 113 113 62 0 0% 27-209 nvtabular/inference/workflow/hugectr.py 37 37 16 0 0% 27-87 nvtabular/inference/workflow/pytorch.py 10 10 6 0 0% 27-46 nvtabular/inference/workflow/tensorflow.py 32 32 10 0 0% 26-68 nvtabular/io.py 3 3 0 0 0% 18-23 nvtabular/loader/init.py 2 0 0 0 100% nvtabular/loader/backend.py 371 52 154 27 83% 27-28, 92, 97-98, 125, 137-142, 145->exit, 157-159, 178, 179->181, 235, 271-275, 278-281, 286, 293, 299->301, 311-315, 325-326, 357, 362-363, 399-400, 402->406, 403->402, 431, 449, 478, 482-483, 508-517, 578->581, 588-589, 618, 623-627, 632 nvtabular/loader/tensorflow.py 179 55 60 15 67% 38-39, 63-65, 74, 85-89, 101, 110-115, 124, 311-313, 329, 351, 355-358, 363, 368, 375, 383-385, 389, 395->399, 408-418, 422, 426-434, 437-440, 443-447, 453 nvtabular/loader/tf_utils.py 57 10 22 6 80% 31->34, 34->36, 41->43, 45, 46->67, 52-53, 61-63, 69-73 nvtabular/loader/torch.py 87 39 26 3 50% 28-30, 33-39, 114, 119, 124-130, 154-166, 169, 174-179, 182-187, 190-191 nvtabular/ops/init.py 26 0 0 0 100% nvtabular/ops/add_metadata.py 34 3 14 0 94% 34, 38, 42 nvtabular/ops/bucketize.py 40 20 20 2 40% 52-54, 58->exit, 59-64, 71-88, 92, 96 nvtabular/ops/categorify.py 660 167 348 77 70% 251, 253, 271, 275, 279, 283, 287, 291, 293, 297, 305, 320, 323-328, 341-342, 372-376, 384-408, 434, 448->451, 452, 457, 460, 483-484, 491-499, 561-566, 598, 625->627, 628, 629->631, 635, 637, 646, 725, 727->730, 733, 750, 759-764, 795, 829, 873-874, 889-893, 894->858, 
912, 920, 927-928, 945-946, 951, 954->957, 983, 1003-1021, 1037, 1059, 1063, 1065-1068, 1073, 1077-1089, 1091->1094, 1099->1053, 1107-1114, 1115->1117, 1119-1122, 1134, 1138, 1142, 1149, 1154-1157, 1235, 1237, 1299, 1307->1330, 1313->1330, 1331-1336, 1354, 1358-1366, 1369, 1380-1388, 1394->1397, 1401->1406, 1405, 1411, 1414, 1419-1433, 1454-1462 nvtabular/ops/clip.py 18 2 8 3 81% 44, 52->54, 55 nvtabular/ops/column_similarity.py 121 86 38 0 23% 19-20, 29-30, 72-78, 81-88, 92-114, 125-126, 129-134, 138, 142, 168-197, 206-207, 216-218, 226-242, 251-276, 280-283, 287-288 nvtabular/ops/data_stats.py 56 40 24 0 22% 44-48, 51, 55-93, 96-115, 118 nvtabular/ops/difference_lag.py 43 21 14 1 44% 60->63, 70-81, 87, 90-95, 99, 103, 106 nvtabular/ops/drop_low_cardinality.py 18 11 10 0 32% 30-31, 50, 80-90 nvtabular/ops/dropna.py 9 3 2 0 73% 36-38 nvtabular/ops/fill.py 76 35 30 3 47% 53-55, 63-67, 73, 79, 102-104, 108-115, 120-121, 125-128, 135, 138-142, 145-148 nvtabular/ops/filter.py 20 3 8 3 79% 49, 56, 60 nvtabular/ops/groupby.py 135 15 88 12 83% 74, 86, 96->98, 98->87, 107->112, 132, 141, 150->149, 233, 251, 268-271, 284, 290-297 nvtabular/ops/hash_bucket.py 43 22 22 2 38% 75, 79, 88-101, 106-110, 113-124, 128, 132 nvtabular/ops/hashed_cross.py 38 22 17 1 35% 52, 58-68, 73-78, 82, 87-92 nvtabular/ops/join_external.py 96 19 34 11 72% 20-21, 114, 116, 118, 131, 138, 142-145, 150-151, 156-157, 205-206, 220-227 nvtabular/ops/join_groupby.py 128 20 57 9 79% 111, 113, 120, 126-129, 136-139, 144-146, 177-180, 181->175, 230-231, 260-261 nvtabular/ops/lambdaop.py 62 6 22 6 86% 60, 64, 82, 95, 100, 109 nvtabular/ops/list_slice.py 89 39 42 5 47% 21-22, 67-68, 74, 86-94, 105, 121->127, 146-160, 168-190 nvtabular/ops/logop.py 21 2 6 1 89% 48-51 nvtabular/ops/moments.py 69 1 24 1 98% 71 nvtabular/ops/normalize.py 93 27 22 3 67% 72, 77, 82, 89, 126-128, 134-142, 148, 155-159, 162-163, 167, 176, 180 nvtabular/ops/operator.py 11 1 2 0 92% 52 nvtabular/ops/reduce_dtype_size.py 49 28 20 0 33% 36-39, 43, 46, 49-50, 54-56, 59-80 nvtabular/ops/rename.py 29 3 14 3 86% 46, 71-73 nvtabular/ops/stat_operator.py 8 0 2 0 100% nvtabular/ops/target_encoding.py 182 127 76 0 22% 166-207, 210-213, 217, 226-227, 230-243, 246-251, 254-257, 261, 265, 269-274, 277-280, 283-284, 287-288, 291-292, 296-377, 381-413, 423-432 nvtabular/ops/value_counts.py 34 20 6 0 40% 37-53, 56, 59, 62-64, 67 nvtabular/tools/init.py 0 0 0 0 100% nvtabular/tools/data_gen.py 271 25 90 8 90% 26-27, 31-32, 129-132, 142-146, 148, 170-171, 322, 332, 356->355, 358, 361-370 nvtabular/tools/dataset_inspector.py 52 40 24 0 21% 31-40, 50-51, 71-112 nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168 nvtabular/utils.py 3 0 0 0 100% nvtabular/worker.py 3 3 0 0 0% 18-23 nvtabular/workflow/init.py 2 0 0 0 100% nvtabular/workflow/node.py 7 0 4 0 100% nvtabular/workflow/workflow.py 219 24 94 14 88% 28-29, 52, 85, 121-122, 126, 187->190, 206, 212->226, 239-241, 257, 373, 388-389, 431, 508, 538-546, 548-550, 563

TOTAL 5211 2031 2095 251 57% Coverage XML written to file coverage.xml

FAIL Required test coverage of 70% not reached. Total coverage: 56.79%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/lib/python3.8/site-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': cannot import name 'ParamSpec' from 'typing_extensions' (/var/jenkins_home/.local/lib/python3.8/site-packages/typing_extensions.py)
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
====== 1 failed, 140 passed, 1 skipped, 55 warnings in 311.42s (0:05:11) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1418994287624193960.sh

nvidia-merlin-bot avatar May 17 '22 13:05 nvidia-merlin-bot

Click to view CI Results
GitHub pull request #1547 of commit faf9a2aba510487f7d46069165371cdaacaebf91, no merge conflicts.
Running as SYSTEM
Setting status of faf9a2aba510487f7d46069165371cdaacaebf91 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4485/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse faf9a2aba510487f7d46069165371cdaacaebf91^{commit} # timeout=10
Checking out Revision faf9a2aba510487f7d46069165371cdaacaebf91 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f faf9a2aba510487f7d46069165371cdaacaebf91 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk db9adbb37ec2389de0be270b879300f5315655dc # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins423197427368600513.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
  Downloading setuptools-62.3.2-py3-none-any.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 21.3 MB/s eta 0:00:00
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 61.0.0
    Uninstalling setuptools-61.0.0:
      Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.3.2
WARNING: There was an error checking the latest version of pip.
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.1)
Collecting pip
  Downloading pip-22.1.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 55.2 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.3.2)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.1
    Uninstalling pip-22.1:
      Successfully uninstalled pip-22.1
  WARNING: The scripts pip, pip3, pip3.10 and pip3.8 are installed in '/var/jenkins_home/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.6.2 requires spacy=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.3.2)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: llvmlite=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: absl-py=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.3.0+5.g6df9aaa-py3-none-any.whl size=133336 sha256=efd0931c0f9eec0ec32fa714f41914293a5bd5a6ec839dd75ac2d08a80d1838a
  Stored in directory: /tmp/pip-ephem-wheel-cache-re2o0hdv/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+5.g6df9aaa which is incompatible.
Successfully installed merlin-core-0.3.0+5.g6df9aaa
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
Requirement already satisfied: myst-nb=7.0.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from myst-nb=0.15 in /usr/local/lib/python3.8/dist-packages (from myst-nb=3.1 in /usr/local/lib/python3.8/dist-packages (from myst-nb=5.6 in /usr/local/lib/python3.8/dist-packages (from myst-nb=7.1 in /usr/local/lib/python3.8/dist-packages (from sphinx-external-toc=1.0.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from ipywidgets=7.0.0->myst-nb=7.0.0->myst-nb=4.5.1 in /usr/local/lib/python3.8/dist-packages (from ipywidgets=7.0.0->myst-nb=4.3.1 in /usr/local/lib/python3.8/dist-packages (from ipywidgets=7.0.0->myst-nb=7.0.0->myst-nbmyst-nbmyst-nbmyst-nb4.3 in /usr/local/lib/python3.8/dist-packages (from ipython->myst-nb=2.0.0 in /usr/local/lib/python3.8/dist-packages (from ipython->myst-nb=2.4.0 in /usr/local/lib/python3.8/dist-packages (from ipython->myst-nbmyst-nb=18.5 in /usr/local/lib/python3.8/dist-packages (from ipython->myst-nbmyst-nb=0.16 in /usr/local/lib/python3.8/dist-packages (from ipython->myst-nb=0.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from jupyter-cache~=0.4.1->myst-nbmyst-nb=1.3.12 in /var/jenkins_home/.local/lib/python3.8/site-packages (from jupyter-cache~=0.4.1->myst-nbmyst-nbmyst-nbmyst-nb=1.0.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from myst-parser~=0.15.2->myst-nb=2.0 in /usr/local/lib/python3.8/dist-packages (from nbconvert=5.6->myst-nb=5.6->myst-nb=0.2.2 in /usr/local/lib/python3.8/dist-packages (from nbconvert=5.6->myst-nb=4.7 in /usr/local/lib/python3.8/dist-packages (from nbconvert=5.6->myst-nb=5.6->myst-nb=5.6->myst-nb=0.8.1 in /usr/local/lib/python3.8/dist-packages (from nbconvert=5.6->myst-nb=5.6->myst-nb=5.6->myst-nb=1.4.1 in /usr/local/lib/python3.8/dist-packages (from nbconvert=5.6->myst-nb=5.6->myst-nbmyst-nb=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat~=5.0->myst-nb=3.1->myst-nb=1.1 in /usr/local/lib/python3.8/dist-packages (from sphinx=3.1->myst-nb=3.1->myst-nb=3.1->myst-nb=2.5.0 in /usr/lib/python3/dist-packages (from sphinx=3.1->myst-nb=3.1->myst-nb=1.3 in /usr/local/lib/python3.8/dist-packages (from sphinx=3.1->myst-nb=0.7 in /usr/local/lib/python3.8/dist-packages (from sphinx=3.1->myst-nb=3.1->myst-nb=3.1->myst-nb=3.1->myst-nbmyst-nb=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata->myst-nb=2015.7 in /usr/local/lib/python3.8/dist-packages (from babel>=1.3->sphinx=3.1->myst-nb=6.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from ipykernel>=4.5.1->ipywidgets=7.0.0->myst-nb=4.5.1->ipywidgets=7.0.0->myst-nb=6.1.12 in /usr/local/lib/python3.8/dist-packages (from ipykernel>=4.5.1->ipywidgets=7.0.0->myst-nb=1.0 in /usr/local/lib/python3.8/dist-packages (from ipykernel>=4.5.1->ipywidgets=7.0.0->myst-nb=4.5.1->ipywidgets=7.0.0->myst-nb=0.8.0 in /usr/local/lib/python3.8/dist-packages (from jedi>=0.16->ipython->myst-nb=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat~=5.0->myst-nb=1.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat~=5.0->myst-nb=0.5 in /usr/local/lib/python3.8/dist-packages (from pexpect>4.3->ipython->myst-nb=2.0.0->ipython->myst-nb=1.3.12->jupyter-cache~=0.4.1->myst-nb=4.4.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from widgetsnbextension~=3.6.0->ipywidgets=7.0.0->myst-nb1.2 in /var/jenkins_home/.local/lib/python3.8/site-packages/soupsieve-2.2.1-py3.8.egg (from beautifulsoup4->nbconvert=5.6->myst-nb=1.9.0 in 
/var/jenkins_home/.local/lib/python3.8/site-packages (from bleach->nbconvert=5.6->myst-nbnbconvert=5.6->myst-nb=0.2.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from nbdime->jupyter-cache~=0.4.1->myst-nbjupyter-cache~=0.4.1->myst-nbjupyter-cache~=0.4.1->myst-nbjupyter-cache~=0.4.1->myst-nb=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->nbconvert=5.6->myst-nbipython->myst-nbipython->myst-nbipython->myst-nb=4.0.1 in /usr/local/lib/python3.8/dist-packages (from GitPython!=2.1.4,!=2.1.5,!=2.1.6->nbdime->jupyter-cache~=0.4.1->myst-nb=22.3 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets=7.0.0->myst-nb=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.12->ipykernel>=4.5.1->ipywidgets=7.0.0->myst-nbnbdime->jupyter-cache~=0.4.1->myst-nb=3.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb=0.8.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (from jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nbnbdime->jupyter-cache~=0.4.1->myst-nbnbdime->jupyter-cache~=0.4.1->myst-nbnbdime->jupyter-cache~=0.4.1->myst-nb=1.1 in /var/jenkins_home/.local/lib/python3.8/site-packages (from anyio=3.1.0->jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb=2.8 in /usr/lib/python3/dist-packages (from anyio=3.1.0->jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb=3.0.1 in /usr/local/lib/python3.8/dist-packages (from gitdb=4.0.1->GitPython!=2.1.4,!=2.1.5,!=2.1.6->nbdime->jupyter-cache~=0.4.1->myst-nbjupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb=1.0.1 in /usr/local/lib/python3.8/dist-packages (from argon2-cffi-bindings->argon2-cffi->jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb=1.0.1->argon2-cffi-bindings->argon2-cffi->jupyter-server->nbdime->jupyter-cache~=0.4.1->myst-nb build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so -> 
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+11.gfaf9a2aba is already the active version in easy-install.pth

Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint


Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples
INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples
INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain
INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb
INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb
INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb
INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo
INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb
INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb
INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%]
tests/unit/ops/test_groupyby.py ............... [ 46%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%] ........................................................................ [ 56%] .................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%] .. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ........
Build timed out (after 60 minutes). Marking the build as failed.
.Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7990170128639053624.sh

nvidia-merlin-bot avatar May 22 '22 19:05 nvidia-merlin-bot

Click to view CI Results
GitHub pull request #1547 of commit c2a5b743c7a0b458be7af4ca96da091887a044b9, no merge conflicts.
Running as SYSTEM
Setting status of c2a5b743c7a0b458be7af4ca96da091887a044b9 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4614/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse c2a5b743c7a0b458be7af4ca96da091887a044b9^{commit} # timeout=10
Checking out Revision c2a5b743c7a0b458be7af4ca96da091887a044b9 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c2a5b743c7a0b458be7af4ca96da091887a044b9 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 242fc3657c847d7ed026dc657dc5a331c73ca015 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins14470040310932446478.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1432 items

tests/unit/test_dask_nvt.py ..........................F..F.............F [ 3%] F.FF..............................................................FFF... [ 8%] .... [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_s3.py FF [ 8%]
tests/unit/test_tf4rec.py . [ 9%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 21%] ........................................s.. [ 24%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 26%] ...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%] .. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py FFFFFF [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%] .......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]

=================================== FAILURES ===================================
____ test_dask_workflow_api_dlrm[True-None-True-device-0-csv-no-header-0.1] ____

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr26')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = 'device', on_host = True, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:05:58,808 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-59cbff4bfa9b201755371def3a4a8ee0', 1)
Function: subgraph_callable-bc40cc8d-e6cc-44be-b032-79fc6a78
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr26/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
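This and the repeated failures below all raise the same `pyarrow.lib.ArrowInvalid`: the reader refuses a `part_*.parquet` file that does not end with the 4-byte `PAR1` magic every valid Parquet file carries, which usually means the writer left behind an empty or truncated file. A minimal sketch of how a suspect part file could be checked by hand; `looks_like_parquet` and the example path are illustrative names, not anything from the test suite:

```python
import os

def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet 'PAR1' magic bytes."""
    if os.path.getsize(path) < 12:  # smaller than magic + footer length + magic
        return False
    with open(path, "rb") as f:
        head = f.read(4)            # a valid file starts with b"PAR1"
        f.seek(-4, os.SEEK_END)     # ...and ends with the same footer magic
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"

# Example (hypothetical path): check one of the part files the workers complained about.
print(looks_like_parquet("part_1.parquet"))
```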

___ test_dask_workflow_api_dlrm[True-None-True-device-150-csv-no-header-0.1] ___

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr29')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = 'device', on_host = True, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:00,845 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-49b3604576a4cafddded7a7f39db1cf7', 1)
Function: subgraph_callable-0a7fa952-c38a-4751-be47-bec1bd5c
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr29/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

_________ test_dask_workflow_api_dlrm[True-None-False-None-0-csv-0.1] __________

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr43')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv', cat_cache = None
on_host = False, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:08,690 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-10ebdc6558d21ad31bcfda98f39ea235', 0)
Function: subgraph_callable-876b006a-b0d8-491d-8a31-2856bd56
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr43/processed/part_0.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

____ test_dask_workflow_api_dlrm[True-None-False-None-0-csv-no-header-0.1] _____

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr44')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = None, on_host = False, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:09,657 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-09ccef034e94de817977745e9bd95565', 1)
Function: subgraph_callable-e0817284-0859-4f0b-b29f-6dbe624e
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr44/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

________ test_dask_workflow_api_dlrm[True-None-False-None-150-csv-0.1] _________

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr46')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv', cat_cache = None
on_host = False, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:11,000 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-9e44775b0c668fe5556e30bbf3a5a58b', 0)
Function: subgraph_callable-726c31d6-5b0f-4fd2-b62f-1be20712
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr46/processed/part_0.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

___ test_dask_workflow_api_dlrm[True-None-False-None-150-csv-no-header-0.1] ____

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr47')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = None, on_host = False, shuffle = None, cpu = True

@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
    client,
    tmpdir,
    datasets,
    freq_threshold,
    part_mem_fraction,
    engine,
    cat_cache,
    on_host,
    shuffle,
    cpu,
):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    paths = sorted(paths)
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)
    df0 = df0.to_pandas() if cpu else df0

    if engine == "parquet":
        cat_names = ["name-cat", "name-string"]
    else:
        cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    cats = cat_names >> ops.Categorify(
        freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
    )

    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()

    workflow = Workflow(cats + conts + label_name)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
    else:
        dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)

    output_path = os.path.join(tmpdir, "processed")

    transformed = workflow.fit_transform(dataset)
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)

    result = transformed.to_ddf().compute()
    assert len(df0) == len(result)
    assert result["x"].min() == 0.0
    assert result["x"].isna().sum() == 0
    assert result["y"].min() == 0.0
    assert result["y"].isna().sum() == 0

    # Check categories.  Need to sort first to make sure we are comparing
    # "apples to apples"
    expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
    dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
    dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
        {"name-string_x": "count", "name-string_y": "count"}
    )
    if freq_threshold:
        dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
    assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)

    # Read back from disk
    if cpu:
      df_disk = dd_read_parquet(output_path).compute()

tests/unit/test_dask_nvt.py:130:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???

???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:11,924 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-f624361c9960e8bfe9f17d1c64ec291a', 1)
Function: subgraph_callable-a4dff5b3-44cd-4cd0-abd8-1d33decd
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr47/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

___________________ test_dask_preproc_cpu[True-None-parquet] ___________________

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'parquet', shuffle = None, cpu = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, part_size="1MB", cpu=cpu)
    else:
        dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)

    # Simple transform (normalize)
    cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]
    conts = cont_names >> ops.FillMissing() >> ops.Normalize()
    workflow = Workflow(conts + cat_names + label_name)
    transformed = workflow.fit_transform(dataset)

    # Write out dataset
    output_path = os.path.join(tmpdir, "processed")
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)

    # Check the final result
  df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()

tests/unit/test_dask_nvt.py:277:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute (result,) = compute(self, traverse=False, **kwargs) /usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute results = schedule(dsk, keys, **kwargs) /usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get results = self.gather(packed, asynchronous=asynchronous, direct=direct) /usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather return self.sync( /usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync return sync( /usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync raise exc.with_traceback(tb) /usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f result = yield future /usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run value = future.result() /usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather raise exception.with_traceback(traceback) /usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in call return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args))) /usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get result = _execute_task(task, cache) /usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task return func(*(_execute_task(a, cache) for a in args)) /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in call return read_parquet_part( /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part dfs = [ /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw)) /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition arrow_table = cls._read_table( /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table arrow_table = _read_table_from_path( /usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path return pq.ParquetFile(fil).read_row_groups( /usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in init self.reader.open( pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open ???


???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
2022-08-09 08:06:52,615 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 0) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
  warnings.warn(
2022-08-09 08:06:52,617 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 11) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,617 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 1) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,620 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 10) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,626 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 14) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,627 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 12) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,627 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 15) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,630 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 13) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

--------------------------- Captured stderr teardown ---------------------------
2022-08-09 08:06:52,633 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 2) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,671 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 3) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,674 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 4) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,675 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 5) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,684 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 7) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,693 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 6) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,694 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 8) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:52,694 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 9) Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
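This failure (and the two parametrized variants reported below) reduces to the same pyarrow error: the freshly written part_*.parquet files do not carry the parquet footer magic that the reader checks for. For reference, a well-formed parquet file both begins and ends with the 4-byte marker b"PAR1". Below is a minimal sketch of that check (not part of the test suite; the helper name is illustrative only):

```python
# Minimal sketch: verify the parquet magic bytes that pyarrow reports as
# missing above. A well-formed parquet file starts and ends with b"PAR1".
def has_parquet_magic(path):
    with open(path, "rb") as f:
        header = f.read(4)
        f.seek(-4, 2)  # jump to the last four bytes of the file
        footer = f.read(4)
    return header == b"PAR1" and footer == b"PAR1"

# Illustrative usage against one of the files the workers failed to read:
# has_parquet_magic("/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet")
```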

_____________________ test_dask_preproc_cpu[True-None-csv] _____________________

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'csv', shuffle = None, cpu = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, part_size="1MB", cpu=cpu)
    else:
        dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)

    # Simple transform (normalize)
    cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]
    conts = cont_names >> ops.FillMissing() >> ops.Normalize()
    workflow = Workflow(conts + cat_names + label_name)
    transformed = workflow.fit_transform(dataset)

    # Write out dataset
    output_path = os.path.join(tmpdir, "processed")
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)

    # Check the final result
  df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()

tests/unit/test_dask_nvt.py:277:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???


???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:53,332 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 20) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,333 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 13) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,337 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 19) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,339 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 16) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,340 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 14) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

--------------------------- Captured stderr teardown ---------------------------
2022-08-09 08:06:53,343 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 15) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,343 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 22) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,345 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 11) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_2.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,346 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 18) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,351 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 12) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,352 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 17) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,354 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 10) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_2.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,357 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 21) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,362 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 26) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,365 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 24) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,373 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 28) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,374 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 31) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,375 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 23) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,376 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 25) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,382 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 27) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,388 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 29) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:53,388 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-c93258fabc7094400b097695615335f6', 30) Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

________________ test_dask_preproc_cpu[True-None-csv-no-header] ________________

client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'csv-no-header', shuffle = None, cpu = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
    set_dask_client(client=client)
    paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
    if engine == "parquet":
        df1 = cudf.read_parquet(paths[0])[mycols_pq]
        df2 = cudf.read_parquet(paths[1])[mycols_pq]
    elif engine == "csv":
        df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
        df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
    else:
        df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
        df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
    df0 = cudf.concat([df1, df2], axis=0)

    if engine in ("parquet", "csv"):
        dataset = Dataset(paths, part_size="1MB", cpu=cpu)
    else:
        dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)

    # Simple transform (normalize)
    cat_names = ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]
    conts = cont_names >> ops.FillMissing() >> ops.Normalize()
    workflow = Workflow(conts + cat_names + label_name)
    transformed = workflow.fit_transform(dataset)

    # Write out dataset
    output_path = os.path.join(tmpdir, "processed")
    transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)

    # Check the final result
  df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()

tests/unit/test_dask_nvt.py:277:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
    (result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
    results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
    return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
    return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
    raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
    result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
    value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
    raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
    result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
    return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
    dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
    func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
    arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
    arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
    return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
    self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
    ???


???
E   pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.

pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:54,051 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 17) Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_4.parquet', [1], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:54,052 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 19) Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_4.parquet', [3], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:54,052 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 20) Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_5.parquet', [0], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

2022-08-09 08:06:54,053 - distributed.worker - WARNING - Compute Failed Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 22) Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29 args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_5.parquet', [2], [])}) kwargs: {} Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"

___________________________ test_s3_dataset[parquet] ___________________________

self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>

def _new_conn(self):
    """ Establish a socket connection and set nodelay settings on it.

    :return: New socket connection.
    """
    extra_kw = {}
    if self.source_address:
        extra_kw["source_address"] = self.source_address

    if self.socket_options:
        extra_kw["socket_options"] = self.socket_options

    try:
      conn = connection.create_connection(
            (self._dns_host, self.port), self.timeout, **extra_kw
        )

/usr/lib/python3/dist-packages/urllib3/connection.py:159:


address = ('127.0.0.1', 5000), timeout = 60, source_address = None socket_options = [(6, 1, 1)]

def create_connection(
    address,
    timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
    source_address=None,
    socket_options=None,
):
    """Connect to *address* and return the socket object.

    Convenience function.  Connect to *address* (a 2-tuple ``(host,
    port)``) and return the socket object.  Passing the optional
    *timeout* parameter will set the timeout on the socket instance
    before attempting to connect.  If no *timeout* is supplied, the
    global default timeout setting returned by :func:`getdefaulttimeout`
    is used.  If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the connection.
    An host of '' or port 0 tells the OS to use the default.
    """

    host, port = address
    if host.startswith("["):
        host = host.strip("[]")
    err = None

    # Using the value from allowed_gai_family() in the context of getaddrinfo lets
    # us select whether to work with IPv4 DNS records, IPv6 records, or both.
    # The original create_connection function always returns all records.
    family = allowed_gai_family()

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)

            # If provided, set socket level options before connecting.
            _set_socket_options(sock, socket_options)

            if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
            sock.connect(sa)
            return sock

        except socket.error as e:
            err = e
            if sock is not None:
                sock.close()
                sock = None

    if err is not None:
      raise err

/usr/lib/python3/dist-packages/urllib3/util/connection.py:84:


address = ('127.0.0.1', 5000), timeout = 60, source_address = None socket_options = [(6, 1, 1)]

def create_connection(
    address,
    timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
    source_address=None,
    socket_options=None,
):
    """Connect to *address* and return the socket object.

    Convenience function.  Connect to *address* (a 2-tuple ``(host,
    port)``) and return the socket object.  Passing the optional
    *timeout* parameter will set the timeout on the socket instance
    before attempting to connect.  If no *timeout* is supplied, the
    global default timeout setting returned by :func:`getdefaulttimeout`
    is used.  If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the connection.
    An host of '' or port 0 tells the OS to use the default.
    """

    host, port = address
    if host.startswith("["):
        host = host.strip("[]")
    err = None

    # Using the value from allowed_gai_family() in the context of getaddrinfo lets
    # us select whether to work with IPv4 DNS records, IPv6 records, or both.
    # The original create_connection function always returns all records.
    family = allowed_gai_family()

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)

            # If provided, set socket level options before connecting.
            _set_socket_options(sock, socket_options)

            if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
          sock.connect(sa)

E ConnectionRefusedError: [Errno 111] Connection refused

/usr/lib/python3/dist-packages/urllib3/util/connection.py:74: ConnectionRefusedError
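The test_s3_dataset traceback here boils down to a refused connection: nothing is listening on 127.0.0.1:5000, the local S3-compatible endpoint to which the test tries to PUT the 'parquet' bucket. For orientation only, a minimal sketch of the equivalent client call (assuming a moto-style mock server on that port; the credentials are placeholders, not the project's actual test fixture):

```python
# Minimal sketch: the bucket-creation call that the refused PUT /parquet
# request corresponds to. It only succeeds if an S3-compatible mock
# (e.g. a moto server) is actually listening on 127.0.0.1:5000.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:5000",  # local mock endpoint from the log
    aws_access_key_id="testing",           # placeholder credentials
    aws_secret_access_key="testing",
    region_name="us-east-1",
)
s3.create_bucket(Bucket="parquet")
```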

During handling of the above exception, another exception occurred:

self = <botocore.httpsession.URLLib3Session object at 0x7fbad6651cd0> request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/parquet, headers={'x-amz-acl': b'public...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>

def send(self, request):
    try:
        proxy_url = self._proxy_config.proxy_url_for(request.url)
        manager = self._get_connection_manager(request.url, proxy_url)
        conn = manager.connection_from_url(request.url)
        self._setup_ssl_cert(conn, request.url, self._verify)
        if ensure_boolean(
            os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
        ):
            # This is currently an "experimental" feature which provides
            # no guarantees of backwards compatibility. It may be subject
            # to change or removal in any patch version. Anyone opting in
            # to this feature should strictly pin botocore.
            host = urlparse(request.url).hostname
            conn.proxy_headers['host'] = host

        request_target = self._get_request_target(request.url, proxy_url)
      urllib_response = conn.urlopen(
            method=request.method,
            url=request_target,
            body=request.body,
            headers=request.headers,
            retries=Retry(False),
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            chunked=self._chunked(request.headers),
        )

/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:448:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0> method = 'PUT', url = '/parquet', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} retries = Retry(total=False, connect=None, read=None, redirect=0, status=None) redirect = True, assert_same_host = False timeout = <object object at 0x7fbbe1452220>, pool_timeout = None release_conn = False, chunked = False, body_pos = None response_kw = {'decode_content': False, 'preload_content': False}, conn = None release_this_conn = True, err = None, clean_exit = False timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0> is_new_proxy_conn = False

def urlopen(
    self,
    method,
    url,
    body=None,
    headers=None,
    retries=None,
    redirect=True,
    assert_same_host=True,
    timeout=_Default,
    pool_timeout=None,
    release_conn=None,
    chunked=False,
    body_pos=None,
    **response_kw
):
    """
    Get a connection from the pool and perform an HTTP request. This is the
    lowest level call for making a request, so you'll need to specify all
    the raw details.

    .. note::

       More commonly, it's appropriate to use a convenience method provided
       by :class:`.RequestMethods`, such as :meth:`request`.

    .. note::

       `release_conn` will only behave as expected if
       `preload_content=False` because we want to make
       `preload_content=False` the default behaviour someday soon without
       breaking backwards compatibility.

    :param method:
        HTTP request method (such as GET, POST, PUT, etc.)

    :param body:
        Data to send in the request body (useful for creating
        POST requests, see HTTPConnectionPool.post_url for
        more convenience).

    :param headers:
        Dictionary of custom headers to send, such as User-Agent,
        If-None-Match, etc. If None, pool headers are used. If provided,
        these headers completely replace any pool-specific headers.

    :param retries:
        Configure the number of retries to allow before raising a
        :class:`~urllib3.exceptions.MaxRetryError` exception.

        Pass ``None`` to retry until you receive a response. Pass a
        :class:`~urllib3.util.retry.Retry` object for fine-grained control
        over different types of retries.
        Pass an integer number to retry connection errors that many times,
        but no other types of errors. Pass zero to never retry.

        If ``False``, then retries are disabled and any exception is raised
        immediately. Also, instead of raising a MaxRetryError on redirects,
        the redirect response will be returned.

    :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.

    :param redirect:
        If True, automatically handle redirects (status codes 301, 302,
        303, 307, 308). Each redirect counts as a retry. Disabling retries
        will disable redirect, too.

    :param assert_same_host:
        If ``True``, will make sure that the host of the pool requests is
        consistent else will raise HostChangedError. When False, you can
        use the pool on an HTTP proxy and request foreign hosts.

    :param timeout:
        If specified, overrides the default timeout for this one
        request. It may be a float (in seconds) or an instance of
        :class:`urllib3.util.Timeout`.

    :param pool_timeout:
        If set and the pool is set to block=True, then this method will
        block for ``pool_timeout`` seconds and raise EmptyPoolError if no
        connection is available within the time period.

    :param release_conn:
        If False, then the urlopen call will not release the connection
        back into the pool once a response is received (but will release if
        you read the entire contents of the response such as when
        `preload_content=True`). This is useful if you're not preloading
        the response's content immediately. You will need to call
        ``r.release_conn()`` on the response ``r`` to return the connection
        back into the pool. If None, it takes the value of
        ``response_kw.get('preload_content', True)``.

    :param chunked:
        If True, urllib3 will send the body using chunked transfer
        encoding. Otherwise, urllib3 will send the body using the standard
        content-length form. Defaults to False.

    :param int body_pos:
        Position to seek to in file-like body in the event of a retry or
        redirect. Typically this won't need to be set because urllib3 will
        auto-populate the value when needed.

    :param \\**response_kw:
        Additional parameters are passed to
        :meth:`urllib3.response.HTTPResponse.from_httplib`
    """
    if headers is None:
        headers = self.headers

    if not isinstance(retries, Retry):
        retries = Retry.from_int(retries, redirect=redirect, default=self.retries)

    if release_conn is None:
        release_conn = response_kw.get("preload_content", True)

    # Check host
    if assert_same_host and not self.is_same_host(url):
        raise HostChangedError(self, url, retries)

    # Ensure that the URL we're connecting to is properly encoded
    if url.startswith("/"):
        url = six.ensure_str(_encode_target(url))
    else:
        url = six.ensure_str(parse_url(url).url)

    conn = None

    # Track whether `conn` needs to be released before
    # returning/raising/recursing. Update this variable if necessary, and
    # leave `release_conn` constant throughout the function. That way, if
    # the function recurses, the original value of `release_conn` will be
    # passed down into the recursive call, and its value will be respected.
    #
    # See issue #651 [1] for details.
    #
    # [1] <https://github.com/urllib3/urllib3/issues/651>
    release_this_conn = release_conn

    # Merge the proxy headers. Only do this in HTTP. We have to copy the
    # headers dict so we can safely change it without those changes being
    # reflected in anyone else's copy.
    if self.scheme == "http":
        headers = headers.copy()
        headers.update(self.proxy_headers)

    # Must keep the exception bound to a separate variable or else Python 3
    # complains about UnboundLocalError.
    err = None

    # Keep track of whether we cleanly exited the except block. This
    # ensures we do proper cleanup in finally.
    clean_exit = False

    # Rewind body position, if needed. Record current position
    # for future rewinds in the event of a redirect/retry.
    body_pos = set_file_position(body, body_pos)

    try:
        # Request a connection from the queue.
        timeout_obj = self._get_timeout(timeout)
        conn = self._get_conn(timeout=pool_timeout)

        conn.timeout = timeout_obj.connect_timeout

        is_new_proxy_conn = self.proxy is not None and not getattr(
            conn, "sock", None
        )
        if is_new_proxy_conn:
            self._prepare_proxy(conn)

        # Make the request on the httplib connection object.
        httplib_response = self._make_request(
            conn,
            method,
            url,
            timeout=timeout_obj,
            body=body,
            headers=headers,
            chunked=chunked,
        )

        # If we're going to release the connection in ``finally:``, then
        # the response doesn't need to know about the connection. Otherwise
        # it will also try to release it and we'll have a double-release
        # mess.
        response_conn = conn if not release_conn else None

        # Pass method to Response for length checking
        response_kw["request_method"] = method

        # Import httplib's response into our own wrapper object
        response = self.ResponseCls.from_httplib(
            httplib_response,
            pool=self,
            connection=response_conn,
            retries=retries,
            **response_kw
        )

        # Everything went great!
        clean_exit = True

    except queue.Empty:
        # Timed out by queue.
        raise EmptyPoolError(self, "No pool connections are available.")

    except (
        TimeoutError,
        HTTPException,
        SocketError,
        ProtocolError,
        BaseSSLError,
        SSLError,
        CertificateError,
    ) as e:
        # Discard the connection for these exceptions. It will be
        # replaced during the next _get_conn() call.
        clean_exit = False
        if isinstance(e, (BaseSSLError, CertificateError)):
            e = SSLError(e)
        elif isinstance(e, (SocketError, NewConnectionError)) and self.proxy:
            e = ProxyError("Cannot connect to proxy.", e)
        elif isinstance(e, (SocketError, HTTPException)):
            e = ProtocolError("Connection aborted.", e)
      retries = retries.increment(
            method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
        )

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:719:


self = Retry(total=False, connect=None, read=None, redirect=0, status=None) method = 'PUT', url = '/parquet', response = None error = NewConnectionError('<botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>: Failed to establish a new connection: [Errno 111] Connection refused') _pool = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0> _stacktrace = <traceback object at 0x7fbad6d6e840>

def increment(
    self,
    method=None,
    url=None,
    response=None,
    error=None,
    _pool=None,
    _stacktrace=None,
):
    """ Return a new Retry object with incremented retry counters.

    :param response: A response object, or None, if the server did not
        return a response.
    :type response: :class:`~urllib3.response.HTTPResponse`
    :param Exception error: An error encountered during the request, or
        None if the response was received successfully.

    :return: A new ``Retry`` object.
    """
    if self.total is False and error:
        # Disabled, indicate to re-raise the error.
      raise six.reraise(type(error), error, _stacktrace)

/usr/lib/python3/dist-packages/urllib3/util/retry.py:376:


tp = <class 'urllib3.exceptions.NewConnectionError'>, value = None, tb = None

def reraise(tp, value, tb=None):
    try:
        if value is None:
            value = tp()
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
      raise value

../../../.local/lib/python3.8/site-packages/six.py:703:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0> method = 'PUT', url = '/parquet', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} retries = Retry(total=False, connect=None, read=None, redirect=0, status=None) redirect = True, assert_same_host = False timeout = <object object at 0x7fbbe1452220>, pool_timeout = None release_conn = False, chunked = False, body_pos = None response_kw = {'decode_content': False, 'preload_content': False}, conn = None release_this_conn = True, err = None, clean_exit = False timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0> is_new_proxy_conn = False

def urlopen(
    self,
    method,
    url,
    body=None,
    headers=None,
    retries=None,
    redirect=True,
    assert_same_host=True,
    timeout=_Default,
    pool_timeout=None,
    release_conn=None,
    chunked=False,
    body_pos=None,
    **response_kw
):
    """
    Get a connection from the pool and perform an HTTP request. This is the
    lowest level call for making a request, so you'll need to specify all
    the raw details.

    .. note::

       More commonly, it's appropriate to use a convenience method provided
       by :class:`.RequestMethods`, such as :meth:`request`.

    .. note::

       `release_conn` will only behave as expected if
       `preload_content=False` because we want to make
       `preload_content=False` the default behaviour someday soon without
       breaking backwards compatibility.

    :param method:
        HTTP request method (such as GET, POST, PUT, etc.)

    :param body:
        Data to send in the request body (useful for creating
        POST requests, see HTTPConnectionPool.post_url for
        more convenience).

    :param headers:
        Dictionary of custom headers to send, such as User-Agent,
        If-None-Match, etc. If None, pool headers are used. If provided,
        these headers completely replace any pool-specific headers.

    :param retries:
        Configure the number of retries to allow before raising a
        :class:`~urllib3.exceptions.MaxRetryError` exception.

        Pass ``None`` to retry until you receive a response. Pass a
        :class:`~urllib3.util.retry.Retry` object for fine-grained control
        over different types of retries.
        Pass an integer number to retry connection errors that many times,
        but no other types of errors. Pass zero to never retry.

        If ``False``, then retries are disabled and any exception is raised
        immediately. Also, instead of raising a MaxRetryError on redirects,
        the redirect response will be returned.

    :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.

    :param redirect:
        If True, automatically handle redirects (status codes 301, 302,
        303, 307, 308). Each redirect counts as a retry. Disabling retries
        will disable redirect, too.

    :param assert_same_host:
        If ``True``, will make sure that the host of the pool requests is
        consistent else will raise HostChangedError. When False, you can
        use the pool on an HTTP proxy and request foreign hosts.

    :param timeout:
        If specified, overrides the default timeout for this one
        request. It may be a float (in seconds) or an instance of
        :class:`urllib3.util.Timeout`.

    :param pool_timeout:
        If set and the pool is set to block=True, then this method will
        block for ``pool_timeout`` seconds and raise EmptyPoolError if no
        connection is available within the time period.

    :param release_conn:
        If False, then the urlopen call will not release the connection
        back into the pool once a response is received (but will release if
        you read the entire contents of the response such as when
        `preload_content=True`). This is useful if you're not preloading
        the response's content immediately. You will need to call
        ``r.release_conn()`` on the response ``r`` to return the connection
        back into the pool. If None, it takes the value of
        ``response_kw.get('preload_content', True)``.

    :param chunked:
        If True, urllib3 will send the body using chunked transfer
        encoding. Otherwise, urllib3 will send the body using the standard
        content-length form. Defaults to False.

    :param int body_pos:
        Position to seek to in file-like body in the event of a retry or
        redirect. Typically this won't need to be set because urllib3 will
        auto-populate the value when needed.

    :param \\**response_kw:
        Additional parameters are passed to
        :meth:`urllib3.response.HTTPResponse.from_httplib`
    """
    if headers is None:
        headers = self.headers

    if not isinstance(retries, Retry):
        retries = Retry.from_int(retries, redirect=redirect, default=self.retries)

    if release_conn is None:
        release_conn = response_kw.get("preload_content", True)

    # Check host
    if assert_same_host and not self.is_same_host(url):
        raise HostChangedError(self, url, retries)

    # Ensure that the URL we're connecting to is properly encoded
    if url.startswith("/"):
        url = six.ensure_str(_encode_target(url))
    else:
        url = six.ensure_str(parse_url(url).url)

    conn = None

    # Track whether `conn` needs to be released before
    # returning/raising/recursing. Update this variable if necessary, and
    # leave `release_conn` constant throughout the function. That way, if
    # the function recurses, the original value of `release_conn` will be
    # passed down into the recursive call, and its value will be respected.
    #
    # See issue #651 [1] for details.
    #
    # [1] <https://github.com/urllib3/urllib3/issues/651>
    release_this_conn = release_conn

    # Merge the proxy headers. Only do this in HTTP. We have to copy the
    # headers dict so we can safely change it without those changes being
    # reflected in anyone else's copy.
    if self.scheme == "http":
        headers = headers.copy()
        headers.update(self.proxy_headers)

    # Must keep the exception bound to a separate variable or else Python 3
    # complains about UnboundLocalError.
    err = None

    # Keep track of whether we cleanly exited the except block. This
    # ensures we do proper cleanup in finally.
    clean_exit = False

    # Rewind body position, if needed. Record current position
    # for future rewinds in the event of a redirect/retry.
    body_pos = set_file_position(body, body_pos)

    try:
        # Request a connection from the queue.
        timeout_obj = self._get_timeout(timeout)
        conn = self._get_conn(timeout=pool_timeout)

        conn.timeout = timeout_obj.connect_timeout

        is_new_proxy_conn = self.proxy is not None and not getattr(
            conn, "sock", None
        )
        if is_new_proxy_conn:
            self._prepare_proxy(conn)

        # Make the request on the httplib connection object.
      httplib_response = self._make_request(
            conn,
            method,
            url,
            timeout=timeout_obj,
            body=body,
            headers=headers,
            chunked=chunked,
        )

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:665:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0> conn = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> method = 'PUT', url = '/parquet' timeout = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0> chunked = False httplib_request_kw = {'body': None, 'headers': {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}} timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e770040>

def _make_request(
    self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
    """
    Perform a request on a given urllib connection object taken from our
    pool.

    :param conn:
        a connection from one of our connection pools

    :param timeout:
        Socket timeout in seconds for the request. This can be a
        float or integer, which will set the same timeout value for
        the socket connect and the socket read, or an instance of
        :class:`urllib3.util.Timeout`, which gives you more fine-grained
        control over your timeouts.
    """
    self.num_requests += 1

    timeout_obj = self._get_timeout(timeout)
    timeout_obj.start_connect()
    conn.timeout = timeout_obj.connect_timeout

    # Trigger any extra validation we need to do.
    try:
        self._validate_conn(conn)
    except (SocketTimeout, BaseSSLError) as e:
        # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
        self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
        raise

    # conn.request() calls httplib.*.request, not the method in
    # urllib3.request. It also calls makefile (recv) on the socket.
    if chunked:
        conn.request_chunked(method, url, **httplib_request_kw)
    else:
      conn.request(method, url, **httplib_request_kw)

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:387:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> method = 'PUT', url = '/parquet', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    """Send a complete request to the server."""
  self._send_request(method, url, body, headers, encode_chunked)

/usr/lib/python3.8/http/client.py:1256:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> method = 'PUT', url = '/parquet', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} args = (False,), kwargs = {}

def _send_request(self, method, url, body, headers, *args, **kwargs):
    self._response_received = False
    if headers.get('Expect', b'') == b'100-continue':
        self._expect_header_set = True
    else:
        self._expect_header_set = False
        self.response_class = self._original_response_cls
  rval = super()._send_request(
        method, url, body, headers, *args, **kwargs
    )

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:94:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> method = 'PUT', url = '/parquet', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} encode_chunked = False

def _send_request(self, method, url, body, headers, encode_chunked):
    # Honor explicitly requested Host: and Accept-Encoding: headers.
    header_names = frozenset(k.lower() for k in headers)
    skips = {}
    if 'host' in header_names:
        skips['skip_host'] = 1
    if 'accept-encoding' in header_names:
        skips['skip_accept_encoding'] = 1

    self.putrequest(method, url, **skips)

    # chunked encoding will happen if HTTP/1.1 is used and either
    # the caller passes encode_chunked=True or the following
    # conditions hold:
    # 1. content-length has not been explicitly set
    # 2. the body is a file or iterable, but not a str or bytes-like
    # 3. Transfer-Encoding has NOT been explicitly set by the caller

    if 'content-length' not in header_names:
        # only chunk body if not explicitly set for backwards
        # compatibility, assuming the client code is already handling the
        # chunking
        if 'transfer-encoding' not in header_names:
            # if content-length cannot be automatically determined, fall
            # back to chunked encoding
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
                if body is not None:
                    if self.debuglevel > 0:
                        print('Unable to determine size of %r' % body)
                    encode_chunked = True
                    self.putheader('Transfer-Encoding', 'chunked')
            else:
                self.putheader('Content-Length', str(content_length))
    else:
        encode_chunked = False

    for hdr, value in headers.items():
        self.putheader(hdr, value)
    if isinstance(body, str):
        # RFC 2616 Section 3.7.1 says that text default has a
        # default charset of iso-8859-1.
        body = _encode(body, 'body')
  self.endheaders(body, encode_chunked=encode_chunked)

/usr/lib/python3.8/http/client.py:1302:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> message_body = None

def endheaders(self, message_body=None, *, encode_chunked=False):
    """Indicate that the last header line has been sent to the server.

    This method sends the request to the server.  The optional message_body
    argument can be used to pass a message body associated with the
    request.
    """
    if self.__state == _CS_REQ_STARTED:
        self.__state = _CS_REQ_SENT
    else:
        raise CannotSendHeader()
  self._send_output(message_body, encode_chunked=encode_chunked)

/usr/lib/python3.8/http/client.py:1251:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> message_body = None, args = (), kwargs = {'encode_chunked': False} msg = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def _send_output(self, message_body=None, *args, **kwargs):
    self._buffer.extend((b"", b""))
    msg = self._convert_to_bytes(self._buffer)
    del self._buffer[:]
    # If msg and message_body are sent in a single send() call,
    # it will avoid performance problems caused by the interaction
    # between delayed ack and the Nagle algorithm.
    if isinstance(message_body, bytes):
        msg += message_body
        message_body = None
  self.send(msg)

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:123:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> str = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def send(self, str):
    if self._response_received:
        logger.debug(
            "send() called, but reseponse already received. "
            "Not sending data."
        )
        return
  return super().send(str)

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:218:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0> data = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def send(self, data):
    """Send `data' to the server.
    ``data`` can be a string object, a bytes object, an array object, a
    file-like object that supports a .read() method, or an iterable object.
    """

    if self.sock is None:
        if self.auto_open:
          self.connect()

/usr/lib/python3.8/http/client.py:951:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>

def connect(self):
  conn = self._new_conn()

/usr/lib/python3/dist-packages/urllib3/connection.py:187:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>

def _new_conn(self):
    """ Establish a socket connection and set nodelay settings on it.

    :return: New socket connection.
    """
    extra_kw = {}
    if self.source_address:
        extra_kw["source_address"] = self.source_address

    if self.socket_options:
        extra_kw["socket_options"] = self.socket_options

    try:
        conn = connection.create_connection(
            (self._dns_host, self.port), self.timeout, **extra_kw
        )

    except SocketTimeout:
        raise ConnectTimeoutError(
            self,
            "Connection to %s timed out. (connect timeout=%s)"
            % (self.host, self.timeout),
        )

    except SocketError as e:
      raise NewConnectionError(
            self, "Failed to establish a new connection: %s" % e
        )

E urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>: Failed to establish a new connection: [Errno 111] Connection refused

/usr/lib/python3/dist-packages/urllib3/connection.py:171: NewConnectionError

During handling of the above exception, another exception occurred:

s3_base = 'http://127.0.0.1:5000/' s3so = {'client_kwargs': {'endpoint_url': 'http://127.0.0.1:5000/'}} paths = ['/tmp/pytest-of-jenkins/pytest-14/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-14/parquet0/dataset-1.parquet'] datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')} engine = 'parquet' df = name-cat name-string id label x y 0 Edith Laura 1054 964 -0.792165 0.069362 ...da 976 964 -0.270133 0.839677 4320 Alice Ray 967 977 0.033737 -0.727091

[4321 rows x 6 columns] patch_aiobotocore = None

@pytest.mark.parametrize("engine", ["parquet", "csv"])
def test_s3_dataset(s3_base, s3so, paths, datasets, engine, df, patch_aiobotocore):
    # Copy files to mock s3 bucket
    files = {}
    for i, path in enumerate(paths):
        with open(path, "rb") as f:
            fbytes = f.read()
        fn = path.split(os.path.sep)[-1]
        files[fn] = BytesIO()
        files[fn].write(fbytes)
        files[fn].seek(0)

    if engine == "parquet":
        # Workaround for nvt#539. In order to avoid the
        # bug in Dask's `create_metadata_file`, we need
        # to manually generate a "_metadata" file here.
        # This can be removed after dask#7295 is merged
        # (see https://github.com/dask/dask/pull/7295)
        fn = "_metadata"
        files[fn] = BytesIO()
        meta = create_metadata_file(
            paths,
            engine="pyarrow",
            out_dir=False,
        )
        meta.write_metadata_file(files[fn])
        files[fn].seek(0)
  with s3_context(s3_base=s3_base, bucket=engine, files=files) as s3fs:

tests/unit/test_s3.py:97:


/usr/lib/python3.8/contextlib.py:113: in __enter__
    return next(self.gen)
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:96: in s3_context
    client.create_bucket(Bucket=bucket, ACL="public-read-write")
/usr/local/lib/python3.8/dist-packages/botocore/client.py:508: in _api_call
    return self._make_api_call(operation_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/client.py:898: in _make_api_call
    http, parsed_response = self._make_request(
/usr/local/lib/python3.8/dist-packages/botocore/client.py:921: in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:119: in make_request
    return self._send_request(request_dict, operation_model)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:202: in _send_request
    while self._needs_retry(
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:354: in _needs_retry
    responses = self._event_emitter.emit(
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:412: in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:256: in emit
    return self._emit(event_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:239: in _emit
    response = handler(**kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:207: in __call__
    if self._checker(**checker_kwargs):
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:284: in __call__
    should_retry = self._should_retry(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:320: in _should_retry
    return self._checker(attempt_number, response, caught_exception)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:363: in __call__
    checker_response = checker(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:247: in __call__
    return self._check_caught_exception(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:416: in _check_caught_exception
    raise caught_exception
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:281: in _do_get_response
    http_response = self._send(request)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:377: in _send
    return self.http_session.send(request)


self = <botocore.httpsession.URLLib3Session object at 0x7fbad6651cd0> request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/parquet, headers={'x-amz-acl': b'public...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>

def send(self, request):
    try:
        proxy_url = self._proxy_config.proxy_url_for(request.url)
        manager = self._get_connection_manager(request.url, proxy_url)
        conn = manager.connection_from_url(request.url)
        self._setup_ssl_cert(conn, request.url, self._verify)
        if ensure_boolean(
            os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
        ):
            # This is currently an "experimental" feature which provides
            # no guarantees of backwards compatibility. It may be subject
            # to change or removal in any patch version. Anyone opting in
            # to this feature should strictly pin botocore.
            host = urlparse(request.url).hostname
            conn.proxy_headers['host'] = host

        request_target = self._get_request_target(request.url, proxy_url)
        urllib_response = conn.urlopen(
            method=request.method,
            url=request_target,
            body=request.body,
            headers=request.headers,
            retries=Retry(False),
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            chunked=self._chunked(request.headers),
        )

        http_response = botocore.awsrequest.AWSResponse(
            request.url,
            urllib_response.status,
            urllib_response.headers,
            urllib_response,
        )

        if not request.stream_output:
            # Cause the raw stream to be exhausted immediately. We do it
            # this way instead of using preload_content because
            # preload_content will never buffer chunked responses
            http_response.content

        return http_response
    except URLLib3SSLError as e:
        raise SSLError(endpoint_url=request.url, error=e)
    except (NewConnectionError, socket.gaierror) as e:
      raise EndpointConnectionError(endpoint_url=request.url, error=e)

E botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5000/parquet"

/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:477: EndpointConnectionError
---------------------------- Captured stderr setup -----------------------------
Traceback (most recent call last):
  File "/usr/local/bin/moto_server", line 5, in <module>
    from moto.server import main
  File "/usr/local/lib/python3.8/dist-packages/moto/server.py", line 7, in <module>
    from moto.moto_server.werkzeug_app import (
  File "/usr/local/lib/python3.8/dist-packages/moto/moto_server/werkzeug_app.py", line 6, in <module>
    from flask import Flask
  File "/usr/local/lib/python3.8/dist-packages/flask/__init__.py", line 4, in <module>
    from . import json as json
  File "/usr/local/lib/python3.8/dist-packages/flask/json/__init__.py", line 8, in <module>
    from ..globals import current_app
  File "/usr/local/lib/python3.8/dist-packages/flask/globals.py", line 56, in <module>
    app_ctx: "AppContext" = LocalProxy(  # type: ignore[assignment]
TypeError: __init__() got an unexpected keyword argument 'unbound_message'
_____________________________ test_s3_dataset[csv] _____________________________
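Side note on both `test_s3_dataset` parametrizations (the csv run below fails the same way): the captured stderr above shows the `moto_server` fixture process dying at import time, so nothing ever listens on 127.0.0.1:5000 and every boto3 call gets connection refused. The `TypeError: __init__() got an unexpected keyword argument 'unbound_message'` looks like a Flask newer than the installed Werkzeug (Flask 2.2+ passes `unbound_message=` to `werkzeug.local.LocalProxy`). A minimal sanity-check sketch for the CI image, assuming that mismatch is the cause; the version pins mentioned in the comments are illustrative, not a verified fix:

```python
# Pre-flight check for the moto_server environment (assumption: the Flask/Werkzeug
# mismatch suggested by the stderr above is the root cause; pins are illustrative).
from packaging.version import Version

import flask
import werkzeug

flask_v = Version(flask.__version__)
werkzeug_v = Version(werkzeug.__version__)
print(f"flask={flask_v} werkzeug={werkzeug_v}")

# Flask 2.2+ constructs LocalProxy(..., unbound_message=...), which older Werkzeug
# releases reject, so the two should be pinned or upgraded together, e.g.
#   pip install "flask==2.1.*" "werkzeug==2.1.*"   (or move both to >= 2.2)
if flask_v >= Version("2.2") and werkzeug_v < Version("2.2"):
    raise RuntimeError("Flask/Werkzeug mismatch: moto_server will crash on import")
```

If that check trips, aligning the two packages in the test image should let the mock S3 server start again, which would likely clear both `test_s3_dataset` failures without any NVTabular change.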

self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>

def _new_conn(self):
    """ Establish a socket connection and set nodelay settings on it.

    :return: New socket connection.
    """
    extra_kw = {}
    if self.source_address:
        extra_kw["source_address"] = self.source_address

    if self.socket_options:
        extra_kw["socket_options"] = self.socket_options

    try:
      conn = connection.create_connection(
            (self._dns_host, self.port), self.timeout, **extra_kw
        )

/usr/lib/python3/dist-packages/urllib3/connection.py:159:


address = ('127.0.0.1', 5000), timeout = 60, source_address = None socket_options = [(6, 1, 1)]

def create_connection(
    address,
    timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
    source_address=None,
    socket_options=None,
):
    """Connect to *address* and return the socket object.

    Convenience function.  Connect to *address* (a 2-tuple ``(host,
    port)``) and return the socket object.  Passing the optional
    *timeout* parameter will set the timeout on the socket instance
    before attempting to connect.  If no *timeout* is supplied, the
    global default timeout setting returned by :func:`getdefaulttimeout`
    is used.  If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the connection.
    An host of '' or port 0 tells the OS to use the default.
    """

    host, port = address
    if host.startswith("["):
        host = host.strip("[]")
    err = None

    # Using the value from allowed_gai_family() in the context of getaddrinfo lets
    # us select whether to work with IPv4 DNS records, IPv6 records, or both.
    # The original create_connection function always returns all records.
    family = allowed_gai_family()

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)

            # If provided, set socket level options before connecting.
            _set_socket_options(sock, socket_options)

            if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
            sock.connect(sa)
            return sock

        except socket.error as e:
            err = e
            if sock is not None:
                sock.close()
                sock = None

    if err is not None:
      raise err

/usr/lib/python3/dist-packages/urllib3/util/connection.py:84:


address = ('127.0.0.1', 5000), timeout = 60, source_address = None socket_options = [(6, 1, 1)]

def create_connection(
    address,
    timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
    source_address=None,
    socket_options=None,
):
    """Connect to *address* and return the socket object.

    Convenience function.  Connect to *address* (a 2-tuple ``(host,
    port)``) and return the socket object.  Passing the optional
    *timeout* parameter will set the timeout on the socket instance
    before attempting to connect.  If no *timeout* is supplied, the
    global default timeout setting returned by :func:`getdefaulttimeout`
    is used.  If *source_address* is set it must be a tuple of (host, port)
    for the socket to bind as a source address before making the connection.
    An host of '' or port 0 tells the OS to use the default.
    """

    host, port = address
    if host.startswith("["):
        host = host.strip("[]")
    err = None

    # Using the value from allowed_gai_family() in the context of getaddrinfo lets
    # us select whether to work with IPv4 DNS records, IPv6 records, or both.
    # The original create_connection function always returns all records.
    family = allowed_gai_family()

    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket.socket(af, socktype, proto)

            # If provided, set socket level options before connecting.
            _set_socket_options(sock, socket_options)

            if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
          sock.connect(sa)

E ConnectionRefusedError: [Errno 111] Connection refused

/usr/lib/python3/dist-packages/urllib3/util/connection.py:74: ConnectionRefusedError

During handling of the above exception, another exception occurred:

self = <botocore.httpsession.URLLib3Session object at 0x7fbad458f070> request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/csv, headers={'x-amz-acl': b'public-rea...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>

def send(self, request):
    try:
        proxy_url = self._proxy_config.proxy_url_for(request.url)
        manager = self._get_connection_manager(request.url, proxy_url)
        conn = manager.connection_from_url(request.url)
        self._setup_ssl_cert(conn, request.url, self._verify)
        if ensure_boolean(
            os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
        ):
            # This is currently an "experimental" feature which provides
            # no guarantees of backwards compatibility. It may be subject
            # to change or removal in any patch version. Anyone opting in
            # to this feature should strictly pin botocore.
            host = urlparse(request.url).hostname
            conn.proxy_headers['host'] = host

        request_target = self._get_request_target(request.url, proxy_url)
      urllib_response = conn.urlopen(
            method=request.method,
            url=request_target,
            body=request.body,
            headers=request.headers,
            retries=Retry(False),
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            chunked=self._chunked(request.headers),
        )

/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:448:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60> method = 'PUT', url = '/csv', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} retries = Retry(total=False, connect=None, read=None, redirect=0, status=None) redirect = True, assert_same_host = False timeout = <object object at 0x7fbbe1452220>, pool_timeout = None release_conn = False, chunked = False, body_pos = None response_kw = {'decode_content': False, 'preload_content': False}, conn = None release_this_conn = True, err = None, clean_exit = False timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0> is_new_proxy_conn = False

def urlopen(
    self,
    method,
    url,
    body=None,
    headers=None,
    retries=None,
    redirect=True,
    assert_same_host=True,
    timeout=_Default,
    pool_timeout=None,
    release_conn=None,
    chunked=False,
    body_pos=None,
    **response_kw
):
    """
    Get a connection from the pool and perform an HTTP request. This is the
    lowest level call for making a request, so you'll need to specify all
    the raw details.

    .. note::

       More commonly, it's appropriate to use a convenience method provided
       by :class:`.RequestMethods`, such as :meth:`request`.

    .. note::

       `release_conn` will only behave as expected if
       `preload_content=False` because we want to make
       `preload_content=False` the default behaviour someday soon without
       breaking backwards compatibility.

    :param method:
        HTTP request method (such as GET, POST, PUT, etc.)

    :param body:
        Data to send in the request body (useful for creating
        POST requests, see HTTPConnectionPool.post_url for
        more convenience).

    :param headers:
        Dictionary of custom headers to send, such as User-Agent,
        If-None-Match, etc. If None, pool headers are used. If provided,
        these headers completely replace any pool-specific headers.

    :param retries:
        Configure the number of retries to allow before raising a
        :class:`~urllib3.exceptions.MaxRetryError` exception.

        Pass ``None`` to retry until you receive a response. Pass a
        :class:`~urllib3.util.retry.Retry` object for fine-grained control
        over different types of retries.
        Pass an integer number to retry connection errors that many times,
        but no other types of errors. Pass zero to never retry.

        If ``False``, then retries are disabled and any exception is raised
        immediately. Also, instead of raising a MaxRetryError on redirects,
        the redirect response will be returned.

    :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.

    :param redirect:
        If True, automatically handle redirects (status codes 301, 302,
        303, 307, 308). Each redirect counts as a retry. Disabling retries
        will disable redirect, too.

    :param assert_same_host:
        If ``True``, will make sure that the host of the pool requests is
        consistent else will raise HostChangedError. When False, you can
        use the pool on an HTTP proxy and request foreign hosts.

    :param timeout:
        If specified, overrides the default timeout for this one
        request. It may be a float (in seconds) or an instance of
        :class:`urllib3.util.Timeout`.

    :param pool_timeout:
        If set and the pool is set to block=True, then this method will
        block for ``pool_timeout`` seconds and raise EmptyPoolError if no
        connection is available within the time period.

    :param release_conn:
        If False, then the urlopen call will not release the connection
        back into the pool once a response is received (but will release if
        you read the entire contents of the response such as when
        `preload_content=True`). This is useful if you're not preloading
        the response's content immediately. You will need to call
        ``r.release_conn()`` on the response ``r`` to return the connection
        back into the pool. If None, it takes the value of
        ``response_kw.get('preload_content', True)``.

    :param chunked:
        If True, urllib3 will send the body using chunked transfer
        encoding. Otherwise, urllib3 will send the body using the standard
        content-length form. Defaults to False.

    :param int body_pos:
        Position to seek to in file-like body in the event of a retry or
        redirect. Typically this won't need to be set because urllib3 will
        auto-populate the value when needed.

    :param \\**response_kw:
        Additional parameters are passed to
        :meth:`urllib3.response.HTTPResponse.from_httplib`
    """
    if headers is None:
        headers = self.headers

    if not isinstance(retries, Retry):
        retries = Retry.from_int(retries, redirect=redirect, default=self.retries)

    if release_conn is None:
        release_conn = response_kw.get("preload_content", True)

    # Check host
    if assert_same_host and not self.is_same_host(url):
        raise HostChangedError(self, url, retries)

    # Ensure that the URL we're connecting to is properly encoded
    if url.startswith("/"):
        url = six.ensure_str(_encode_target(url))
    else:
        url = six.ensure_str(parse_url(url).url)

    conn = None

    # Track whether `conn` needs to be released before
    # returning/raising/recursing. Update this variable if necessary, and
    # leave `release_conn` constant throughout the function. That way, if
    # the function recurses, the original value of `release_conn` will be
    # passed down into the recursive call, and its value will be respected.
    #
    # See issue #651 [1] for details.
    #
    # [1] <https://github.com/urllib3/urllib3/issues/651>
    release_this_conn = release_conn

    # Merge the proxy headers. Only do this in HTTP. We have to copy the
    # headers dict so we can safely change it without those changes being
    # reflected in anyone else's copy.
    if self.scheme == "http":
        headers = headers.copy()
        headers.update(self.proxy_headers)

    # Must keep the exception bound to a separate variable or else Python 3
    # complains about UnboundLocalError.
    err = None

    # Keep track of whether we cleanly exited the except block. This
    # ensures we do proper cleanup in finally.
    clean_exit = False

    # Rewind body position, if needed. Record current position
    # for future rewinds in the event of a redirect/retry.
    body_pos = set_file_position(body, body_pos)

    try:
        # Request a connection from the queue.
        timeout_obj = self._get_timeout(timeout)
        conn = self._get_conn(timeout=pool_timeout)

        conn.timeout = timeout_obj.connect_timeout

        is_new_proxy_conn = self.proxy is not None and not getattr(
            conn, "sock", None
        )
        if is_new_proxy_conn:
            self._prepare_proxy(conn)

        # Make the request on the httplib connection object.
        httplib_response = self._make_request(
            conn,
            method,
            url,
            timeout=timeout_obj,
            body=body,
            headers=headers,
            chunked=chunked,
        )

        # If we're going to release the connection in ``finally:``, then
        # the response doesn't need to know about the connection. Otherwise
        # it will also try to release it and we'll have a double-release
        # mess.
        response_conn = conn if not release_conn else None

        # Pass method to Response for length checking
        response_kw["request_method"] = method

        # Import httplib's response into our own wrapper object
        response = self.ResponseCls.from_httplib(
            httplib_response,
            pool=self,
            connection=response_conn,
            retries=retries,
            **response_kw
        )

        # Everything went great!
        clean_exit = True

    except queue.Empty:
        # Timed out by queue.
        raise EmptyPoolError(self, "No pool connections are available.")

    except (
        TimeoutError,
        HTTPException,
        SocketError,
        ProtocolError,
        BaseSSLError,
        SSLError,
        CertificateError,
    ) as e:
        # Discard the connection for these exceptions. It will be
        # replaced during the next _get_conn() call.
        clean_exit = False
        if isinstance(e, (BaseSSLError, CertificateError)):
            e = SSLError(e)
        elif isinstance(e, (SocketError, NewConnectionError)) and self.proxy:
            e = ProxyError("Cannot connect to proxy.", e)
        elif isinstance(e, (SocketError, HTTPException)):
            e = ProtocolError("Connection aborted.", e)
      retries = retries.increment(
            method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
        )

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:719:


self = Retry(total=False, connect=None, read=None, redirect=0, status=None) method = 'PUT', url = '/csv', response = None error = NewConnectionError('<botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>: Failed to establish a new connection: [Errno 111] Connection refused') _pool = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60> _stacktrace = <traceback object at 0x7fbad70ed800>

def increment(
    self,
    method=None,
    url=None,
    response=None,
    error=None,
    _pool=None,
    _stacktrace=None,
):
    """ Return a new Retry object with incremented retry counters.

    :param response: A response object, or None, if the server did not
        return a response.
    :type response: :class:`~urllib3.response.HTTPResponse`
    :param Exception error: An error encountered during the request, or
        None if the response was received successfully.

    :return: A new ``Retry`` object.
    """
    if self.total is False and error:
        # Disabled, indicate to re-raise the error.
      raise six.reraise(type(error), error, _stacktrace)

/usr/lib/python3/dist-packages/urllib3/util/retry.py:376:


tp = <class 'urllib3.exceptions.NewConnectionError'>, value = None, tb = None

def reraise(tp, value, tb=None):
    try:
        if value is None:
            value = tp()
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
      raise value

../../../.local/lib/python3.8/site-packages/six.py:703:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60> method = 'PUT', url = '/csv', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} retries = Retry(total=False, connect=None, read=None, redirect=0, status=None) redirect = True, assert_same_host = False timeout = <object object at 0x7fbbe1452220>, pool_timeout = None release_conn = False, chunked = False, body_pos = None response_kw = {'decode_content': False, 'preload_content': False}, conn = None release_this_conn = True, err = None, clean_exit = False timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0> is_new_proxy_conn = False

def urlopen(
    self,
    method,
    url,
    body=None,
    headers=None,
    retries=None,
    redirect=True,
    assert_same_host=True,
    timeout=_Default,
    pool_timeout=None,
    release_conn=None,
    chunked=False,
    body_pos=None,
    **response_kw
):
    """
    Get a connection from the pool and perform an HTTP request. This is the
    lowest level call for making a request, so you'll need to specify all
    the raw details.

    .. note::

       More commonly, it's appropriate to use a convenience method provided
       by :class:`.RequestMethods`, such as :meth:`request`.

    .. note::

       `release_conn` will only behave as expected if
       `preload_content=False` because we want to make
       `preload_content=False` the default behaviour someday soon without
       breaking backwards compatibility.

    :param method:
        HTTP request method (such as GET, POST, PUT, etc.)

    :param body:
        Data to send in the request body (useful for creating
        POST requests, see HTTPConnectionPool.post_url for
        more convenience).

    :param headers:
        Dictionary of custom headers to send, such as User-Agent,
        If-None-Match, etc. If None, pool headers are used. If provided,
        these headers completely replace any pool-specific headers.

    :param retries:
        Configure the number of retries to allow before raising a
        :class:`~urllib3.exceptions.MaxRetryError` exception.

        Pass ``None`` to retry until you receive a response. Pass a
        :class:`~urllib3.util.retry.Retry` object for fine-grained control
        over different types of retries.
        Pass an integer number to retry connection errors that many times,
        but no other types of errors. Pass zero to never retry.

        If ``False``, then retries are disabled and any exception is raised
        immediately. Also, instead of raising a MaxRetryError on redirects,
        the redirect response will be returned.

    :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.

    :param redirect:
        If True, automatically handle redirects (status codes 301, 302,
        303, 307, 308). Each redirect counts as a retry. Disabling retries
        will disable redirect, too.

    :param assert_same_host:
        If ``True``, will make sure that the host of the pool requests is
        consistent else will raise HostChangedError. When False, you can
        use the pool on an HTTP proxy and request foreign hosts.

    :param timeout:
        If specified, overrides the default timeout for this one
        request. It may be a float (in seconds) or an instance of
        :class:`urllib3.util.Timeout`.

    :param pool_timeout:
        If set and the pool is set to block=True, then this method will
        block for ``pool_timeout`` seconds and raise EmptyPoolError if no
        connection is available within the time period.

    :param release_conn:
        If False, then the urlopen call will not release the connection
        back into the pool once a response is received (but will release if
        you read the entire contents of the response such as when
        `preload_content=True`). This is useful if you're not preloading
        the response's content immediately. You will need to call
        ``r.release_conn()`` on the response ``r`` to return the connection
        back into the pool. If None, it takes the value of
        ``response_kw.get('preload_content', True)``.

    :param chunked:
        If True, urllib3 will send the body using chunked transfer
        encoding. Otherwise, urllib3 will send the body using the standard
        content-length form. Defaults to False.

    :param int body_pos:
        Position to seek to in file-like body in the event of a retry or
        redirect. Typically this won't need to be set because urllib3 will
        auto-populate the value when needed.

    :param \\**response_kw:
        Additional parameters are passed to
        :meth:`urllib3.response.HTTPResponse.from_httplib`
    """
    if headers is None:
        headers = self.headers

    if not isinstance(retries, Retry):
        retries = Retry.from_int(retries, redirect=redirect, default=self.retries)

    if release_conn is None:
        release_conn = response_kw.get("preload_content", True)

    # Check host
    if assert_same_host and not self.is_same_host(url):
        raise HostChangedError(self, url, retries)

    # Ensure that the URL we're connecting to is properly encoded
    if url.startswith("/"):
        url = six.ensure_str(_encode_target(url))
    else:
        url = six.ensure_str(parse_url(url).url)

    conn = None

    # Track whether `conn` needs to be released before
    # returning/raising/recursing. Update this variable if necessary, and
    # leave `release_conn` constant throughout the function. That way, if
    # the function recurses, the original value of `release_conn` will be
    # passed down into the recursive call, and its value will be respected.
    #
    # See issue #651 [1] for details.
    #
    # [1] <https://github.com/urllib3/urllib3/issues/651>
    release_this_conn = release_conn

    # Merge the proxy headers. Only do this in HTTP. We have to copy the
    # headers dict so we can safely change it without those changes being
    # reflected in anyone else's copy.
    if self.scheme == "http":
        headers = headers.copy()
        headers.update(self.proxy_headers)

    # Must keep the exception bound to a separate variable or else Python 3
    # complains about UnboundLocalError.
    err = None

    # Keep track of whether we cleanly exited the except block. This
    # ensures we do proper cleanup in finally.
    clean_exit = False

    # Rewind body position, if needed. Record current position
    # for future rewinds in the event of a redirect/retry.
    body_pos = set_file_position(body, body_pos)

    try:
        # Request a connection from the queue.
        timeout_obj = self._get_timeout(timeout)
        conn = self._get_conn(timeout=pool_timeout)

        conn.timeout = timeout_obj.connect_timeout

        is_new_proxy_conn = self.proxy is not None and not getattr(
            conn, "sock", None
        )
        if is_new_proxy_conn:
            self._prepare_proxy(conn)

        # Make the request on the httplib connection object.
      httplib_response = self._make_request(
            conn,
            method,
            url,
            timeout=timeout_obj,
            body=body,
            headers=headers,
            chunked=chunked,
        )

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:665:


self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60> conn = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> method = 'PUT', url = '/csv' timeout = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0> chunked = False httplib_request_kw = {'body': None, 'headers': {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}} timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbad68bdcd0>

def _make_request(
    self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
    """
    Perform a request on a given urllib connection object taken from our
    pool.

    :param conn:
        a connection from one of our connection pools

    :param timeout:
        Socket timeout in seconds for the request. This can be a
        float or integer, which will set the same timeout value for
        the socket connect and the socket read, or an instance of
        :class:`urllib3.util.Timeout`, which gives you more fine-grained
        control over your timeouts.
    """
    self.num_requests += 1

    timeout_obj = self._get_timeout(timeout)
    timeout_obj.start_connect()
    conn.timeout = timeout_obj.connect_timeout

    # Trigger any extra validation we need to do.
    try:
        self._validate_conn(conn)
    except (SocketTimeout, BaseSSLError) as e:
        # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
        self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
        raise

    # conn.request() calls httplib.*.request, not the method in
    # urllib3.request. It also calls makefile (recv) on the socket.
    if chunked:
        conn.request_chunked(method, url, **httplib_request_kw)
    else:
      conn.request(method, url, **httplib_request_kw)

/usr/lib/python3/dist-packages/urllib3/connectionpool.py:387:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> method = 'PUT', url = '/csv', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}

def request(self, method, url, body=None, headers={}, *,
            encode_chunked=False):
    """Send a complete request to the server."""
  self._send_request(method, url, body, headers, encode_chunked)

/usr/lib/python3.8/http/client.py:1256:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> method = 'PUT', url = '/csv', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} args = (False,), kwargs = {}

def _send_request(self, method, url, body, headers, *args, **kwargs):
    self._response_received = False
    if headers.get('Expect', b'') == b'100-continue':
        self._expect_header_set = True
    else:
        self._expect_header_set = False
        self.response_class = self._original_response_cls
  rval = super()._send_request(
        method, url, body, headers, *args, **kwargs
    )

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:94:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> method = 'PUT', url = '/csv', body = None headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'} encode_chunked = False

def _send_request(self, method, url, body, headers, encode_chunked):
    # Honor explicitly requested Host: and Accept-Encoding: headers.
    header_names = frozenset(k.lower() for k in headers)
    skips = {}
    if 'host' in header_names:
        skips['skip_host'] = 1
    if 'accept-encoding' in header_names:
        skips['skip_accept_encoding'] = 1

    self.putrequest(method, url, **skips)

    # chunked encoding will happen if HTTP/1.1 is used and either
    # the caller passes encode_chunked=True or the following
    # conditions hold:
    # 1. content-length has not been explicitly set
    # 2. the body is a file or iterable, but not a str or bytes-like
    # 3. Transfer-Encoding has NOT been explicitly set by the caller

    if 'content-length' not in header_names:
        # only chunk body if not explicitly set for backwards
        # compatibility, assuming the client code is already handling the
        # chunking
        if 'transfer-encoding' not in header_names:
            # if content-length cannot be automatically determined, fall
            # back to chunked encoding
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
                if body is not None:
                    if self.debuglevel > 0:
                        print('Unable to determine size of %r' % body)
                    encode_chunked = True
                    self.putheader('Transfer-Encoding', 'chunked')
            else:
                self.putheader('Content-Length', str(content_length))
    else:
        encode_chunked = False

    for hdr, value in headers.items():
        self.putheader(hdr, value)
    if isinstance(body, str):
        # RFC 2616 Section 3.7.1 says that text default has a
        # default charset of iso-8859-1.
        body = _encode(body, 'body')
  self.endheaders(body, encode_chunked=encode_chunked)

/usr/lib/python3.8/http/client.py:1302:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> message_body = None

def endheaders(self, message_body=None, *, encode_chunked=False):
    """Indicate that the last header line has been sent to the server.

    This method sends the request to the server.  The optional message_body
    argument can be used to pass a message body associated with the
    request.
    """
    if self.__state == _CS_REQ_STARTED:
        self.__state = _CS_REQ_SENT
    else:
        raise CannotSendHeader()
  self._send_output(message_body, encode_chunked=encode_chunked)

/usr/lib/python3.8/http/client.py:1251:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> message_body = None, args = (), kwargs = {'encode_chunked': False} msg = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def _send_output(self, message_body=None, *args, **kwargs):
    self._buffer.extend((b"", b""))
    msg = self._convert_to_bytes(self._buffer)
    del self._buffer[:]
    # If msg and message_body are sent in a single send() call,
    # it will avoid performance problems caused by the interaction
    # between delayed ack and the Nagle algorithm.
    if isinstance(message_body, bytes):
        msg += message_body
        message_body = None
  self.send(msg)

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:123:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> str = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def send(self, str):
    if self._response_received:
        logger.debug(
            "send() called, but reseponse already received. "
            "Not sending data."
        )
        return
  return super().send(str)

/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:218:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580> data = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'

def send(self, data):
    """Send `data' to the server.
    ``data`` can be a string object, a bytes object, an array object, a
    file-like object that supports a .read() method, or an iterable object.
    """

    if self.sock is None:
        if self.auto_open:
          self.connect()

/usr/lib/python3.8/http/client.py:951:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>

def connect(self):
  conn = self._new_conn()

/usr/lib/python3/dist-packages/urllib3/connection.py:187:


self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>

def _new_conn(self):
    """ Establish a socket connection and set nodelay settings on it.

    :return: New socket connection.
    """
    extra_kw = {}
    if self.source_address:
        extra_kw["source_address"] = self.source_address

    if self.socket_options:
        extra_kw["socket_options"] = self.socket_options

    try:
        conn = connection.create_connection(
            (self._dns_host, self.port), self.timeout, **extra_kw
        )

    except SocketTimeout:
        raise ConnectTimeoutError(
            self,
            "Connection to %s timed out. (connect timeout=%s)"
            % (self.host, self.timeout),
        )

    except SocketError as e:
      raise NewConnectionError(
            self, "Failed to establish a new connection: %s" % e
        )

E urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>: Failed to establish a new connection: [Errno 111] Connection refused

/usr/lib/python3/dist-packages/urllib3/connection.py:171: NewConnectionError

During handling of the above exception, another exception occurred:

s3_base = 'http://127.0.0.1:5000/' s3so = {'client_kwargs': {'endpoint_url': 'http://127.0.0.1:5000/'}} paths = ['/tmp/pytest-of-jenkins/pytest-14/csv0/dataset-0.csv', '/tmp/pytest-of-jenkins/pytest-14/csv0/dataset-1.csv'] datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')} engine = 'csv' df = name-string id label x y 0 Laura 1054 964 -0.792165 0.069362 1 Laura ... Zelda 976 964 -0.270133 0.839677 2160 Ray 967 977 0.033737 -0.727091

[4321 rows x 5 columns] patch_aiobotocore = None

@pytest.mark.parametrize("engine", ["parquet", "csv"])
def test_s3_dataset(s3_base, s3so, paths, datasets, engine, df, patch_aiobotocore):
    # Copy files to mock s3 bucket
    files = {}
    for i, path in enumerate(paths):
        with open(path, "rb") as f:
            fbytes = f.read()
        fn = path.split(os.path.sep)[-1]
        files[fn] = BytesIO()
        files[fn].write(fbytes)
        files[fn].seek(0)

    if engine == "parquet":
        # Workaround for nvt#539. In order to avoid the
        # bug in Dask's `create_metadata_file`, we need
        # to manually generate a "_metadata" file here.
        # This can be removed after dask#7295 is merged
        # (see https://github.com/dask/dask/pull/7295)
        fn = "_metadata"
        files[fn] = BytesIO()
        meta = create_metadata_file(
            paths,
            engine="pyarrow",
            out_dir=False,
        )
        meta.write_metadata_file(files[fn])
        files[fn].seek(0)
  with s3_context(s3_base=s3_base, bucket=engine, files=files) as s3fs:

tests/unit/test_s3.py:97:


/usr/lib/python3.8/contextlib.py:113: in __enter__
    return next(self.gen)
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:96: in s3_context
    client.create_bucket(Bucket=bucket, ACL="public-read-write")
/usr/local/lib/python3.8/dist-packages/botocore/client.py:508: in _api_call
    return self._make_api_call(operation_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/client.py:898: in _make_api_call
    http, parsed_response = self._make_request(
/usr/local/lib/python3.8/dist-packages/botocore/client.py:921: in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:119: in make_request
    return self._send_request(request_dict, operation_model)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:202: in _send_request
    while self._needs_retry(
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:354: in _needs_retry
    responses = self._event_emitter.emit(
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:412: in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:256: in emit
    return self._emit(event_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:239: in _emit
    response = handler(**kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:207: in __call__
    if self._checker(**checker_kwargs):
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:284: in __call__
    should_retry = self._should_retry(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:320: in _should_retry
    return self._checker(attempt_number, response, caught_exception)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:363: in __call__
    checker_response = checker(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:247: in __call__
    return self._check_caught_exception(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:416: in _check_caught_exception
    raise caught_exception
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:281: in _do_get_response
    http_response = self._send(request)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:377: in _send
    return self.http_session.send(request)


self = <botocore.httpsession.URLLib3Session object at 0x7fbad458f070> request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/csv, headers={'x-amz-acl': b'public-rea...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>

def send(self, request):
    try:
        proxy_url = self._proxy_config.proxy_url_for(request.url)
        manager = self._get_connection_manager(request.url, proxy_url)
        conn = manager.connection_from_url(request.url)
        self._setup_ssl_cert(conn, request.url, self._verify)
        if ensure_boolean(
            os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
        ):
            # This is currently an "experimental" feature which provides
            # no guarantees of backwards compatibility. It may be subject
            # to change or removal in any patch version. Anyone opting in
            # to this feature should strictly pin botocore.
            host = urlparse(request.url).hostname
            conn.proxy_headers['host'] = host

        request_target = self._get_request_target(request.url, proxy_url)
        urllib_response = conn.urlopen(
            method=request.method,
            url=request_target,
            body=request.body,
            headers=request.headers,
            retries=Retry(False),
            assert_same_host=False,
            preload_content=False,
            decode_content=False,
            chunked=self._chunked(request.headers),
        )

        http_response = botocore.awsrequest.AWSResponse(
            request.url,
            urllib_response.status,
            urllib_response.headers,
            urllib_response,
        )

        if not request.stream_output:
            # Cause the raw stream to be exhausted immediately. We do it
            # this way instead of using preload_content because
            # preload_content will never buffer chunked responses
            http_response.content

        return http_response
    except URLLib3SSLError as e:
        raise SSLError(endpoint_url=request.url, error=e)
    except (NewConnectionError, socket.gaierror) as e:
      raise EndpointConnectionError(endpoint_url=request.url, error=e)

E botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5000/csv"

/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:477: EndpointConnectionError
_____________________ test_cpu_workflow[True-True-parquet] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0') df = name-cat name-string id label x y 0 Edith Laura 1054 964 -0.792165 0.069362 ...da 976 964 -0.270133 0.839677 4320 Alice Ray 967 977 0.033737 -0.727091

[4321 rows x 6 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba407cecd0>, cpu = True engine = 'parquet', dump = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________________ test_cpu_workflow[True-True-csv] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0') df = name-string id label x y 0 Laura 1054 964 -0.792165 0.069362 1 Laura ... Zelda 976 964 -0.270133 0.839677 2160 Ray 967 977 0.033737 -0.727091

[4321 rows x 5 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba406e8610>, cpu = True engine = 'csv', dump = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_cpu_workflow[True-True-csv-no-header] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1') df = name-string id label x y 0 Laura 1054 964 -0.792165 0.069362 1 Laura ... Zelda 976 964 -0.270133 0.839677 2160 Ray 967 977 0.033737 -0.727091

[4321 rows x 5 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba406e6250>, cpu = True engine = 'csv-no-header', dump = True

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
____________________ test_cpu_workflow[True-False-parquet] _____________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0') df = name-cat name-string id label x y 0 Edith Laura 1054 964 -0.792165 0.069362 ...da 976 964 -0.270133 0.839677 4320 Alice Ray 967 977 0.033737 -0.727091

[4321 rows x 6 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba285b28b0>, cpu = True engine = 'parquet', dump = False

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
______________________ test_cpu_workflow[True-False-csv] _______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0') df = name-string id label x y 0 Laura 1054 964 -0.792165 0.069362 1 Laura ... Zelda 976 964 -0.270133 0.839677 2160 Ray 967 977 0.033737 -0.727091

[4321 rows x 5 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba40730370>, cpu = True engine = 'csv', dump = False

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_________________ test_cpu_workflow[True-False-csv-no-header] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1') df = name-string id label x y 0 Laura 1054 964 -0.792165 0.069362 1 Laura ... Zelda 976 964 -0.270133 0.839677 2160 Ray 967 977 0.033737 -0.727091

[4321 rows x 5 columns] dataset = <merlin.io.dataset.Dataset object at 0x7fba586e2eb0>, cpu = True engine = 'csv-no-header', dump = False

@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
    # Make sure we are in cpu formats
    if cudf and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()

    if cpu:
        dataset.to_cpu()

    cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
    cont_names = ["x", "y", "id"]
    label_name = ["label"]

    norms = ops.Normalize()
    conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
    cats = cat_names >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + label_name)

    workflow.fit(dataset)
    if dump:
        workflow_dir = os.path.join(tmpdir, "workflow")
        workflow.save(workflow_dir)
        workflow = None

        workflow = Workflow.load(workflow_dir)

    def get_norms(tar: pd.Series):
        df = tar.fillna(0)
        df = df * (df >= 0).astype("int")
        return df

    assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
    assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
    assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)

    # Check that categories match
    if engine == "parquet":
        cats_expected0 = df["name-cat"].unique()
        cats0 = get_cats(workflow, "name-cat", cpu=True)
        # adding the None entry as a string because of move from gpu
        assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
        assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
    cats_expected1 = df["name-string"].unique()
    cats1 = get_cats(workflow, "name-string", cpu=True)
    # adding the None entry as a string because of move from gpu
    assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
    assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])

    # Write to new "shuffled" and "processed" dataset
    workflow.transform(dataset).to_parquet(
        output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
    )
  dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)

tests/unit/workflow/test_cpu_workflow.py:76:


/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
    self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
    self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
    return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
    dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
    return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
    return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
    ???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
    ???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
  /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
  /var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
    warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning tests/unit/test_tf4rec.py: 1 warning tests/unit/test_tools.py: 5 warnings tests/unit/test_triton_inference.py: 8 warnings tests/unit/loader/test_dataloader_backend.py: 6 warnings tests/unit/loader/test_tf_dataloader.py: 66 warnings tests/unit/loader/test_torch_dataloader.py: 67 warnings tests/unit/ops/test_categorify.py: 69 warnings tests/unit/ops/test_drop_low_cardinality.py: 2 warnings tests/unit/ops/test_fill.py: 8 warnings tests/unit/ops/test_hash_bucket.py: 4 warnings tests/unit/ops/test_join.py: 88 warnings tests/unit/ops/test_lambda.py: 1 warning tests/unit/ops/test_normalize.py: 9 warnings tests/unit/ops/test_ops.py: 11 warnings tests/unit/ops/test_ops_schema.py: 17 warnings tests/unit/workflow/test_workflow.py: 27 warnings tests/unit/workflow/test_workflow_chaining.py: 1 warning tests/unit/workflow/test_workflow_node.py: 1 warning tests/unit/workflow/test_workflow_schemas.py: 1 warning /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility. warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(

tests/unit/test_notebooks.py: 1 warning tests/unit/test_tools.py: 17 warnings tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 54 warnings /usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-True-device-0-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-True-device-150-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-0-csv-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-0-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-150-csv-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-150-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-parquet]
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-csv] - py...
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-csv-no-header]
FAILED tests/unit/test_s3.py::test_s3_dataset[parquet] - botocore.exceptions....
FAILED tests/unit/test_s3.py::test_s3_dataset[csv] - botocore.exceptions.Endp...
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-parquet]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-csv]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-csv-no-header]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-parquet]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-csv]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-csv-no-header]
===== 17 failed, 1414 passed, 1 skipped, 617 warnings in 713.17s (0:11:53) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9505466125844008201.sh

nvidia-merlin-bot avatar Aug 09 '22 08:08 nvidia-merlin-bot
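
Every test_cpu_workflow failure in the run above reports the same ArrowInvalid cause: the part files written by workflow.transform(dataset).to_parquet(...) cannot be re-read because the Parquet footer magic bytes are missing. A minimal standalone check for a suspect file (a sketch only; the path below is copied from the failure message and is purely illustrative):

import os
from pathlib import Path

# Illustrative path taken from the failure message above; substitute any suspect part file.
part = Path("/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet")

with part.open("rb") as f:
    f.seek(-4, os.SEEK_END)   # a valid Parquet file ends with the 4-byte magic "PAR1"
    footer = f.read(4)

print("footer ok" if footer == b"PAR1" else f"unexpected footer: {footer!r}")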

Click to view CI Results
GitHub pull request #1547 of commit 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd, no merge conflicts.
Running as SYSTEM
Setting status of 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4737/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd^{commit} # timeout=10
Checking out Revision 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 4fe1280dd723e58fa32bc5579eadce7148e1d42a # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3615814383666057739.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+10.g2e73d5bc5.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.85,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.84,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@2e73d5bc5decc20505ee9d9e78990689b8e8c2dd#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='692881346'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-r5jilq00
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-r5jilq00
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98cd36067d5ad9bb952aa2dbfac55eb059bb7bc4
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: betterproto=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (21.3)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (7.0.0)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (1.3.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (4.64.1)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.10.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (3.19.5)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (0.55.1)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (0.4.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.4.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (3.1.2)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (6.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.4.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.0.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (8.1.3)
Requirement already satisfied: llvmlite=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (65.3.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+1.g98cd360) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2.8.2)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.52.0)
Requirement already satisfied: absl-py=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.2)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.1.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.0.0)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.7.0+1.g98cd360-py3-none-any.whl size=114014 sha256=df12df6ac1e572406abd8c885387a4211ea2c127768fde979586bc8fa70a4c12
  Stored in directory: /tmp/pip-ephem-wheel-cache-dxtipq6p/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+1.g98cd360
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1433 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] .... [ 8%] tests/unit/test_notebooks.py .... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/examples/test_01-Getting-started.py . [ 12%] tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%] tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ............................................. [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 45%] tests/unit/ops/test_groupyby.py ..................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 49%] tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 63%] .. [ 63%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] .......................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
  /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    DASK_VERSION = LooseVersion(dask.__version__)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

nvtabular/loader/__init__.py:19
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
    warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings tests/unit/workflow/test_workflow.py: 78 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results. warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
    warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings
  /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
    warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                 Stmts   Miss  Cover

merlin/transforms/__init__.py            1      1     0%
merlin/transforms/ops/__init__.py        1      1     0%

TOTAL 2 2 0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1432 passed, 2 skipped, 258 warnings in 1123.71s (0:18:43) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")
___________________________________ summary ____________________________________
  test-gpu: commands succeeded
  congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8802569408947653966.sh

nvidia-merlin-bot avatar Oct 03 '22 01:10 nvidia-merlin-bot
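Editor's note: the warnings in the CI run above flag compound schema tags such as Tags.USER_ID as deprecated in favor of atomic tags. A minimal, purely illustrative sketch of that migration follows, assuming merlin.schema exposes ColumnSchema and the atomic Tags members named in the warning; exact API details may differ between Merlin versions.

# Hypothetical sketch of the tag migration suggested by the UserWarning above.
# Assumes merlin.schema exposes ColumnSchema and the atomic Tags members
# (Tags.USER, Tags.ID) named in the warning text; API details may differ.
from merlin.schema import ColumnSchema, Tags

# Deprecated: a single compound tag
user_id_deprecated = ColumnSchema("userId", tags=[Tags.USER_ID])

# Preferred: the atomic tags named in the warning
user_id_atomic = ColumnSchema("userId", tags=[Tags.USER, Tags.ID])

print(user_id_atomic.tags)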

Click to view CI Results
GitHub pull request #1547 of commit 0f10b88cb6b55014f6e4359e3caf20881fc70b13, no merge conflicts.
Running as SYSTEM
Setting status of 0f10b88cb6b55014f6e4359e3caf20881fc70b13 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4738/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse 0f10b88cb6b55014f6e4359e3caf20881fc70b13^{commit} # timeout=10
Checking out Revision 0f10b88cb6b55014f6e4359e3caf20881fc70b13 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0f10b88cb6b55014f6e4359e3caf20881fc70b13 # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7147864933175555979.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+12.g0f10b88cb.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.85,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.84,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@0f10b88cb6b55014f6e4359e3caf20881fc70b13#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3722574718'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-f1gr5ow8
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-f1gr5ow8
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98cd36067d5ad9bb952aa2dbfac55eb059bb7bc4
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (0.55.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (21.3)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (4.64.1)
Requirement already satisfied: betterproto=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (7.0.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.10.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (3.19.5)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (0.4.3)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.8.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (8.1.3)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (6.1)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.0.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (3.1.2)
Requirement already satisfied: llvmlite=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (0.38.1)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (1.20.3)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (65.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+1.g98cd360) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2.8.2)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.52.0)
Requirement already satisfied: absl-py=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.1)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.1.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.0.0)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.7.0+1.g98cd360-py3-none-any.whl size=114014 sha256=3582ec663f3112230cfba9d03ff29e647ff00639cd69ec88ecd8fc6b3d325d85
  Stored in directory: /tmp/pip-ephem-wheel-cache-rjwglxax/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+1.g98cd360
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] .... [ 8%] tests/unit/test_notebooks.py .... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/examples/test_01-Getting-started.py . [ 12%] tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%] tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ..................................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 46%] tests/unit/ops/test_groupyby.py ..................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 49%] tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 63%] .. [ 63%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] .......................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]

=============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)

nvtabular/loader/init.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/init.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader. warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings tests/unit/workflow/test_workflow.py: 78 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results. warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>]. warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>]. warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                Stmts   Miss  Cover
-------------------------------------------------------
merlin/transforms/__init__.py           1      1     0%
merlin/transforms/ops/__init__.py       1      1     0%
-------------------------------------------------------
TOTAL                                   2      2     0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 258 warnings in 1150.12s (0:19:10) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")
___________________________________ summary ____________________________________
  test-gpu: commands succeeded
  congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins17135248576413537766.sh

nvidia-merlin-bot avatar Oct 03 '22 03:10 nvidia-merlin-bot
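Editor's note: the run above also emits a DeprecationWarning that the nvtabular.loader module has moved to merlin.models.loader. A minimal, hedged import shim is sketched below; it assumes merlin.models.loader is importable in newer Merlin environments and is an editorial illustration, not part of this PR.

# Hypothetical sketch of the import move flagged by the DeprecationWarning above.
# Assumes merlin.models.loader is importable in newer Merlin environments; the
# fallback keeps older environments working (while still emitting the warning).
try:
    import merlin.models.loader as loader  # new location per the warning
except ImportError:
    import nvtabular.loader as loader  # deprecated location

print(loader.__name__)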

@rnyak @karlhigley can you review this pull request?

kuwarkapur avatar Oct 03 '22 08:10 kuwarkapur

Click to view CI Results
GitHub pull request #1547 of commit 56fb11a3ab27fe850c3374c1f344f81f614f667f, no merge conflicts.
Running as SYSTEM
Setting status of 56fb11a3ab27fe850c3374c1f344f81f614f667f to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4746/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
 > git rev-parse 56fb11a3ab27fe850c3374c1f344f81f614f667f^{commit} # timeout=10
Checking out Revision 56fb11a3ab27fe850c3374c1f344f81f614f667f (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 56fb11a3ab27fe850c3374c1f344f81f614f667f # timeout=10
Commit message: "Merge branch 'main' into main"
 > git rev-list --no-walk 3fb7db360ca92f7800f31f666b4d3ab56118fde9 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins611663645699638183.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+14.g56fb11a3a.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.90,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.89,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@56fb11a3ab27fe850c3374c1f344f81f614f667f#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='728129109'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
  Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-s91s9ym0
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-s91s9ym0
  Resolved https://github.com/NVIDIA-Merlin/core.git to commit 14a18dc0de5d5fd7737ecbadf9f6d7fa5d801b67
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (7.0.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (0.55.1)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (1.3.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (4.64.1)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (21.3)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.5.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (3.19.5)
Requirement already satisfied: betterproto=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.10.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+9.g14a18dc) (0.4.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.4.1)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.12.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (6.1)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.7.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (8.1.3)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.0.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (3.1.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.4.0)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (65.3.0)
Requirement already satisfied: llvmlite=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (0.38.1)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+9.g14a18dc) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2022.2.1)
Requirement already satisfied: absl-py=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+9.g14a18dc) (6.0.2)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+9.g14a18dc) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.1.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+9.g14a18dc) (4.0.0)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+9.g14a18dc) (6.0.1)
Building wheels for collected packages: merlin-core
  Building wheel for merlin-core (pyproject.toml): started
  Building wheel for merlin-core (pyproject.toml): finished with status 'done'
  Created wheel for merlin-core: filename=merlin_core-0.7.0+9.g14a18dc-py3-none-any.whl size=118253 sha256=8c004aad8cf77e2c39dc85818c1b6127b5fed865e68cf3f3e6344b094172c079
  Stored in directory: /tmp/pip-ephem-wheel-cache-mm0otxlo/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
  Attempting uninstall: merlin-core
    Found existing installation: merlin-core 0.3.0+12.g78ecddd
    Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
    Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+9.g14a18dc
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%] ........................................................................ [ 8%] .... [ 8%] tests/unit/test_notebooks.py .... [ 8%] tests/unit/test_tf4rec.py . [ 8%] tests/unit/test_tools.py ...................... [ 10%] tests/unit/test_triton_inference.py ................................ [ 12%] tests/unit/examples/test_01-Getting-started.py . [ 12%] tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%] tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%] tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%] tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%] ................................................... [ 18%] tests/unit/framework_utils/test_torch_layers.py . [ 18%] tests/unit/loader/test_dataloader_backend.py ...... [ 18%] tests/unit/loader/test_tf_dataloader.py ................................ [ 20%] ........................................s.. [ 23%] tests/unit/loader/test_torch_dataloader.py ............................. [ 25%] ...................................................... [ 29%] tests/unit/ops/test_categorify.py ...................................... [ 32%] ........................................................................ [ 37%] ..................................................... [ 40%] tests/unit/ops/test_column_similarity.py ........................ [ 42%] tests/unit/ops/test_drop_low_cardinality.py .. [ 42%] tests/unit/ops/test_fill.py ............................................ [ 45%] ........ [ 46%] tests/unit/ops/test_groupyby.py ..................... [ 47%] tests/unit/ops/test_hash_bucket.py ......................... [ 49%] tests/unit/ops/test_join.py ............................................ [ 52%] ........................................................................ [ 57%] .................................. [ 59%] tests/unit/ops/test_lambda.py .......... [ 60%] tests/unit/ops/test_normalize.py ....................................... [ 63%] .. [ 63%] tests/unit/ops/test_ops.py ............................................. [ 66%] .................... [ 67%] tests/unit/ops/test_ops_schema.py ...................................... [ 70%] ........................................................................ [ 75%] ........................................................................ [ 80%] ........................................................................ [ 85%] ....................................... [ 88%] tests/unit/ops/test_reduce_dtype_size.py .. [ 88%] tests/unit/ops/test_target_encode.py ..................... [ 89%] tests/unit/workflow/test_cpu_workflow.py ...... [ 90%] tests/unit/workflow/test_workflow.py ................................... [ 92%] .......................................................... [ 96%] tests/unit/workflow/test_workflow_chaining.py ... [ 96%] tests/unit/workflow/test_workflow_node.py ........... [ 97%] tests/unit/workflow/test_workflow_ops.py ... [ 97%] tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%] ... [100%]

=============================== warnings summary =============================== ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33 /usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. DASK_VERSION = LooseVersion(dask.version)

.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. other = LooseVersion(other)

nvtabular/loader/init.py:19 /var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/init.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader. warnings.warn(

tests/unit/test_dask_nvt.py: 6 warnings tests/unit/workflow/test_workflow.py: 78 warnings /var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results. warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files. warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters. warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings tests/unit/loader/test_torch_dataloader.py: 12 warnings tests/unit/workflow/test_workflow.py: 9 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files. warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet] tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet] tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True] /var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_ops_schema.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>]. warnings.warn(

tests/unit/ops/test_ops_schema.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>]. warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings tests/unit/workflow/test_workflow.py: 12 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files. warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files. warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_parquet_output[True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION] tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None] /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files. warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                Stmts   Miss  Cover
-------------------------------------------------------
merlin/transforms/__init__.py           1      1     0%
merlin/transforms/ops/__init__.py       1      1     0%
-------------------------------------------------------
TOTAL                                   2      2     0%

=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 258 warnings in 1185.85s (0:19:45) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
  self._warn("No data was collected.", slug="no-data-collected")
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.6542.537287' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
  data._warn(str(exc))
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.6540.646720' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
  data._warn(str(exc))
___________________________________ summary ____________________________________
  test-gpu: commands succeeded
  congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins15406989966855063072.sh

nvidia-merlin-bot avatar Oct 09 '22 11:10 nvidia-merlin-bot

Closing since this notebook has moved to another repo since the PR was opened

karlhigley avatar Jan 31 '23 03:01 karlhigley