NVTabular
The graphs were visualized correctly when the script was run again.
The graph of the categorical features, the combination of categorical features (i.e. ('userId', 'movieId')), and the numerical feature (i.e. rating) was visualized, and the difference can be seen in the uploaded script.
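For reference, a minimal sketch of how such a graph can be built and rendered with NVTabular (this is not the uploaded script itself; the column names follow the MovieLens features mentioned above, and the specific ops and the `encode_type="combo"` choice are assumptions):

```python
# Minimal sketch, assuming MovieLens-style columns; the uploaded script may differ.
import nvtabular as nvt
from nvtabular import ops

# Categorical features plus the ('userId', 'movieId') combination
cats = ["userId", "movieId"] >> ops.Categorify()
cross = [["userId", "movieId"]] >> ops.Categorify(encode_type="combo")
# Numerical feature
conts = ["rating"] >> ops.Normalize()

output = cats + cross + conts
output.graph  # renders the operator DAG as a graphviz figure in a notebook

workflow = nvt.Workflow(output)
```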
Check out this pull request on ReviewNB
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Click to view CI Results
GitHub pull request #1547 of commit cb228501b8f1f079e943cc2bade29eb94acdee1c, no merge conflicts.
Running as SYSTEM
Setting status of cb228501b8f1f079e943cc2bade29eb94acdee1c to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4469/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse cb228501b8f1f079e943cc2bade29eb94acdee1c^{commit} # timeout=10
Checking out Revision cb228501b8f1f079e943cc2bade29eb94acdee1c (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f cb228501b8f1f079e943cc2bade29eb94acdee1c # timeout=10
Commit message: "Add files via upload"
> git rev-list --no-walk 8b43ecde40769ce5105a733c799b9a055b994093 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6299493311054922507.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
Downloading setuptools-62.2.0-py3-none-any.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 12.4 MB/s eta 0:00:00
Installing collected packages: setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 61.0.0
Uninstalling setuptools-61.0.0:
Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.2.0
WARNING: You are using pip version 22.0.4; however, version 22.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.0.4)
Collecting pip
Downloading pip-22.1-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 13.7 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.2.0)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.0.4
Uninstalling pip-22.0.4:
Successfully uninstalled pip-22.0.4
WARNING: The scripts pip, pip3, pip3.10 and pip3.8 are installed in '/var/jenkins_home/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.6.2 requires spacy=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (21.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.2.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py<2.0.0,>=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.3.0+1.g3c62869-py3-none-any.whl size=133336 sha256=4292ff6c26a37f59536cfe056293c3c28fdc2d02a953cd0337eff344b571d916
Stored in directory: /tmp/pip-ephem-wheel-cache-1av9ttz8/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+1.g3c62869 which is incompatible.
Successfully installed merlin-core-0.3.0+1.g3c62869
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
[garbled pip output in the original log: "Requirement already satisfied" lines for myst-nb and the rest of the docs/notebook toolchain (sphinx, myst-parser, ipywidgets, ipython, nbconvert, nbformat, jupyter-cache, nbdime, jupyter-client, jupyter-server and their dependencies), followed by the start of the setup.py build output]
creating build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+4.gcb228501 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so ->
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+4.gcb228501 is already the active version in easy-install.pth
Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples
INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples
INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain
INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb
INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb
INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb
INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo
INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb
INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb
INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ............... [ 46%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
/usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if pa_version and LooseVersion(pa_version) < LooseVersion("2.0"):
../../../../../usr/lib/python3.8/site-packages/dask_cudf/core.py:32
/usr/lib/python3.8/site-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
../../../../../usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: 34 warnings
/usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/core/merlin/core/utils.py:433: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/lib/python3.8/site-packages/cudf/core/series.py:923: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/core/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(
tests/unit/ops/test_categorify.py::test_categorify_max_size[6]
tests/unit/ops/test_categorify.py::test_categorify_max_size[max_emb_size1]
tests/unit/ops/test_categorify.py::test_categorify_max_size_null_iloc_check
/usr/lib/python3.8/site-packages/cudf/core/frame.py:3077: FutureWarning: keep_index is deprecated and will be removed in the future.
warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
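The SettingWithCopyWarning above is the usual pandas chained-indexing pitfall; a small self-contained illustration with toy data, unrelated to the test that triggered it:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
# Chained indexing may assign into a temporary copy and raise the warning:
# df[df["a"] > 1]["b"] = 0
# A single .loc call writes through to the original frame instead:
df.loc[df["a"] > 1, "b"] = 0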
tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:3041: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(
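The cuDF FutureWarning above already names the replacement; a minimal illustration, assuming a GPU environment with cuDF installed:

import cudf

gdf = cudf.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
# Deprecated, per the warning emitted during test_difference_lag:
# mat = gdf.as_gpu_matrix()
# Replacement suggested by the warning text; returns a CuPy ndarray:
mat = gdf.to_cupy()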
tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/core/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings
/core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing
nvtabular/__init__.py 22 0 0 0 100%
nvtabular/dispatch.py 3 3 0 0 0% 18-23
nvtabular/framework_utils/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 78 90 15 39% 30, 99, 103, 114-130, 140, 143-158, 162, 166-167, 173-198, 207-217, 220-227, 229->233, 234, 239-279, 282
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 12 89 6 91% 60, 68->49, 122, 179, 231-239, 335->343, 357->360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 25 22 1 45% 49, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 12 0 19% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 2 18 2 92% 50, 91
nvtabular/framework_utils/torch/models.py 45 1 30 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 5 34 5 91% 51->53, 64, 71->76, 75, 118-120
nvtabular/graph.py 3 3 0 0 0% 18-23
nvtabular/inference/__init__.py 2 0 0 0 100%
nvtabular/inference/triton/__init__.py 36 12 14 1 58% 42-49, 68, 72, 76-82
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 3 58 4 95% 32-33, 84
nvtabular/inference/triton/ensemble.py 266 148 82 7 46% 90-94, 157-196, 240-288, 305-309, 381-389, 418-434, 486-496, 548-588, 594-610, 614-681, 711, 733, 739-758, 764-788, 795
nvtabular/inference/triton/model/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/model/model_pt.py 101 101 42 0 0% 27-220
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/workflow_model.py 55 55 24 0 0% 27-128
nvtabular/inference/workflow/__init__.py 0 0 0 0 100%
nvtabular/inference/workflow/base.py 113 113 62 0 0% 27-209
nvtabular/inference/workflow/hugectr.py 37 37 16 0 0% 27-87
nvtabular/inference/workflow/pytorch.py 10 10 6 0 0% 27-46
nvtabular/inference/workflow/tensorflow.py 32 32 10 0 0% 26-68
nvtabular/io.py 3 3 0 0 0% 18-23
nvtabular/loader/__init__.py 2 0 0 0 100%
nvtabular/loader/backend.py 371 17 154 12 94% 27-28, 142, 158-159, 299->301, 311-315, 362-363, 402->406, 403->402, 478, 482-483, 512, 588-589, 624, 632
nvtabular/loader/tensorflow.py 179 25 60 9 86% 38-39, 74, 85-89, 101, 115, 124, 329, 357, 368, 383-385, 414-416, 426-434, 437-440
nvtabular/loader/tf_utils.py 57 10 22 6 80% 31->34, 34->36, 41->43, 45, 46->67, 52-53, 61-63, 69-73
nvtabular/loader/torch.py 87 14 26 3 80% 28-30, 33-39, 114, 158-159, 164
nvtabular/ops/__init__.py 26 0 0 0 100%
nvtabular/ops/add_metadata.py 34 0 14 0 100%
nvtabular/ops/bucketize.py 40 9 20 3 73% 52-54, 58->exit, 61-64, 83-86
nvtabular/ops/categorify.py 660 70 348 48 86% 251, 253, 271, 275, 283, 291, 293, 320, 341-342, 391->395, 399-406, 452, 460, 483-484, 561-566, 637, 733, 750, 795, 873-874, 889-893, 894->858, 912, 920, 927->exit, 951, 954->957, 1006->1004, 1068, 1073, 1107->1111, 1113->1053, 1119-1122, 1134, 1138, 1142, 1149, 1154-1157, 1235, 1237, 1307->1330, 1313->1330, 1331-1336, 1381, 1394->1397, 1401->1406, 1405, 1411, 1414, 1422-1432
nvtabular/ops/clip.py 18 2 8 3 81% 44, 52->54, 55
nvtabular/ops/column_similarity.py 121 26 38 5 74% 19-20, 29-30, 81->exit, 111, 206-207, 216-218, 226-242, 259->262, 263, 273
nvtabular/ops/data_stats.py 56 1 24 3 95% 107->109, 111, 113->103
nvtabular/ops/difference_lag.py 43 0 14 1 98% 73->75
nvtabular/ops/drop_low_cardinality.py 18 0 10 1 96% 85->84
nvtabular/ops/dropna.py 9 0 2 0 100%
nvtabular/ops/fill.py 76 5 30 1 92% 63-67, 109
nvtabular/ops/filter.py 20 1 8 1 93% 49
nvtabular/ops/groupby.py 135 4 88 5 96% 74, 86, 96->98, 141, 233
nvtabular/ops/hash_bucket.py 43 1 22 2 95% 79, 118->124
nvtabular/ops/hashed_cross.py 38 3 17 3 89% 52, 64, 92
nvtabular/ops/join_external.py 96 8 34 7 88% 20-21, 114, 116, 118, 150->152, 205-206, 216->227, 221
nvtabular/ops/join_groupby.py 128 5 57 6 94% 113, 120, 129, 136->135, 178->175, 181->175, 260-261
nvtabular/ops/lambdaop.py 62 6 22 6 86% 60, 64, 82, 95, 100, 109
nvtabular/ops/list_slice.py 89 29 42 0 64% 21-22, 146-160, 168-190
nvtabular/ops/logop.py 21 0 6 0 100%
nvtabular/ops/moments.py 69 0 24 0 100%
nvtabular/ops/normalize.py 93 4 22 1 94% 89, 139-140, 167
nvtabular/ops/operator.py 11 1 2 0 92% 52
nvtabular/ops/reduce_dtype_size.py 49 0 20 2 97% 68->77, 74->77
nvtabular/ops/rename.py 29 3 14 3 86% 46, 71-73
nvtabular/ops/stat_operator.py 8 0 2 0 100%
nvtabular/ops/target_encoding.py 182 9 76 5 93% 168->172, 176->185, 274, 283-284, 297-303, 396->399
nvtabular/ops/value_counts.py 34 0 6 1 98% 40->38
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 271 25 90 7 91% 26-27, 31-32, 129-132, 142-146, 148, 170-171, 322, 332, 358, 361-370
nvtabular/tools/dataset_inspector.py 52 8 24 2 79% 33-40, 51
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 3 0 0 0 100%
nvtabular/worker.py 3 3 0 0 0% 18-23
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 7 0 4 0 100%
nvtabular/workflow/workflow.py 219 17 94 12 91% 28-29, 52, 85, 206, 212->226, 239-241, 373, 388-389, 431, 508, 538, 548-550, 563
TOTAL 5211 1129 2095 203 76%
Coverage XML written to file coverage.xml
Required test coverage of 70% reached. Total coverage: 76.50%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/lib/python3.8/site-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': cannot import name 'ParamSpec' from 'typing_extensions' (/var/jenkins_home/.local/lib/python3.8/site-packages/typing_extensions.py)
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: not working correctly in ci environment
========== 1419 passed, 2 skipped, 665 warnings in 1168.52s (0:19:28) ==========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins5905419844544771701.sh
@benfred can you review this pull request?
Click to view CI Results
GitHub pull request #1547 of commit 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1, no merge conflicts.
Running as SYSTEM
Setting status of 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4473/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1^{commit} # timeout=10
Checking out Revision 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 4a31dd03acf314fce3c2e7e6ee2f4ed4fe0235c1 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 4d416c743185014d21ac3f337084bb49482e5daf # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins1922903398759176915.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
Downloading setuptools-62.3.1-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 39.2 MB/s eta 0:00:00
Installing collected packages: setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 61.0.0
Uninstalling setuptools-61.0.0:
Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.3.1
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.1)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.3.1)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Found existing installation: nvtabular 1.0.0+10.g4df99eb4
Uninstalling nvtabular-1.0.0+10.g4df99eb4:
Successfully uninstalled nvtabular-1.0.0+10.g4df99eb4
Found existing installation: merlin-core 0+untagged.78.gc43c798
Uninstalling merlin-core-0+untagged.78.gc43c798:
Successfully uninstalled merlin-core-0+untagged.78.gc43c798
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git
Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-install-76qa40o7/merlin-core_c19c73fda7b34f28a5358bbbbb9ea8a3
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-install-76qa40o7/merlin-core_c19c73fda7b34f28a5358bbbbb9ea8a3
Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98dd76b5646cb9f0cf1ff089f20f5afeaba37217
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: distributed>=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (21.3)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: llvmlite=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: absl-py=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.3.0+2.g98dd76b-py3-none-any.whl size=133339 sha256=c8daca3ddfc9cc58fdfedf350e077b35f5fa6602b9734ef708d9974570e61b33
Stored in directory: /tmp/pip-ephem-wheel-cache-qtnt5mlm/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+2.g98dd76b which is incompatible.
Successfully installed merlin-core-0.3.0+2.g98dd76b
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
[Requirement already satisfied: myst-nb and its documentation toolchain (sphinx, myst-parser, sphinx-external-toc, ipywidgets, ipython, ipykernel, jupyter-cache, nbconvert, nbformat, nbdime, jupyter-client, jupyter-server, and their transitive dependencies); the remaining pip resolver output is not recoverable from this log]
creating build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+7.g4a31dd03 -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so ->
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+7.g4a31dd03 is already the active version in easy-install.pth
Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples
INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples
INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain
INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb
INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb
INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb
INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo
INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb
INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb
INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py .................F
=================================== FAILURES ===================================
___________________________ test_full_df[None-1000] ____________________________
num_rows = 1000
tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_full_df_None_1000_0')
distro = None
@pytest.mark.parametrize("num_rows", [1000, 100000])
@pytest.mark.parametrize("distro", [None, distros])
def test_full_df(num_rows, tmpdir, distro):
json_sample["num_rows"] = num_rows
cats = list(json_sample["cats"].keys())
cols = datagen._get_cols_from_schema(json_sample, distros=distro)
df_gen = datagen.DatasetGen(datagen.UniformDistro(), gpu_frac=0.00001)
df_files = df_gen.full_df_create(num_rows, cols, entries=True, output=tmpdir)
test_size = 0
full_df = make_df()
for fi in df_files:
df = Dataset(fi).to_ddf().compute()
test_size = test_size + df.shape[0]
full_df = concat([full_df, df])
assert test_size == num_rows
conts_rep = cols["conts"]
cats_rep = cols["cats"]
labels_rep = cols["labels"]
assert df.shape[1] == len(conts_rep) + len(cats_rep) + len(labels_rep)
for idx, cat in enumerate(cats[1:]):
dist = cats_rep[idx + 1].distro or df_gen.dist
if HAS_GPU:
if not is_string_dtype(full_df[cat]._column):
sts, ps = dist.verify(full_df[cat].to_pandas())
assert all(s > 0.9 for s in sts)
else:
if not is_string_dtype(full_df[cat]):
sts, ps = dist.verify(full_df[cat])
assert all(s > 0.9 for s in sts)
# these are not mh series
assert full_df[cat].nunique() == cats_rep[0].cardinality
assert full_df[cat].str.len().min() == cats_rep[0].min_entry_size
E assert 2 == 1
E + where 2 = <bound method Frame.min of 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32>()
E + where <bound method Frame.min of 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32> = 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32.min
E + where 0 5\n1 5\n2 4\n3 5\n4 3\n ..\n995 5\n996 2\n997 5\n998 2\n999 4\nName: cat_5, Length: 1000, dtype: int32 = <bound method StringMethods.len of <cudf.core.column.string.StringMethods object at 0x7f3af5924220>>()
E + where <bound method StringMethods.len of <cudf.core.column.string.StringMethods object at 0x7f3af5924220>> = <cudf.core.column.string.StringMethods object at 0x7f3af5924220>.len
E + where <cudf.core.column.string.StringMethods object at 0x7f3af5924220> = 0 GXbtF\n1 GXbtF\n2 ECTv\n3 jOME3\n4 c9e\n ... \n995 dCO14\n996 bo\n997 GXbtF\n998 bo\n999 c6Ud\nName: cat_5, Length: 1000, dtype: object.str
E + and 1 = <nvtabular.tools.data_gen.CatCol object at 0x7f3af08e95b0>.min_entry_size
tests/unit/test_tools.py:161: AssertionError
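For context on the failure above: the assertion at tests/unit/test_tools.py:161 compares the length of the shortest generated category string with the column's configured min_entry_size (1 in this run), and the shortest generated entry is 2 characters long. A minimal pandas sketch of the same check, using only values visible in the traceback (the real test runs on cuDF when a GPU is present):

import pandas as pd

# cat_5 values sampled from the traceback above
cat_5 = pd.Series(["GXbtF", "ECTv", "jOME3", "c9e", "dCO14", "bo", "c6Ud"])
min_entry_size = 1  # cats_rep[0].min_entry_size reported by the failing assert

shortest = int(cat_5.str.len().min())
print(shortest, min_entry_size)  # 2 vs 1, so `shortest == min_entry_size` is False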
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
../../../../../usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92
/usr/local/lib/python3.8/dist-packages/fsspec/spec.py:92: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if pa_version and LooseVersion(pa_version) < LooseVersion("2.0"):
../../../../../usr/lib/python3.8/site-packages/dask_cudf/core.py:32
/usr/lib/python3.8/site-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
../../../../../usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: 34 warnings
/usr/local/lib/python3.8/dist-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py::test_cats_and_groupby_stats[False-0.01]
tests/unit/test_dask_nvt.py::test_cats_and_groupby_stats[False-0.01]
tests/unit/test_tf4rec.py::test_tf4rec
tests/unit/test_tools.py::test_full_df[None-1000]
/usr/lib/python3.8/site-packages/cudf/core/dataframe.py:1253: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/core/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/core/merlin/core/utils.py:433: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Branch BrPart Cover Missing
nvtabular/__init__.py 22 0 0 0 100%
nvtabular/dispatch.py 3 3 0 0 0% 18-23
nvtabular/framework_utils/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/tensorflow/__init__.py 1 0 0 0 100%
nvtabular/framework_utils/tensorflow/feature_column_utils.py 134 125 90 0 4% 28-32, 69-286
nvtabular/framework_utils/tensorflow/layers/__init__.py 4 0 0 0 100%
nvtabular/framework_utils/tensorflow/layers/embedding.py 153 86 89 10 40% 31-32, 39, 51-60, 68-69, 73-75, 79-93, 119-124, 177, 179, 193, 205, 217-218, 227, 231-239, 249-265, 307-311, 314-344, 347-360, 363-364, 367
nvtabular/framework_utils/tensorflow/layers/interaction.py 47 39 22 0 14% 48-52, 55-71, 74-103, 106-110, 113
nvtabular/framework_utils/tensorflow/layers/outer_product.py 30 24 12 0 19% 37-38, 41-60, 71-84, 87
nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py 58 58 30 0 0% 16-111
nvtabular/framework_utils/torch/__init__.py 0 0 0 0 100%
nvtabular/framework_utils/torch/layers/__init__.py 2 0 0 0 100%
nvtabular/framework_utils/torch/layers/embeddings.py 32 3 18 3 88% 39, 50, 91
nvtabular/framework_utils/torch/models.py 45 1 30 4 93% 57->61, 87->89, 93->96, 103
nvtabular/framework_utils/torch/utils.py 75 9 34 7 85% 51->53, 64, 71->76, 75, 109, 118-120, 129-131
nvtabular/graph.py 3 3 0 0 0% 18-23
nvtabular/inference/__init__.py 2 0 0 0 100%
nvtabular/inference/triton/__init__.py 36 12 14 1 58% 42-49, 68, 72, 76-82
nvtabular/inference/triton/benchmarking_tools.py 52 52 10 0 0% 2-103
nvtabular/inference/triton/data_conversions.py 87 73 58 0 10% 32-33, 52-84, 88-94, 98-105, 109-115, 119-136, 140-150
nvtabular/inference/triton/ensemble.py 266 155 82 10 42% 90-94, 157-196, 240-288, 305-309, 381-389, 415, 418-434, 438-442, 455-456, 486-496, 548-588, 594-610, 614-681, 711, 733, 739-758, 764-788, 795
nvtabular/inference/triton/model/__init__.py 0 0 0 0 100%
nvtabular/inference/triton/model/model_pt.py 101 101 42 0 0% 27-220
nvtabular/inference/triton/model_config_pb2.py 299 0 2 0 100%
nvtabular/inference/triton/workflow_model.py 55 55 24 0 0% 27-128
nvtabular/inference/workflow/__init__.py 0 0 0 0 100%
nvtabular/inference/workflow/base.py 113 113 62 0 0% 27-209
nvtabular/inference/workflow/hugectr.py 37 37 16 0 0% 27-87
nvtabular/inference/workflow/pytorch.py 10 10 6 0 0% 27-46
nvtabular/inference/workflow/tensorflow.py 32 32 10 0 0% 26-68
nvtabular/io.py 3 3 0 0 0% 18-23
nvtabular/loader/__init__.py 2 0 0 0 100%
nvtabular/loader/backend.py 371 52 154 27 83% 27-28, 92, 97-98, 125, 137-142, 145->exit, 157-159, 178, 179->181, 235, 271-275, 278-281, 286, 293, 299->301, 311-315, 325-326, 357, 362-363, 399-400, 402->406, 403->402, 431, 449, 478, 482-483, 508-517, 578->581, 588-589, 618, 623-627, 632
nvtabular/loader/tensorflow.py 179 55 60 15 67% 38-39, 63-65, 74, 85-89, 101, 110-115, 124, 311-313, 329, 351, 355-358, 363, 368, 375, 383-385, 389, 395->399, 408-418, 422, 426-434, 437-440, 443-447, 453
nvtabular/loader/tf_utils.py 57 10 22 6 80% 31->34, 34->36, 41->43, 45, 46->67, 52-53, 61-63, 69-73
nvtabular/loader/torch.py 87 39 26 3 50% 28-30, 33-39, 114, 119, 124-130, 154-166, 169, 174-179, 182-187, 190-191
nvtabular/ops/__init__.py 26 0 0 0 100%
nvtabular/ops/add_metadata.py 34 3 14 0 94% 34, 38, 42
nvtabular/ops/bucketize.py 40 20 20 2 40% 52-54, 58->exit, 59-64, 71-88, 92, 96
nvtabular/ops/categorify.py 660 167 348 77 70% 251, 253, 271, 275, 279, 283, 287, 291, 293, 297, 305, 320, 323-328, 341-342, 372-376, 384-408, 434, 448->451, 452, 457, 460, 483-484, 491-499, 561-566, 598, 625->627, 628, 629->631, 635, 637, 646, 725, 727->730, 733, 750, 759-764, 795, 829, 873-874, 889-893, 894->858, 912, 920, 927-928, 945-946, 951, 954->957, 983, 1003-1021, 1037, 1059, 1063, 1065-1068, 1073, 1077-1089, 1091->1094, 1099->1053, 1107-1114, 1115->1117, 1119-1122, 1134, 1138, 1142, 1149, 1154-1157, 1235, 1237, 1299, 1307->1330, 1313->1330, 1331-1336, 1354, 1358-1366, 1369, 1380-1388, 1394->1397, 1401->1406, 1405, 1411, 1414, 1419-1433, 1454-1462
nvtabular/ops/clip.py 18 2 8 3 81% 44, 52->54, 55
nvtabular/ops/column_similarity.py 121 86 38 0 23% 19-20, 29-30, 72-78, 81-88, 92-114, 125-126, 129-134, 138, 142, 168-197, 206-207, 216-218, 226-242, 251-276, 280-283, 287-288
nvtabular/ops/data_stats.py 56 40 24 0 22% 44-48, 51, 55-93, 96-115, 118
nvtabular/ops/difference_lag.py 43 21 14 1 44% 60->63, 70-81, 87, 90-95, 99, 103, 106
nvtabular/ops/drop_low_cardinality.py 18 11 10 0 32% 30-31, 50, 80-90
nvtabular/ops/dropna.py 9 3 2 0 73% 36-38
nvtabular/ops/fill.py 76 35 30 3 47% 53-55, 63-67, 73, 79, 102-104, 108-115, 120-121, 125-128, 135, 138-142, 145-148
nvtabular/ops/filter.py 20 3 8 3 79% 49, 56, 60
nvtabular/ops/groupby.py 135 15 88 12 83% 74, 86, 96->98, 98->87, 107->112, 132, 141, 150->149, 233, 251, 268-271, 284, 290-297
nvtabular/ops/hash_bucket.py 43 22 22 2 38% 75, 79, 88-101, 106-110, 113-124, 128, 132
nvtabular/ops/hashed_cross.py 38 22 17 1 35% 52, 58-68, 73-78, 82, 87-92
nvtabular/ops/join_external.py 96 19 34 11 72% 20-21, 114, 116, 118, 131, 138, 142-145, 150-151, 156-157, 205-206, 220-227
nvtabular/ops/join_groupby.py 128 20 57 9 79% 111, 113, 120, 126-129, 136-139, 144-146, 177-180, 181->175, 230-231, 260-261
nvtabular/ops/lambdaop.py 62 6 22 6 86% 60, 64, 82, 95, 100, 109
nvtabular/ops/list_slice.py 89 39 42 5 47% 21-22, 67-68, 74, 86-94, 105, 121->127, 146-160, 168-190
nvtabular/ops/logop.py 21 2 6 1 89% 48-51
nvtabular/ops/moments.py 69 1 24 1 98% 71
nvtabular/ops/normalize.py 93 27 22 3 67% 72, 77, 82, 89, 126-128, 134-142, 148, 155-159, 162-163, 167, 176, 180
nvtabular/ops/operator.py 11 1 2 0 92% 52
nvtabular/ops/reduce_dtype_size.py 49 28 20 0 33% 36-39, 43, 46, 49-50, 54-56, 59-80
nvtabular/ops/rename.py 29 3 14 3 86% 46, 71-73
nvtabular/ops/stat_operator.py 8 0 2 0 100%
nvtabular/ops/target_encoding.py 182 127 76 0 22% 166-207, 210-213, 217, 226-227, 230-243, 246-251, 254-257, 261, 265, 269-274, 277-280, 283-284, 287-288, 291-292, 296-377, 381-413, 423-432
nvtabular/ops/value_counts.py 34 20 6 0 40% 37-53, 56, 59, 62-64, 67
nvtabular/tools/__init__.py 0 0 0 0 100%
nvtabular/tools/data_gen.py 271 25 90 8 90% 26-27, 31-32, 129-132, 142-146, 148, 170-171, 322, 332, 356->355, 358, 361-370
nvtabular/tools/dataset_inspector.py 52 40 24 0 21% 31-40, 50-51, 71-112
nvtabular/tools/inspector_script.py 46 46 0 0 0% 17-168
nvtabular/utils.py 3 0 0 0 100%
nvtabular/worker.py 3 3 0 0 0% 18-23
nvtabular/workflow/__init__.py 2 0 0 0 100%
nvtabular/workflow/node.py 7 0 4 0 100%
nvtabular/workflow/workflow.py 219 24 94 14 88% 28-29, 52, 85, 121-122, 126, 187->190, 206, 212->226, 239-241, 257, 373, 388-389, 431, 508, 538-546, 548-550, 563
TOTAL 5211 2031 2095 251 57%
Coverage XML written to file coverage.xml
FAIL Required test coverage of 70% not reached. Total coverage: 56.79%
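A quick sanity check on the TOTAL row (illustrative only, not part of the CI output): by statements alone coverage is roughly 61%, and the reported 56.79% is lower because pytest-cov's combined figure also folds in the 2,095 branch targets.

stmts, missed = 5211, 2031
print(f"{(stmts - missed) / stmts:.2%}")  # 61.02% by statements; missed branches pull the combined number down to 56.79%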
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/lib/python3.8/site-packages/dask_cudf/io/tests/test_s3.py:16: could not import 's3fs': cannot import name 'ParamSpec' from 'typing_extensions' (/var/jenkins_home/.local/lib/python3.8/site-packages/typing_extensions.py)
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
====== 1 failed, 140 passed, 1 skipped, 55 warnings in 311.42s (0:05:11) =======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
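test_res_push.py itself is not part of this log; as a rough, assumed sketch of what such a post-build step usually does, the snippet below posts the tail of the Jenkins log back to the pull request through the GitHub issues-comments API (the GITHUB_TOKEN environment variable, the size cutoff, and the error handling are assumptions, not the actual script):

import os
import sys

import requests

# Jenkins passes the comments URL and the log path as the two CLI arguments shown above.
comments_url, log_path = sys.argv[1], sys.argv[2]

with open(log_path, errors="replace") as f:
    body = f.read()[-60000:]  # keep only the tail; GitHub limits comment size (assumed cutoff)

resp = requests.post(
    comments_url,
    headers={"Authorization": f"token {os.environ['GITHUB_TOKEN']}"},  # assumed credential source
    json={"body": body},
)
resp.raise_for_status()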
[nvtabular_tests] $ /bin/bash /tmp/jenkins1418994287624193960.sh
Click to view CI Results
GitHub pull request #1547 of commit faf9a2aba510487f7d46069165371cdaacaebf91, no merge conflicts.
Running as SYSTEM
Setting status of faf9a2aba510487f7d46069165371cdaacaebf91 to PENDING with url http://10.20.13.93:8080/job/nvtabular_tests/4485/ and message: 'Pending'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse faf9a2aba510487f7d46069165371cdaacaebf91^{commit} # timeout=10
Checking out Revision faf9a2aba510487f7d46069165371cdaacaebf91 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f faf9a2aba510487f7d46069165371cdaacaebf91 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk db9adbb37ec2389de0be270b879300f5315655dc # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins423197427368600513.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (61.0.0)
Collecting setuptools
Downloading setuptools-62.3.2-py3-none-any.whl (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 21.3 MB/s eta 0:00:00
Installing collected packages: setuptools
Attempting uninstall: setuptools
Found existing installation: setuptools 61.0.0
Uninstalling setuptools-61.0.0:
Successfully uninstalled setuptools-61.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-auth 1.35.0 requires cachetools<5.0,>=2.0.0, but you have cachetools 5.0.0 which is incompatible.
tensorflow-gpu 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.6.0 which is incompatible.
tensorflow-gpu 2.8.0 requires tensorboard<2.9,>=2.8, but you have tensorboard 2.6.0 which is incompatible.
Successfully installed setuptools-62.3.2
WARNING: There was an error checking the latest version of pip.
Installing NVTabular
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in /var/jenkins_home/.local/lib/python3.8/site-packages (22.1)
Collecting pip
Downloading pip-22.1.1-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 55.2 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (62.3.2)
Requirement already satisfied: wheel in /var/jenkins_home/.local/lib/python3.8/site-packages (0.37.1)
Requirement already satisfied: pybind11 in /var/jenkins_home/.local/lib/python3.8/site-packages (2.9.2)
Requirement already satisfied: numpy==1.20.3 in /var/jenkins_home/.local/lib/python3.8/site-packages (1.20.3)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 22.1
Uninstalling pip-22.1:
Successfully uninstalled pip-22.1
WARNING: The scripts pip, pip3, pip3.10 and pip3.8 are installed in '/var/jenkins_home/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.6.2 requires spacy=2021.11.2 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.0.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.64.0)
Requirement already satisfied: dask>=2021.11.2 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.2)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: betterproto=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.20.1)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.55.1)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.4.2)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.11.2)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.2.0)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2021.11.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: pyyaml in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.4.1)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (8.0.4)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.4.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.3)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.7.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (62.3.2)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.1)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.3)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (5.8.0)
Requirement already satisfied: llvmlite=0.38.0rc1 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.38.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (3.0.8)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2022.1)
Requirement already satisfied: absl-py=0.9 in /var/jenkins_home/.local/lib/python3.8/site-packages/absl_py-0.12.0-py3.8.egg (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.12.0)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.56.0)
Requirement already satisfied: six in /var/jenkins_home/.local/lib/python3.8/site-packages (from absl-py=0.9->tensorflow-metadata>=1.2.0->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.15.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (0.2.1)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (1.0.1)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2021.11.2->merlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (2.0.1)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (6.0.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core@ git+https://github.com/NVIDIA-Merlin/core.git) (4.0.0)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.3.0+5.g6df9aaa-py3-none-any.whl size=133336 sha256=efd0931c0f9eec0ec32fa714f41914293a5bd5a6ec839dd75ac2d08a80d1838a
Stored in directory: /tmp/pip-ephem-wheel-cache-re2o0hdv/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nvtabular 1.0.0+10.g4df99eb4 requires merlin-core==0.2.0, but you have merlin-core 0.3.0+5.g6df9aaa which is incompatible.
Successfully installed merlin-core-0.3.0+5.g6df9aaa
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: natsort==8.1.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (8.1.0)
Requirement already satisfied: myst-nb in /var/jenkins_home/.local/lib/python3.8/site-packages
[remaining pip output lists the myst-nb documentation dependencies (sphinx, myst-parser, jupyter-cache, nbconvert, nbformat, nbdime, jupyter-server, ipywidgets, ipykernel, ipython, jsonschema, beautifulsoup4, bleach, GitPython, anyio, argon2-cffi and related packages) as already satisfied]
creating build/lib.linux-x86_64-cpython-38/tests
copying tests/__init__.py -> build/lib.linux-x86_64-cpython-38/tests
creating build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/io.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/_version.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/graph.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/dispatch.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/worker.py -> build/lib.linux-x86_64-cpython-38/nvtabular
copying nvtabular/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular
creating build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_triton_inference.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_dask_nvt.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tf4rec.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_s3.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/test_tools.py -> build/lib.linux-x86_64-cpython-38/tests/unit
copying tests/unit/__init__.py -> build/lib.linux-x86_64-cpython-38/tests/unit
creating build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/torch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/backend.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
copying nvtabular/loader/tf_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/loader
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference
copying nvtabular/inference/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
copying nvtabular/framework_utils/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils
creating build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/inspector_script.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/dataset_inspector.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/data_gen.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
copying nvtabular/tools/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/tools
creating build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/data_stats.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/stat_operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/clip.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/target_encoding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/add_metadata.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/logop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hashed_cross.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/categorify.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/rename.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/list_slice.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/hash_bucket.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/fill.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/dropna.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/lambdaop.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/value_counts.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/operator.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/normalize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/filter.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_external.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/join_groupby.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/moments.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/difference_lag.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/bucketize.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
copying nvtabular/ops/column_similarity.py -> build/lib.linux-x86_64-cpython-38/nvtabular/ops
creating build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/node.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/workflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
copying nvtabular/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/benchmarking_tools.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/model_config_pb2.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/workflow_model.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/ensemble.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/data_conversions.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
copying nvtabular/inference/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/tensorflow.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/base.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/hugectr.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/pytorch.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
copying nvtabular/inference/workflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/workflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/model_pt.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
copying nvtabular/inference/triton/model/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/inference/triton/model
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/feature_column_utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/tfrecords_to_parquet.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
copying nvtabular/framework_utils/tensorflow/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/utils.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/models.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
copying nvtabular/framework_utils/torch/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/outer_product.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/embedding.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/interaction.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
copying nvtabular/framework_utils/tensorflow/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/tensorflow/layers
creating build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/embeddings.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
copying nvtabular/framework_utils/torch/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/nvtabular/framework_utils/torch/layers
/usr/local/lib/python3.8/dist-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
package init file 'ci/__init__.py' not found (or not a regular file)
package init file 'images/__init__.py' not found (or not a regular file)
package init file 'docs/__init__.py' not found (or not a regular file)
package init file 'cpp/__init__.py' not found (or not a regular file)
package init file 'bench/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench
copying bench/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/bench
package init file 'merlin/__init__.py' not found (or not a regular file)
package init file 'examples/__init__.py' not found (or not a regular file)
package init file 'conda/__init__.py' not found (or not a regular file)
package init file 'docs/source/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/docs
creating build/lib.linux-x86_64-cpython-38/docs/source
copying docs/source/conf.py -> build/lib.linux-x86_64-cpython-38/docs/source
package init file 'docs/source/_templates/__init__.py' not found (or not a regular file)
package init file 'docs/source/images/__init__.py' not found (or not a regular file)
package init file 'docs/source/training/__init__.py' not found (or not a regular file)
package init file 'docs/source/resources/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/__init__.py' not found (or not a regular file)
package init file 'cpp/nvtabular/inference/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets
copying bench/datasets/test_dataset.py -> build/lib.linux-x86_64-cpython-38/bench/datasets
package init file 'bench/torch/__init__.py' not found (or not a regular file)
package init file 'bench/examples/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dask-nvtabular-criteo-benchmark.py -> build/lib.linux-x86_64-cpython-38/bench/examples
copying bench/examples/dataloader_bench.py -> build/lib.linux-x86_64-cpython-38/bench/examples
package init file 'bench/datasets/configs/__init__.py' not found (or not a regular file)
package init file 'bench/datasets/tools/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_hugectr.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_pytorch.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/nvt_etl.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
copying bench/datasets/tools/train_tensorflow.py -> build/lib.linux-x86_64-cpython-38/bench/datasets/tools
package init file 'bench/torch/criteo/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/merlin
creating build/lib.linux-x86_64-cpython-38/merlin/transforms
copying merlin/transforms/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms
creating build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
copying merlin/transforms/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/merlin/transforms/ops
package init file 'examples/tensorflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples
creating build/lib.linux-x86_64-cpython-38/examples/tensorflow
copying examples/tensorflow/callbacks.py -> build/lib.linux-x86_64-cpython-38/examples/tensorflow
package init file 'examples/getting-started-movielens/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-toy-example/__init__.py' not found (or not a regular file)
package init file 'examples/tabular-data-rossmann/__init__.py' not found (or not a regular file)
package init file 'examples/advanced-ops-outbrain/__init__.py' not found (or not a regular file)
package init file 'examples/multi-gpu-movielens/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/tf_trainer.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
copying examples/multi-gpu-movielens/torch_trainer_dist.py -> build/lib.linux-x86_64-cpython-38/examples/multi-gpu-movielens
package init file 'examples/scaling-criteo/__init__.py' not found (or not a regular file)
package init file 'examples/winning-solution-recsys2020-twitter/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/docker/__init__.py' not found (or not a regular file)
package init file 'examples/tensorflow/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/getting-started-movielens/imgs/__init__.py' not found (or not a regular file)
package init file 'examples/scaling-criteo/imgs/__init__.py' not found (or not a regular file)
package init file 'tests/integration/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_tf_inference.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_inf_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_notebooks.py -> build/lib.linux-x86_64-cpython-38/tests/integration
copying tests/integration/test_nvt_hugectr.py -> build/lib.linux-x86_64-cpython-38/tests/integration
package init file 'tests/integration/common/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common
copying tests/integration/common/utils.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common
package init file 'tests/integration/common/parsers/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/benchmark_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/rossmann_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
copying tests/integration/common/parsers/criteo_parsers.py -> build/lib.linux-x86_64-cpython-38/tests/integration/common/parsers
package init file 'tests/unit/loader/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_dataloader_backend.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_tf_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
copying tests/unit/loader/test_torch_dataloader.py -> build/lib.linux-x86_64-cpython-38/tests/unit/loader
package init file 'tests/unit/framework_utils/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_feature_columns.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_tf_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
copying tests/unit/framework_utils/test_torch_layers.py -> build/lib.linux-x86_64-cpython-38/tests/unit/framework_utils
package init file 'tests/unit/ops/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_reduce_dtype_size.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_fill.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_lambda.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_categorify.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops_schema.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_drop_low_cardinality.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_target_encode.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_groupyby.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_column_similarity.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_join.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_normalize.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_hash_bucket.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
copying tests/unit/ops/test_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/ops
package init file 'tests/unit/workflow/__init__.py' not found (or not a regular file)
creating build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_ops.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_cpu_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_schemas.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_node.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
copying tests/unit/workflow/test_workflow_chaining.py -> build/lib.linux-x86_64-cpython-38/tests/unit/workflow
package init file 'conda/environments/__init__.py' not found (or not a regular file)
package init file 'conda/recipes/__init__.py' not found (or not a regular file)
running egg_info
creating nvtabular.egg-info
writing nvtabular.egg-info/PKG-INFO
writing dependency_links to nvtabular.egg-info/dependency_links.txt
writing requirements to nvtabular.egg-info/requires.txt
writing top-level names to nvtabular.egg-info/top_level.txt
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.h' under directory 'cpp'
warning: no files found matching '*.cu' under directory 'cpp'
warning: no files found matching '*.cuh' under directory 'cpp'
adding license file 'LICENSE'
writing manifest file 'nvtabular.egg-info/SOURCES.txt'
running build_ext
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.8 -c flagcheck.cpp -o flagcheck.o -std=c++17
building 'nvtabular_cpp' extension
creating build/temp.linux-x86_64-cpython-38
creating build/temp.linux-x86_64-cpython-38/cpp
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular
creating build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/__init__.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/categorify.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DVERSION_INFO=1.1.1+11.gfaf9a2aba -I./cpp/ -I/var/jenkins_home/.local/lib/python3.8/site-packages/pybind11/include -I/usr/include/python3.8 -c cpp/nvtabular/inference/fill.cc -o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -std=c++17 -fvisibility=hidden -g0
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 build/temp.linux-x86_64-cpython-38/cpp/nvtabular/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/__init__.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/categorify.o build/temp.linux-x86_64-cpython-38/cpp/nvtabular/inference/fill.o -L/usr/lib -o build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-cpython-38/nvtabular_cpp.cpython-38-x86_64-linux-gnu.so ->
Creating /var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular.egg-link (link to .)
nvtabular 1.1.1+11.gfaf9a2aba is already the active version in easy-install.pth
Installed /var/jenkins_home/workspace/nvtabular_tests/nvtabular
Running black --check
All done! ✨ 🍰 ✨
131 files would be left unchanged.
Running flake8
Running isort
Skipped 2 files
Running bandit
Running pylint
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Running flake8-nb
Building docs
make: Entering directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
INFO:sphinxcontrib.copydirs.copydirs:Copying source documentation from: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/examples
INFO:sphinxcontrib.copydirs.copydirs: ...to destination: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/source/examples
INFO:traitlets:Writing 14816 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain
INFO:traitlets:Writing 35171 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19347 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/advanced-ops-outbrain/03-Training-with-TF.ipynb
INFO:traitlets:Writing 14170 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 34457 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 28932 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Writing 20504 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-PyTorch.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens
INFO:traitlets:Writing 61676 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/03-Training-with-TF.ipynb
INFO:traitlets:Writing 18521 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 21842 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 43655 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-movielens/01-03-MultiGPU-Download-Convert-ETL-with-NVTabular-Training-with-TensorFlow.ipynb
INFO:traitlets:Writing 44549 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/multi-gpu-toy-example/multi-gpu_dask.ipynb
INFO:traitlets:Writing 9604 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/01-Download-Convert.ipynb
INFO:traitlets:Writing 21552 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 12041 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 20792 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-HugeCTR.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo
INFO:traitlets:Writing 203961 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/03-Training-with-TF.ipynb
INFO:traitlets:Writing 32956 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-HugeCTR.ipynb
INFO:traitlets:Writing 25153 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/scaling-criteo/04-Triton-Inference-with-TF.ipynb
INFO:traitlets:Writing 23938 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/01-Download-Convert.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann
INFO:traitlets:Writing 33764 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/02-ETL-with-NVTabular.ipynb
INFO:traitlets:Writing 19635 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-FastAI.ipynb
INFO:traitlets:Writing 17586 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-PyTorch.ipynb
INFO:traitlets:Writing 21354 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/tabular-data-rossmann/03-Training-with-TF.ipynb
INFO:traitlets:Support files will be in
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Making directory /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter
INFO:traitlets:Writing 77074 bytes to /var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs/build/jupyter_execute/examples/winning-solution-recsys2020-twitter/01-02-04-Download-Convert-ETL-with-NVTabular-Training-with-XGBoost.ipynb
make: Leaving directory '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/docs'
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1420 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ............... [ 46%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ........
Build timed out (after 60 minutes). Marking the build as failed.
.Terminated
Build was aborted
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins7990170128639053624.sh
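The post-build step above runs test_res_push.py to post the captured build log back onto the pull request through the standard GitHub issues-comments API. The script itself is not included in this log, so the following is only a hedged sketch assuming a requests-based implementation; the GH_TOKEN environment variable and the truncation limit are assumptions, not details taken from the script.

# Hedged sketch of a log-posting step like test_res_push.py (illustrative only).
import os
import sys
import requests

def post_log_comment(comments_url, log_path):
    # Read the build log and keep only the tail, since GitHub comments are size-limited.
    with open(log_path, "r", errors="replace") as f:
        log_tail = f.read()[-60000:]
    resp = requests.post(
        comments_url,
        headers={"Authorization": f"token {os.environ['GH_TOKEN']}"},
        json={"body": f"```\n{log_tail}\n```"},
    )
    resp.raise_for_status()

if __name__ == "__main__":
    # Usage mirrors the invocation above: comments URL first, then the log path.
    post_log_comment(sys.argv[1], sys.argv[2])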
Click to view CI Results
GitHub pull request #1547 of commit c2a5b743c7a0b458be7af4ca96da091887a044b9, no merge conflicts.
Running as SYSTEM
Setting status of c2a5b743c7a0b458be7af4ca96da091887a044b9 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4614/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse c2a5b743c7a0b458be7af4ca96da091887a044b9^{commit} # timeout=10
Checking out Revision c2a5b743c7a0b458be7af4ca96da091887a044b9 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f c2a5b743c7a0b458be7af4ca96da091887a044b9 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 242fc3657c847d7ed026dc657dc5a331c73ca015 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins14470040310932446478.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1432 items
tests/unit/test_dask_nvt.py ..........................F..F.............F [ 3%]
F.FF..............................................................FFF... [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_s3.py FF [ 8%]
tests/unit/test_tf4rec.py . [ 9%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 21%]
........................................s.. [ 24%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 26%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py FFFFFF [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=================================== FAILURES ===================================
____ test_dask_workflow_api_dlrm[True-None-True-device-0-csv-no-header-0.1] ____
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr26')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = 'device', on_host = True, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:05:58,808 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-59cbff4bfa9b201755371def3a4a8ee0', 1)
Function: subgraph_callable-bc40cc8d-e6cc-44be-b032-79fc6a78
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr26/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
___ test_dask_workflow_api_dlrm[True-None-True-device-150-csv-no-header-0.1] ___
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr29')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = 'device', on_host = True, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:00,845 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-49b3604576a4cafddded7a7f39db1cf7', 1)
Function: subgraph_callable-0a7fa952-c38a-4751-be47-bec1bd5c
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr29/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
_________ test_dask_workflow_api_dlrm[True-None-False-None-0-csv-0.1] __________
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr43')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv', cat_cache = None
on_host = False, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:08,690 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-10ebdc6558d21ad31bcfda98f39ea235', 0)
Function: subgraph_callable-876b006a-b0d8-491d-8a31-2856bd56
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr43/processed/part_0.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
____ test_dask_workflow_api_dlrm[True-None-False-None-0-csv-no-header-0.1] _____
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr44')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 0, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = None, on_host = False, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:09,657 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-09ccef034e94de817977745e9bd95565', 1)
Function: subgraph_callable-e0817284-0859-4f0b-b29f-6dbe624e
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr44/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
________ test_dask_workflow_api_dlrm[True-None-False-None-150-csv-0.1] _________
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr46')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv', cat_cache = None
on_host = False, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:11,000 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-9e44775b0c668fe5556e30bbf3a5a58b', 0)
Function: subgraph_callable-726c31d6-5b0f-4fd2-b62f-1be20712
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr46/processed/part_0.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
___ test_dask_workflow_api_dlrm[True-None-False-None-150-csv-no-header-0.1] ____
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr47')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
freq_threshold = 150, part_mem_fraction = 0.1, engine = 'csv-no-header'
cat_cache = None, on_host = False, shuffle = None, cpu = True
@pytest.mark.parametrize("part_mem_fraction", [0.1])
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("freq_threshold", [0, 150])
@pytest.mark.parametrize("cat_cache", ["device", None])
@pytest.mark.parametrize("on_host", [True, False])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [True, False])
def test_dask_workflow_api_dlrm(
client,
tmpdir,
datasets,
freq_threshold,
part_mem_fraction,
engine,
cat_cache,
on_host,
shuffle,
cpu,
):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
paths = sorted(paths)
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
df0 = df0.to_pandas() if cpu else df0
if engine == "parquet":
cat_names = ["name-cat", "name-string"]
else:
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
cats = cat_names >> ops.Categorify(
freq_threshold=freq_threshold, out_path=str(tmpdir), cat_cache=cat_cache, on_host=on_host
)
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.LogOp()
workflow = Workflow(cats + conts + label_name)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, cpu=cpu, part_mem_fraction=part_mem_fraction)
else:
dataset = Dataset(paths, cpu=cpu, names=allcols_csv, part_mem_fraction=part_mem_fraction)
output_path = os.path.join(tmpdir, "processed")
transformed = workflow.fit_transform(dataset)
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=1)
result = transformed.to_ddf().compute()
assert len(df0) == len(result)
assert result["x"].min() == 0.0
assert result["x"].isna().sum() == 0
assert result["y"].min() == 0.0
assert result["y"].isna().sum() == 0
# Check categories. Need to sort first to make sure we are comparing
# "apples to apples"
expect = df0.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
got = result.sort_values(["label", "x", "y", "id"]).reset_index(drop=True).reset_index()
dfm = expect.merge(got, on="index", how="inner")[["name-string_x", "name-string_y"]]
dfm_gb = dfm.groupby(["name-string_x", "name-string_y"]).agg(
{"name-string_x": "count", "name-string_y": "count"}
)
if freq_threshold:
dfm_gb = dfm_gb[dfm_gb["name-string_x"] >= freq_threshold]
assert_eq(dfm_gb["name-string_x"], dfm_gb["name-string_y"], check_names=False)
# Read back from disk
if cpu:
df_disk = dd_read_parquet(output_path).compute()
tests/unit/test_dask_nvt.py:130:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:11,924 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-f624361c9960e8bfe9f17d1c64ec291a', 1)
Function: subgraph_callable-a4dff5b3-44cd-4cd0-abd8-1d33decd
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_workflow_api_dlrm_Tr47/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
___________________ test_dask_preproc_cpu[True-None-parquet] ___________________
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'parquet', shuffle = None, cpu = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, part_size="1MB", cpu=cpu)
else:
dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)
# Simple transform (normalize)
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
conts = cont_names >> ops.FillMissing() >> ops.Normalize()
workflow = Workflow(conts + cat_names + label_name)
transformed = workflow.fit_transform(dataset)
# Write out dataset
output_path = os.path.join(tmpdir, "processed")
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)
# Check the final result
df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()
tests/unit/test_dask_nvt.py:277:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
2022-08-09 08:06:52,615 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 0)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
2022-08-09 08:06:52,617 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 11)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,617 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 1)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,620 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 10)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,626 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 14)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,627 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 12)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,627 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 15)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,630 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 13)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_3.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
--------------------------- Captured stderr teardown ---------------------------
2022-08-09 08:06:52,633 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 2)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,671 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 3)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_0.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,674 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 4)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,675 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 5)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,684 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 7)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,693 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 6)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_1.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,694 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 8)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:52,694 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-987cf9d54c23fd3d63f87138d33d5925', 9)
Function: subgraph_callable-5187a416-b333-4d5a-bd7e-7ede7a13
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non0/processed/part_2.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
_____________________ test_dask_preproc_cpu[True-None-csv] _____________________
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'csv', shuffle = None, cpu = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, part_size="1MB", cpu=cpu)
else:
dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)
# Simple transform (normalize)
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
conts = cont_names >> ops.FillMissing() >> ops.Normalize()
workflow = Workflow(conts + cat_names + label_name)
transformed = workflow.fit_transform(dataset)
# Write out dataset
output_path = os.path.join(tmpdir, "processed")
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)
# Check the final result
df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()
tests/unit/test_dask_nvt.py:277:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:53,332 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 20)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,333 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 13)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,337 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 19)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,339 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 16)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,340 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 14)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
--------------------------- Captured stderr teardown ---------------------------
2022-08-09 08:06:53,343 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 15)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,343 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 22)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,345 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 11)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_2.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,346 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 18)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,351 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 12)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_3.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,352 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 17)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_4.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,354 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 10)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_2.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,357 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 21)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,362 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 26)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,365 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 24)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,373 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 28)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,374 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 31)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,375 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 23)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_5.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,376 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 25)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,382 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 27)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_6.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,388 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 29)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:53,388 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-c93258fabc7094400b097695615335f6', 30)
Function: subgraph_callable-08891116-a016-4024-92a7-bf0241b9
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non1/processed/part_7.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
________________ test_dask_preproc_cpu[True-None-csv-no-header] ________________
client = <Client: 'tcp://127.0.0.1:36589' processes=2 threads=16, memory=125.83 GiB>
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2')
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'csv-no-header', shuffle = None, cpu = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("shuffle", [Shuffle.PER_WORKER, None])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_preproc_cpu(client, tmpdir, datasets, engine, shuffle, cpu):
set_dask_client(client=client)
paths = glob.glob(str(datasets[engine]) + "/*." + engine.split("-")[0])
if engine == "parquet":
df1 = cudf.read_parquet(paths[0])[mycols_pq]
df2 = cudf.read_parquet(paths[1])[mycols_pq]
elif engine == "csv":
df1 = cudf.read_csv(paths[0], header=0)[mycols_csv]
df2 = cudf.read_csv(paths[1], header=0)[mycols_csv]
else:
df1 = cudf.read_csv(paths[0], names=allcols_csv)[mycols_csv]
df2 = cudf.read_csv(paths[1], names=allcols_csv)[mycols_csv]
df0 = cudf.concat([df1, df2], axis=0)
if engine in ("parquet", "csv"):
dataset = Dataset(paths, part_size="1MB", cpu=cpu)
else:
dataset = Dataset(paths, names=allcols_csv, part_size="1MB", cpu=cpu)
# Simple transform (normalize)
cat_names = ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
conts = cont_names >> ops.FillMissing() >> ops.Normalize()
workflow = Workflow(conts + cat_names + label_name)
transformed = workflow.fit_transform(dataset)
# Write out dataset
output_path = os.path.join(tmpdir, "processed")
transformed.to_parquet(output_path=output_path, shuffle=shuffle, out_files_per_proc=4)
# Check the final result
df_disk = dd_read_parquet(output_path, engine="pyarrow").compute()
tests/unit/test_dask_nvt.py:277:
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:3015: in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2167: in gather
return self.sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:309: in sync
return sync(
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:376: in sync
raise exc.with_traceback(tb)
/usr/local/lib/python3.8/dist-packages/distributed/utils.py:349: in f
result = yield future
/usr/local/lib/python3.8/dist-packages/tornado/gen.py:762: in run
value = future.result()
/usr/local/lib/python3.8/dist-packages/distributed/client.py:2030: in _gather
raise exception.with_traceback(traceback)
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:466: in read_partition
arrow_table = cls._read_table(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:1606: in _read_table
arrow_table = _read_table_from_path(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/arrow.py:277: in _read_table_from_path
return pq.ParquetFile(fil).read_row_groups(
/usr/local/lib/python3.8/dist-packages/pyarrow/parquet.py:230: in __init__
self.reader.open(
pyarrow/_parquet.pyx:972: in pyarrow._parquet.ParquetReader.open
???
???
E pyarrow.lib.ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
pyarrow/error.pxi:99: ArrowInvalid
----------------------------- Captured stderr call -----------------------------
2022-08-09 08:06:54,051 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 17)
Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_4.parquet', [1], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:54,052 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 19)
Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_4.parquet', [3], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:54,052 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 20)
Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_5.parquet', [0], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
2022-08-09 08:06:54,053 - distributed.worker - WARNING - Compute Failed
Key: ('read-parquet-d37d6baf70f240c2c93439272f3e810b', 22)
Function: subgraph_callable-bde5a890-3c1f-4199-88c5-55e55f29
args: ({'piece': ('/tmp/pytest-of-jenkins/pytest-14/test_dask_preproc_cpu_True_Non2/processed/part_5.parquet', [2], [])})
kwargs: {}
Exception: "ArrowInvalid('Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.')"
___________________________ test_s3_dataset[parquet] ___________________________
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
def _new_conn(self):
""" Establish a socket connection and set nodelay settings on it.
:return: New socket connection.
"""
extra_kw = {}
if self.source_address:
extra_kw["source_address"] = self.source_address
if self.socket_options:
extra_kw["socket_options"] = self.socket_options
try:
conn = connection.create_connection(
(self._dns_host, self.port), self.timeout, **extra_kw
)
/usr/lib/python3/dist-packages/urllib3/connection.py:159:
address = ('127.0.0.1', 5000), timeout = 60, source_address = None
socket_options = [(6, 1, 1)]
def create_connection(
address,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None,
socket_options=None,
):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
An host of '' or port 0 tells the OS to use the default.
"""
host, port = address
if host.startswith("["):
host = host.strip("[]")
err = None
# Using the value from allowed_gai_family() in the context of getaddrinfo lets
# us select whether to work with IPv4 DNS records, IPv6 records, or both.
# The original create_connection function always returns all records.
family = allowed_gai_family()
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket.socket(af, socktype, proto)
# If provided, set socket level options before connecting.
_set_socket_options(sock, socket_options)
if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
return sock
except socket.error as e:
err = e
if sock is not None:
sock.close()
sock = None
if err is not None:
raise err
/usr/lib/python3/dist-packages/urllib3/util/connection.py:84:
address = ('127.0.0.1', 5000), timeout = 60, source_address = None
socket_options = [(6, 1, 1)]
def create_connection(
address,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None,
socket_options=None,
):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
An host of '' or port 0 tells the OS to use the default.
"""
host, port = address
if host.startswith("["):
host = host.strip("[]")
err = None
# Using the value from allowed_gai_family() in the context of getaddrinfo lets
# us select whether to work with IPv4 DNS records, IPv6 records, or both.
# The original create_connection function always returns all records.
family = allowed_gai_family()
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket.socket(af, socktype, proto)
# If provided, set socket level options before connecting.
_set_socket_options(sock, socket_options)
if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
E ConnectionRefusedError: [Errno 111] Connection refused
/usr/lib/python3/dist-packages/urllib3/util/connection.py:74: ConnectionRefusedError
During handling of the above exception, another exception occurred:
self = <botocore.httpsession.URLLib3Session object at 0x7fbad6651cd0>
request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/parquet, headers={'x-amz-acl': b'public...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>
def send(self, request):
try:
proxy_url = self._proxy_config.proxy_url_for(request.url)
manager = self._get_connection_manager(request.url, proxy_url)
conn = manager.connection_from_url(request.url)
self._setup_ssl_cert(conn, request.url, self._verify)
if ensure_boolean(
os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
):
# This is currently an "experimental" feature which provides
# no guarantees of backwards compatibility. It may be subject
# to change or removal in any patch version. Anyone opting in
# to this feature should strictly pin botocore.
host = urlparse(request.url).hostname
conn.proxy_headers['host'] = host
request_target = self._get_request_target(request.url, proxy_url)
urllib_response = conn.urlopen(
method=request.method,
url=request_target,
body=request.body,
headers=request.headers,
retries=Retry(False),
assert_same_host=False,
preload_content=False,
decode_content=False,
chunked=self._chunked(request.headers),
)
/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:448:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0>
method = 'PUT', url = '/parquet', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
retries = Retry(total=False, connect=None, read=None, redirect=0, status=None)
redirect = True, assert_same_host = False
timeout = <object object at 0x7fbbe1452220>, pool_timeout = None
release_conn = False, chunked = False, body_pos = None
response_kw = {'decode_content': False, 'preload_content': False}, conn = None
release_this_conn = True, err = None, clean_exit = False
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0>
is_new_proxy_conn = False
def urlopen(
self,
method,
url,
body=None,
headers=None,
retries=None,
redirect=True,
assert_same_host=True,
timeout=_Default,
pool_timeout=None,
release_conn=None,
chunked=False,
body_pos=None,
**response_kw
):
"""
Get a connection from the pool and perform an HTTP request. This is the
lowest level call for making a request, so you'll need to specify all
the raw details.
.. note::
More commonly, it's appropriate to use a convenience method provided
by :class:`.RequestMethods`, such as :meth:`request`.
.. note::
`release_conn` will only behave as expected if
`preload_content=False` because we want to make
`preload_content=False` the default behaviour someday soon without
breaking backwards compatibility.
:param method:
HTTP request method (such as GET, POST, PUT, etc.)
:param body:
Data to send in the request body (useful for creating
POST requests, see HTTPConnectionPool.post_url for
more convenience).
:param headers:
Dictionary of custom headers to send, such as User-Agent,
If-None-Match, etc. If None, pool headers are used. If provided,
these headers completely replace any pool-specific headers.
:param retries:
Configure the number of retries to allow before raising a
:class:`~urllib3.exceptions.MaxRetryError` exception.
Pass ``None`` to retry until you receive a response. Pass a
:class:`~urllib3.util.retry.Retry` object for fine-grained control
over different types of retries.
Pass an integer number to retry connection errors that many times,
but no other types of errors. Pass zero to never retry.
If ``False``, then retries are disabled and any exception is raised
immediately. Also, instead of raising a MaxRetryError on redirects,
the redirect response will be returned.
:type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.
:param redirect:
If True, automatically handle redirects (status codes 301, 302,
303, 307, 308). Each redirect counts as a retry. Disabling retries
will disable redirect, too.
:param assert_same_host:
If ``True``, will make sure that the host of the pool requests is
consistent else will raise HostChangedError. When False, you can
use the pool on an HTTP proxy and request foreign hosts.
:param timeout:
If specified, overrides the default timeout for this one
request. It may be a float (in seconds) or an instance of
:class:`urllib3.util.Timeout`.
:param pool_timeout:
If set and the pool is set to block=True, then this method will
block for ``pool_timeout`` seconds and raise EmptyPoolError if no
connection is available within the time period.
:param release_conn:
If False, then the urlopen call will not release the connection
back into the pool once a response is received (but will release if
you read the entire contents of the response such as when
`preload_content=True`). This is useful if you're not preloading
the response's content immediately. You will need to call
``r.release_conn()`` on the response ``r`` to return the connection
back into the pool. If None, it takes the value of
``response_kw.get('preload_content', True)``.
:param chunked:
If True, urllib3 will send the body using chunked transfer
encoding. Otherwise, urllib3 will send the body using the standard
content-length form. Defaults to False.
:param int body_pos:
Position to seek to in file-like body in the event of a retry or
redirect. Typically this won't need to be set because urllib3 will
auto-populate the value when needed.
:param \\**response_kw:
Additional parameters are passed to
:meth:`urllib3.response.HTTPResponse.from_httplib`
"""
if headers is None:
headers = self.headers
if not isinstance(retries, Retry):
retries = Retry.from_int(retries, redirect=redirect, default=self.retries)
if release_conn is None:
release_conn = response_kw.get("preload_content", True)
# Check host
if assert_same_host and not self.is_same_host(url):
raise HostChangedError(self, url, retries)
# Ensure that the URL we're connecting to is properly encoded
if url.startswith("/"):
url = six.ensure_str(_encode_target(url))
else:
url = six.ensure_str(parse_url(url).url)
conn = None
# Track whether `conn` needs to be released before
# returning/raising/recursing. Update this variable if necessary, and
# leave `release_conn` constant throughout the function. That way, if
# the function recurses, the original value of `release_conn` will be
# passed down into the recursive call, and its value will be respected.
#
# See issue #651 [1] for details.
#
# [1] <https://github.com/urllib3/urllib3/issues/651>
release_this_conn = release_conn
# Merge the proxy headers. Only do this in HTTP. We have to copy the
# headers dict so we can safely change it without those changes being
# reflected in anyone else's copy.
if self.scheme == "http":
headers = headers.copy()
headers.update(self.proxy_headers)
# Must keep the exception bound to a separate variable or else Python 3
# complains about UnboundLocalError.
err = None
# Keep track of whether we cleanly exited the except block. This
# ensures we do proper cleanup in finally.
clean_exit = False
# Rewind body position, if needed. Record current position
# for future rewinds in the event of a redirect/retry.
body_pos = set_file_position(body, body_pos)
try:
# Request a connection from the queue.
timeout_obj = self._get_timeout(timeout)
conn = self._get_conn(timeout=pool_timeout)
conn.timeout = timeout_obj.connect_timeout
is_new_proxy_conn = self.proxy is not None and not getattr(
conn, "sock", None
)
if is_new_proxy_conn:
self._prepare_proxy(conn)
# Make the request on the httplib connection object.
httplib_response = self._make_request(
conn,
method,
url,
timeout=timeout_obj,
body=body,
headers=headers,
chunked=chunked,
)
# If we're going to release the connection in ``finally:``, then
# the response doesn't need to know about the connection. Otherwise
# it will also try to release it and we'll have a double-release
# mess.
response_conn = conn if not release_conn else None
# Pass method to Response for length checking
response_kw["request_method"] = method
# Import httplib's response into our own wrapper object
response = self.ResponseCls.from_httplib(
httplib_response,
pool=self,
connection=response_conn,
retries=retries,
**response_kw
)
# Everything went great!
clean_exit = True
except queue.Empty:
# Timed out by queue.
raise EmptyPoolError(self, "No pool connections are available.")
except (
TimeoutError,
HTTPException,
SocketError,
ProtocolError,
BaseSSLError,
SSLError,
CertificateError,
) as e:
# Discard the connection for these exceptions. It will be
# replaced during the next _get_conn() call.
clean_exit = False
if isinstance(e, (BaseSSLError, CertificateError)):
e = SSLError(e)
elif isinstance(e, (SocketError, NewConnectionError)) and self.proxy:
e = ProxyError("Cannot connect to proxy.", e)
elif isinstance(e, (SocketError, HTTPException)):
e = ProtocolError("Connection aborted.", e)
retries = retries.increment(
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:719:
self = Retry(total=False, connect=None, read=None, redirect=0, status=None)
method = 'PUT', url = '/parquet', response = None
error = NewConnectionError('<botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>: Failed to establish a new connection: [Errno 111] Connection refused')
_pool = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0>
_stacktrace = <traceback object at 0x7fbad6d6e840>
def increment(
self,
method=None,
url=None,
response=None,
error=None,
_pool=None,
_stacktrace=None,
):
""" Return a new Retry object with incremented retry counters.
:param response: A response object, or None, if the server did not
return a response.
:type response: :class:`~urllib3.response.HTTPResponse`
:param Exception error: An error encountered during the request, or
None if the response was received successfully.
:return: A new ``Retry`` object.
"""
if self.total is False and error:
# Disabled, indicate to re-raise the error.
raise six.reraise(type(error), error, _stacktrace)
/usr/lib/python3/dist-packages/urllib3/util/retry.py:376:
tp = <class 'urllib3.exceptions.NewConnectionError'>, value = None, tb = None
def reraise(tp, value, tb=None):
try:
if value is None:
value = tp()
if value.__traceback__ is not tb:
raise value.with_traceback(tb)
raise value
../../../.local/lib/python3.8/site-packages/six.py:703:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0>
method = 'PUT', url = '/parquet', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
retries = Retry(total=False, connect=None, read=None, redirect=0, status=None)
redirect = True, assert_same_host = False
timeout = <object object at 0x7fbbe1452220>, pool_timeout = None
release_conn = False, chunked = False, body_pos = None
response_kw = {'decode_content': False, 'preload_content': False}, conn = None
release_this_conn = True, err = None, clean_exit = False
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0>
is_new_proxy_conn = False
def urlopen(
self,
method,
url,
body=None,
headers=None,
retries=None,
redirect=True,
assert_same_host=True,
timeout=_Default,
pool_timeout=None,
release_conn=None,
chunked=False,
body_pos=None,
**response_kw
):
"""
Get a connection from the pool and perform an HTTP request. This is the
lowest level call for making a request, so you'll need to specify all
the raw details.
.. note::
More commonly, it's appropriate to use a convenience method provided
by :class:`.RequestMethods`, such as :meth:`request`.
.. note::
`release_conn` will only behave as expected if
`preload_content=False` because we want to make
`preload_content=False` the default behaviour someday soon without
breaking backwards compatibility.
:param method:
HTTP request method (such as GET, POST, PUT, etc.)
:param body:
Data to send in the request body (useful for creating
POST requests, see HTTPConnectionPool.post_url for
more convenience).
:param headers:
Dictionary of custom headers to send, such as User-Agent,
If-None-Match, etc. If None, pool headers are used. If provided,
these headers completely replace any pool-specific headers.
:param retries:
Configure the number of retries to allow before raising a
:class:`~urllib3.exceptions.MaxRetryError` exception.
Pass ``None`` to retry until you receive a response. Pass a
:class:`~urllib3.util.retry.Retry` object for fine-grained control
over different types of retries.
Pass an integer number to retry connection errors that many times,
but no other types of errors. Pass zero to never retry.
If ``False``, then retries are disabled and any exception is raised
immediately. Also, instead of raising a MaxRetryError on redirects,
the redirect response will be returned.
:type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.
:param redirect:
If True, automatically handle redirects (status codes 301, 302,
303, 307, 308). Each redirect counts as a retry. Disabling retries
will disable redirect, too.
:param assert_same_host:
If ``True``, will make sure that the host of the pool requests is
consistent else will raise HostChangedError. When False, you can
use the pool on an HTTP proxy and request foreign hosts.
:param timeout:
If specified, overrides the default timeout for this one
request. It may be a float (in seconds) or an instance of
:class:`urllib3.util.Timeout`.
:param pool_timeout:
If set and the pool is set to block=True, then this method will
block for ``pool_timeout`` seconds and raise EmptyPoolError if no
connection is available within the time period.
:param release_conn:
If False, then the urlopen call will not release the connection
back into the pool once a response is received (but will release if
you read the entire contents of the response such as when
`preload_content=True`). This is useful if you're not preloading
the response's content immediately. You will need to call
``r.release_conn()`` on the response ``r`` to return the connection
back into the pool. If None, it takes the value of
``response_kw.get('preload_content', True)``.
:param chunked:
If True, urllib3 will send the body using chunked transfer
encoding. Otherwise, urllib3 will send the body using the standard
content-length form. Defaults to False.
:param int body_pos:
Position to seek to in file-like body in the event of a retry or
redirect. Typically this won't need to be set because urllib3 will
auto-populate the value when needed.
:param \\**response_kw:
Additional parameters are passed to
:meth:`urllib3.response.HTTPResponse.from_httplib`
"""
if headers is None:
headers = self.headers
if not isinstance(retries, Retry):
retries = Retry.from_int(retries, redirect=redirect, default=self.retries)
if release_conn is None:
release_conn = response_kw.get("preload_content", True)
# Check host
if assert_same_host and not self.is_same_host(url):
raise HostChangedError(self, url, retries)
# Ensure that the URL we're connecting to is properly encoded
if url.startswith("/"):
url = six.ensure_str(_encode_target(url))
else:
url = six.ensure_str(parse_url(url).url)
conn = None
# Track whether `conn` needs to be released before
# returning/raising/recursing. Update this variable if necessary, and
# leave `release_conn` constant throughout the function. That way, if
# the function recurses, the original value of `release_conn` will be
# passed down into the recursive call, and its value will be respected.
#
# See issue #651 [1] for details.
#
# [1] <https://github.com/urllib3/urllib3/issues/651>
release_this_conn = release_conn
# Merge the proxy headers. Only do this in HTTP. We have to copy the
# headers dict so we can safely change it without those changes being
# reflected in anyone else's copy.
if self.scheme == "http":
headers = headers.copy()
headers.update(self.proxy_headers)
# Must keep the exception bound to a separate variable or else Python 3
# complains about UnboundLocalError.
err = None
# Keep track of whether we cleanly exited the except block. This
# ensures we do proper cleanup in finally.
clean_exit = False
# Rewind body position, if needed. Record current position
# for future rewinds in the event of a redirect/retry.
body_pos = set_file_position(body, body_pos)
try:
# Request a connection from the queue.
timeout_obj = self._get_timeout(timeout)
conn = self._get_conn(timeout=pool_timeout)
conn.timeout = timeout_obj.connect_timeout
is_new_proxy_conn = self.proxy is not None and not getattr(
conn, "sock", None
)
if is_new_proxy_conn:
self._prepare_proxy(conn)
# Make the request on the httplib connection object.
httplib_response = self._make_request(
conn,
method,
url,
timeout=timeout_obj,
body=body,
headers=headers,
chunked=chunked,
)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:665:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0e7705e0>
conn = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
method = 'PUT', url = '/parquet'
timeout = <urllib3.util.timeout.Timeout object at 0x7fbb0e6d53d0>
chunked = False
httplib_request_kw = {'body': None, 'headers': {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}}
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbb0e770040>
def _make_request(
self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
"""
Perform a request on a given urllib connection object taken from our
pool.
:param conn:
a connection from one of our connection pools
:param timeout:
Socket timeout in seconds for the request. This can be a
float or integer, which will set the same timeout value for
the socket connect and the socket read, or an instance of
:class:`urllib3.util.Timeout`, which gives you more fine-grained
control over your timeouts.
"""
self.num_requests += 1
timeout_obj = self._get_timeout(timeout)
timeout_obj.start_connect()
conn.timeout = timeout_obj.connect_timeout
# Trigger any extra validation we need to do.
try:
self._validate_conn(conn)
except (SocketTimeout, BaseSSLError) as e:
# Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
raise
# conn.request() calls httplib.*.request, not the method in
# urllib3.request. It also calls makefile (recv) on the socket.
if chunked:
conn.request_chunked(method, url, **httplib_request_kw)
else:
conn.request(method, url, **httplib_request_kw)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:387:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
method = 'PUT', url = '/parquet', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
def request(self, method, url, body=None, headers={}, *,
encode_chunked=False):
"""Send a complete request to the server."""
self._send_request(method, url, body, headers, encode_chunked)
/usr/lib/python3.8/http/client.py:1256:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
method = 'PUT', url = '/parquet', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
args = (False,), kwargs = {}
def _send_request(self, method, url, body, headers, *args, **kwargs):
self._response_received = False
if headers.get('Expect', b'') == b'100-continue':
self._expect_header_set = True
else:
self._expect_header_set = False
self.response_class = self._original_response_cls
rval = super()._send_request(
method, url, body, headers, *args, **kwargs
)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:94:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
method = 'PUT', url = '/parquet', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
encode_chunked = False
def _send_request(self, method, url, body, headers, encode_chunked):
# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = frozenset(k.lower() for k in headers)
skips = {}
if 'host' in header_names:
skips['skip_host'] = 1
if 'accept-encoding' in header_names:
skips['skip_accept_encoding'] = 1
self.putrequest(method, url, **skips)
# chunked encoding will happen if HTTP/1.1 is used and either
# the caller passes encode_chunked=True or the following
# conditions hold:
# 1. content-length has not been explicitly set
# 2. the body is a file or iterable, but not a str or bytes-like
# 3. Transfer-Encoding has NOT been explicitly set by the caller
if 'content-length' not in header_names:
# only chunk body if not explicitly set for backwards
# compatibility, assuming the client code is already handling the
# chunking
if 'transfer-encoding' not in header_names:
# if content-length cannot be automatically determined, fall
# back to chunked encoding
encode_chunked = False
content_length = self._get_content_length(body, method)
if content_length is None:
if body is not None:
if self.debuglevel > 0:
print('Unable to determine size of %r' % body)
encode_chunked = True
self.putheader('Transfer-Encoding', 'chunked')
else:
self.putheader('Content-Length', str(content_length))
else:
encode_chunked = False
for hdr, value in headers.items():
self.putheader(hdr, value)
if isinstance(body, str):
# RFC 2616 Section 3.7.1 says that text default has a
# default charset of iso-8859-1.
body = _encode(body, 'body')
self.endheaders(body, encode_chunked=encode_chunked)
/usr/lib/python3.8/http/client.py:1302:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
message_body = None
def endheaders(self, message_body=None, *, encode_chunked=False):
"""Indicate that the last header line has been sent to the server.
This method sends the request to the server. The optional message_body
argument can be used to pass a message body associated with the
request.
"""
if self.__state == _CS_REQ_STARTED:
self.__state = _CS_REQ_SENT
else:
raise CannotSendHeader()
self._send_output(message_body, encode_chunked=encode_chunked)
/usr/lib/python3.8/http/client.py:1251:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
message_body = None, args = (), kwargs = {'encode_chunked': False}
msg = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def _send_output(self, message_body=None, *args, **kwargs):
self._buffer.extend((b"", b""))
msg = self._convert_to_bytes(self._buffer)
del self._buffer[:]
# If msg and message_body are sent in a single send() call,
# it will avoid performance problems caused by the interaction
# between delayed ack and the Nagle algorithm.
if isinstance(message_body, bytes):
msg += message_body
message_body = None
self.send(msg)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:123:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
str = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def send(self, str):
if self._response_received:
logger.debug(
"send() called, but reseponse already received. "
"Not sending data."
)
return
return super().send(str)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:218:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
data = b'PUT /parquet HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-A...-invocation-id: 5b41a982-e65e-407f-93da-29b3a02c5d15\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def send(self, data):
"""Send `data' to the server.
``data`` can be a string object, a bytes object, an array object, a
file-like object that supports a .read() method, or an iterable object.
"""
if self.sock is None:
if self.auto_open:
self.connect()
/usr/lib/python3.8/http/client.py:951:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
def connect(self):
conn = self._new_conn()
/usr/lib/python3/dist-packages/urllib3/connection.py:187:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>
def _new_conn(self):
""" Establish a socket connection and set nodelay settings on it.
:return: New socket connection.
"""
extra_kw = {}
if self.source_address:
extra_kw["source_address"] = self.source_address
if self.socket_options:
extra_kw["socket_options"] = self.socket_options
try:
conn = connection.create_connection(
(self._dns_host, self.port), self.timeout, **extra_kw
)
except SocketTimeout:
raise ConnectTimeoutError(
self,
"Connection to %s timed out. (connect timeout=%s)"
% (self.host, self.timeout),
)
except SocketError as e:
raise NewConnectionError(
self, "Failed to establish a new connection: %s" % e
)
E urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7fbb0e7703d0>: Failed to establish a new connection: [Errno 111] Connection refused
/usr/lib/python3/dist-packages/urllib3/connection.py:171: NewConnectionError
During handling of the above exception, another exception occurred:
s3_base = 'http://127.0.0.1:5000/'
s3so = {'client_kwargs': {'endpoint_url': 'http://127.0.0.1:5000/'}}
paths = ['/tmp/pytest-of-jenkins/pytest-14/parquet0/dataset-0.parquet', '/tmp/pytest-of-jenkins/pytest-14/parquet0/dataset-1.parquet']
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'parquet'
df = name-cat name-string id label x y
0 Edith Laura 1054 964 -0.792165 0.069362
...da 976 964 -0.270133 0.839677
4320 Alice Ray 967 977 0.033737 -0.727091
[4321 rows x 6 columns]
patch_aiobotocore = None
@pytest.mark.parametrize("engine", ["parquet", "csv"])
def test_s3_dataset(s3_base, s3so, paths, datasets, engine, df, patch_aiobotocore):
# Copy files to mock s3 bucket
files = {}
for i, path in enumerate(paths):
with open(path, "rb") as f:
fbytes = f.read()
fn = path.split(os.path.sep)[-1]
files[fn] = BytesIO()
files[fn].write(fbytes)
files[fn].seek(0)
if engine == "parquet":
# Workaround for nvt#539. In order to avoid the
# bug in Dask's `create_metadata_file`, we need
# to manually generate a "_metadata" file here.
# This can be removed after dask#7295 is merged
# (see https://github.com/dask/dask/pull/7295)
fn = "_metadata"
files[fn] = BytesIO()
meta = create_metadata_file(
paths,
engine="pyarrow",
out_dir=False,
)
meta.write_metadata_file(files[fn])
files[fn].seek(0)
with s3_context(s3_base=s3_base, bucket=engine, files=files) as s3fs:
tests/unit/test_s3.py:97:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:96: in s3_context
client.create_bucket(Bucket=bucket, ACL="public-read-write")
/usr/local/lib/python3.8/dist-packages/botocore/client.py:508: in _api_call
return self._make_api_call(operation_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/client.py:898: in _make_api_call
http, parsed_response = self._make_request(
/usr/local/lib/python3.8/dist-packages/botocore/client.py:921: in _make_request
return self._endpoint.make_request(operation_model, request_dict)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:119: in make_request
return self._send_request(request_dict, operation_model)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:202: in _send_request
while self._needs_retry(
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:354: in _needs_retry
responses = self._event_emitter.emit(
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:412: in emit
return self._emitter.emit(aliased_event_name, **kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:256: in emit
return self._emit(event_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:239: in _emit
response = handler(**kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:207: in __call__
if self._checker(**checker_kwargs):
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:284: in __call__
should_retry = self._should_retry(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:320: in _should_retry
return self._checker(attempt_number, response, caught_exception)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:363: in __call__
checker_response = checker(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:247: in __call__
return self._check_caught_exception(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:416: in _check_caught_exception
raise caught_exception
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:281: in _do_get_response
http_response = self._send(request)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:377: in _send
return self.http_session.send(request)
self = <botocore.httpsession.URLLib3Session object at 0x7fbad6651cd0>
request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/parquet, headers={'x-amz-acl': b'public...nvocation-id': b'5b41a982-e65e-407f-93da-29b3a02c5d15', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>
def send(self, request):
try:
proxy_url = self._proxy_config.proxy_url_for(request.url)
manager = self._get_connection_manager(request.url, proxy_url)
conn = manager.connection_from_url(request.url)
self._setup_ssl_cert(conn, request.url, self._verify)
if ensure_boolean(
os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
):
# This is currently an "experimental" feature which provides
# no guarantees of backwards compatibility. It may be subject
# to change or removal in any patch version. Anyone opting in
# to this feature should strictly pin botocore.
host = urlparse(request.url).hostname
conn.proxy_headers['host'] = host
request_target = self._get_request_target(request.url, proxy_url)
urllib_response = conn.urlopen(
method=request.method,
url=request_target,
body=request.body,
headers=request.headers,
retries=Retry(False),
assert_same_host=False,
preload_content=False,
decode_content=False,
chunked=self._chunked(request.headers),
)
http_response = botocore.awsrequest.AWSResponse(
request.url,
urllib_response.status,
urllib_response.headers,
urllib_response,
)
if not request.stream_output:
# Cause the raw stream to be exhausted immediately. We do it
# this way instead of using preload_content because
# preload_content will never buffer chunked responses
http_response.content
return http_response
except URLLib3SSLError as e:
raise SSLError(endpoint_url=request.url, error=e)
except (NewConnectionError, socket.gaierror) as e:
raise EndpointConnectionError(endpoint_url=request.url, error=e)
E botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5000/parquet"
/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:477: EndpointConnectionError
---------------------------- Captured stderr setup -----------------------------
Traceback (most recent call last):
File "/usr/local/bin/moto_server", line 5, in
from moto.server import main
File "/usr/local/lib/python3.8/dist-packages/moto/server.py", line 7, in
from moto.moto_server.werkzeug_app import (
File "/usr/local/lib/python3.8/dist-packages/moto/moto_server/werkzeug_app.py", line 6, in
from flask import Flask
File "/usr/local/lib/python3.8/dist-packages/flask/init.py", line 4, in
from . import json as json
File "/usr/local/lib/python3.8/dist-packages/flask/json/init.py", line 8, in
from ..globals import current_app
File "/usr/local/lib/python3.8/dist-packages/flask/globals.py", line 56, in
app_ctx: "AppContext" = LocalProxy( # type: ignore[assignment]
TypeError: init() got an unexpected keyword argument 'unbound_message'
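Note on the captured stderr above: the mock S3 server never starts because importing Flask fails inside `moto_server`. This looks like a Flask/Werkzeug version mismatch in the CI image (Flask 2.2+ passes `unbound_message=` to `werkzeug.local.LocalProxy`, a keyword that older Werkzeug releases do not accept), so every `test_s3_dataset` case then hits connection refused on 127.0.0.1:5000. A minimal diagnostic sketch, assuming that mismatch is the root cause:

```python
# Diagnostic sketch (an assumption about the root cause, not part of the tests):
# Flask >= 2.2 constructs werkzeug.local.LocalProxy(..., unbound_message=...),
# a keyword that only exists in Werkzeug >= 2.2. With an older Werkzeug,
# "import flask" raises the TypeError above and moto_server never binds port 5000.
import inspect
from importlib.metadata import version

from werkzeug.local import LocalProxy

# Read the installed versions without importing flask (importing it is what fails here).
print("flask:", version("flask"), "werkzeug:", version("werkzeug"))
# True on a compatible pairing; False reproduces the failure seen in this log.
print("unbound_message" in inspect.signature(LocalProxy.__init__).parameters)
```

Pinning a compatible Flask/Werkzeug pair (or upgrading Werkzeug alongside Flask) in the test image should let `moto_server` bind the port again.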
_____________________________ test_s3_dataset[csv] _____________________________
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
def _new_conn(self):
""" Establish a socket connection and set nodelay settings on it.
:return: New socket connection.
"""
extra_kw = {}
if self.source_address:
extra_kw["source_address"] = self.source_address
if self.socket_options:
extra_kw["socket_options"] = self.socket_options
try:
conn = connection.create_connection(
(self._dns_host, self.port), self.timeout, **extra_kw
)
/usr/lib/python3/dist-packages/urllib3/connection.py:159:
address = ('127.0.0.1', 5000), timeout = 60, source_address = None
socket_options = [(6, 1, 1)]
def create_connection(
address,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None,
socket_options=None,
):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
An host of '' or port 0 tells the OS to use the default.
"""
host, port = address
if host.startswith("["):
host = host.strip("[]")
err = None
# Using the value from allowed_gai_family() in the context of getaddrinfo lets
# us select whether to work with IPv4 DNS records, IPv6 records, or both.
# The original create_connection function always returns all records.
family = allowed_gai_family()
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket.socket(af, socktype, proto)
# If provided, set socket level options before connecting.
_set_socket_options(sock, socket_options)
if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
return sock
except socket.error as e:
err = e
if sock is not None:
sock.close()
sock = None
if err is not None:
raise err
/usr/lib/python3/dist-packages/urllib3/util/connection.py:84:
address = ('127.0.0.1', 5000), timeout = 60, source_address = None
socket_options = [(6, 1, 1)]
def create_connection(
address,
timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
source_address=None,
socket_options=None,
):
"""Connect to *address* and return the socket object.
Convenience function. Connect to *address* (a 2-tuple ``(host,
port)``) and return the socket object. Passing the optional
*timeout* parameter will set the timeout on the socket instance
before attempting to connect. If no *timeout* is supplied, the
global default timeout setting returned by :func:`getdefaulttimeout`
is used. If *source_address* is set it must be a tuple of (host, port)
for the socket to bind as a source address before making the connection.
An host of '' or port 0 tells the OS to use the default.
"""
host, port = address
if host.startswith("["):
host = host.strip("[]")
err = None
# Using the value from allowed_gai_family() in the context of getaddrinfo lets
# us select whether to work with IPv4 DNS records, IPv6 records, or both.
# The original create_connection function always returns all records.
family = allowed_gai_family()
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
af, socktype, proto, canonname, sa = res
sock = None
try:
sock = socket.socket(af, socktype, proto)
# If provided, set socket level options before connecting.
_set_socket_options(sock, socket_options)
if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
sock.settimeout(timeout)
if source_address:
sock.bind(source_address)
sock.connect(sa)
E ConnectionRefusedError: [Errno 111] Connection refused
/usr/lib/python3/dist-packages/urllib3/util/connection.py:74: ConnectionRefusedError
During handling of the above exception, another exception occurred:
self = <botocore.httpsession.URLLib3Session object at 0x7fbad458f070>
request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/csv, headers={'x-amz-acl': b'public-rea...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>
def send(self, request):
try:
proxy_url = self._proxy_config.proxy_url_for(request.url)
manager = self._get_connection_manager(request.url, proxy_url)
conn = manager.connection_from_url(request.url)
self._setup_ssl_cert(conn, request.url, self._verify)
if ensure_boolean(
os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
):
# This is currently an "experimental" feature which provides
# no guarantees of backwards compatibility. It may be subject
# to change or removal in any patch version. Anyone opting in
# to this feature should strictly pin botocore.
host = urlparse(request.url).hostname
conn.proxy_headers['host'] = host
request_target = self._get_request_target(request.url, proxy_url)
urllib_response = conn.urlopen(
method=request.method,
url=request_target,
body=request.body,
headers=request.headers,
retries=Retry(False),
assert_same_host=False,
preload_content=False,
decode_content=False,
chunked=self._chunked(request.headers),
)
/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:448:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60>
method = 'PUT', url = '/csv', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
retries = Retry(total=False, connect=None, read=None, redirect=0, status=None)
redirect = True, assert_same_host = False
timeout = <object object at 0x7fbbe1452220>, pool_timeout = None
release_conn = False, chunked = False, body_pos = None
response_kw = {'decode_content': False, 'preload_content': False}, conn = None
release_this_conn = True, err = None, clean_exit = False
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0>
is_new_proxy_conn = False
def urlopen(
self,
method,
url,
body=None,
headers=None,
retries=None,
redirect=True,
assert_same_host=True,
timeout=_Default,
pool_timeout=None,
release_conn=None,
chunked=False,
body_pos=None,
**response_kw
):
"""
Get a connection from the pool and perform an HTTP request. This is the
lowest level call for making a request, so you'll need to specify all
the raw details.
.. note::
More commonly, it's appropriate to use a convenience method provided
by :class:`.RequestMethods`, such as :meth:`request`.
.. note::
`release_conn` will only behave as expected if
`preload_content=False` because we want to make
`preload_content=False` the default behaviour someday soon without
breaking backwards compatibility.
:param method:
HTTP request method (such as GET, POST, PUT, etc.)
:param body:
Data to send in the request body (useful for creating
POST requests, see HTTPConnectionPool.post_url for
more convenience).
:param headers:
Dictionary of custom headers to send, such as User-Agent,
If-None-Match, etc. If None, pool headers are used. If provided,
these headers completely replace any pool-specific headers.
:param retries:
Configure the number of retries to allow before raising a
:class:`~urllib3.exceptions.MaxRetryError` exception.
Pass ``None`` to retry until you receive a response. Pass a
:class:`~urllib3.util.retry.Retry` object for fine-grained control
over different types of retries.
Pass an integer number to retry connection errors that many times,
but no other types of errors. Pass zero to never retry.
If ``False``, then retries are disabled and any exception is raised
immediately. Also, instead of raising a MaxRetryError on redirects,
the redirect response will be returned.
:type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.
:param redirect:
If True, automatically handle redirects (status codes 301, 302,
303, 307, 308). Each redirect counts as a retry. Disabling retries
will disable redirect, too.
:param assert_same_host:
If ``True``, will make sure that the host of the pool requests is
consistent else will raise HostChangedError. When False, you can
use the pool on an HTTP proxy and request foreign hosts.
:param timeout:
If specified, overrides the default timeout for this one
request. It may be a float (in seconds) or an instance of
:class:`urllib3.util.Timeout`.
:param pool_timeout:
If set and the pool is set to block=True, then this method will
block for ``pool_timeout`` seconds and raise EmptyPoolError if no
connection is available within the time period.
:param release_conn:
If False, then the urlopen call will not release the connection
back into the pool once a response is received (but will release if
you read the entire contents of the response such as when
`preload_content=True`). This is useful if you're not preloading
the response's content immediately. You will need to call
``r.release_conn()`` on the response ``r`` to return the connection
back into the pool. If None, it takes the value of
``response_kw.get('preload_content', True)``.
:param chunked:
If True, urllib3 will send the body using chunked transfer
encoding. Otherwise, urllib3 will send the body using the standard
content-length form. Defaults to False.
:param int body_pos:
Position to seek to in file-like body in the event of a retry or
redirect. Typically this won't need to be set because urllib3 will
auto-populate the value when needed.
:param \\**response_kw:
Additional parameters are passed to
:meth:`urllib3.response.HTTPResponse.from_httplib`
"""
if headers is None:
headers = self.headers
if not isinstance(retries, Retry):
retries = Retry.from_int(retries, redirect=redirect, default=self.retries)
if release_conn is None:
release_conn = response_kw.get("preload_content", True)
# Check host
if assert_same_host and not self.is_same_host(url):
raise HostChangedError(self, url, retries)
# Ensure that the URL we're connecting to is properly encoded
if url.startswith("/"):
url = six.ensure_str(_encode_target(url))
else:
url = six.ensure_str(parse_url(url).url)
conn = None
# Track whether `conn` needs to be released before
# returning/raising/recursing. Update this variable if necessary, and
# leave `release_conn` constant throughout the function. That way, if
# the function recurses, the original value of `release_conn` will be
# passed down into the recursive call, and its value will be respected.
#
# See issue #651 [1] for details.
#
# [1] <https://github.com/urllib3/urllib3/issues/651>
release_this_conn = release_conn
# Merge the proxy headers. Only do this in HTTP. We have to copy the
# headers dict so we can safely change it without those changes being
# reflected in anyone else's copy.
if self.scheme == "http":
headers = headers.copy()
headers.update(self.proxy_headers)
# Must keep the exception bound to a separate variable or else Python 3
# complains about UnboundLocalError.
err = None
# Keep track of whether we cleanly exited the except block. This
# ensures we do proper cleanup in finally.
clean_exit = False
# Rewind body position, if needed. Record current position
# for future rewinds in the event of a redirect/retry.
body_pos = set_file_position(body, body_pos)
try:
# Request a connection from the queue.
timeout_obj = self._get_timeout(timeout)
conn = self._get_conn(timeout=pool_timeout)
conn.timeout = timeout_obj.connect_timeout
is_new_proxy_conn = self.proxy is not None and not getattr(
conn, "sock", None
)
if is_new_proxy_conn:
self._prepare_proxy(conn)
# Make the request on the httplib connection object.
httplib_response = self._make_request(
conn,
method,
url,
timeout=timeout_obj,
body=body,
headers=headers,
chunked=chunked,
)
# If we're going to release the connection in ``finally:``, then
# the response doesn't need to know about the connection. Otherwise
# it will also try to release it and we'll have a double-release
# mess.
response_conn = conn if not release_conn else None
# Pass method to Response for length checking
response_kw["request_method"] = method
# Import httplib's response into our own wrapper object
response = self.ResponseCls.from_httplib(
httplib_response,
pool=self,
connection=response_conn,
retries=retries,
**response_kw
)
# Everything went great!
clean_exit = True
except queue.Empty:
# Timed out by queue.
raise EmptyPoolError(self, "No pool connections are available.")
except (
TimeoutError,
HTTPException,
SocketError,
ProtocolError,
BaseSSLError,
SSLError,
CertificateError,
) as e:
# Discard the connection for these exceptions. It will be
# replaced during the next _get_conn() call.
clean_exit = False
if isinstance(e, (BaseSSLError, CertificateError)):
e = SSLError(e)
elif isinstance(e, (SocketError, NewConnectionError)) and self.proxy:
e = ProxyError("Cannot connect to proxy.", e)
elif isinstance(e, (SocketError, HTTPException)):
e = ProtocolError("Connection aborted.", e)
retries = retries.increment(
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:719:
self = Retry(total=False, connect=None, read=None, redirect=0, status=None)
method = 'PUT', url = '/csv', response = None
error = NewConnectionError('<botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>: Failed to establish a new connection: [Errno 111] Connection refused')
_pool = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60>
_stacktrace = <traceback object at 0x7fbad70ed800>
def increment(
self,
method=None,
url=None,
response=None,
error=None,
_pool=None,
_stacktrace=None,
):
""" Return a new Retry object with incremented retry counters.
:param response: A response object, or None, if the server did not
return a response.
:type response: :class:`~urllib3.response.HTTPResponse`
:param Exception error: An error encountered during the request, or
None if the response was received successfully.
:return: A new ``Retry`` object.
"""
if self.total is False and error:
# Disabled, indicate to re-raise the error.
raise six.reraise(type(error), error, _stacktrace)
/usr/lib/python3/dist-packages/urllib3/util/retry.py:376:
tp = <class 'urllib3.exceptions.NewConnectionError'>, value = None, tb = None
def reraise(tp, value, tb=None):
try:
if value is None:
value = tp()
if value.__traceback__ is not tb:
raise value.with_traceback(tb)
raise value
../../../.local/lib/python3.8/site-packages/six.py:703:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60>
method = 'PUT', url = '/csv', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
retries = Retry(total=False, connect=None, read=None, redirect=0, status=None)
redirect = True, assert_same_host = False
timeout = <object object at 0x7fbbe1452220>, pool_timeout = None
release_conn = False, chunked = False, body_pos = None
response_kw = {'decode_content': False, 'preload_content': False}, conn = None
release_this_conn = True, err = None, clean_exit = False
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0>
is_new_proxy_conn = False
def urlopen(
self,
method,
url,
body=None,
headers=None,
retries=None,
redirect=True,
assert_same_host=True,
timeout=_Default,
pool_timeout=None,
release_conn=None,
chunked=False,
body_pos=None,
**response_kw
):
"""
Get a connection from the pool and perform an HTTP request. This is the
lowest level call for making a request, so you'll need to specify all
the raw details.
.. note::
More commonly, it's appropriate to use a convenience method provided
by :class:`.RequestMethods`, such as :meth:`request`.
.. note::
`release_conn` will only behave as expected if
`preload_content=False` because we want to make
`preload_content=False` the default behaviour someday soon without
breaking backwards compatibility.
:param method:
HTTP request method (such as GET, POST, PUT, etc.)
:param body:
Data to send in the request body (useful for creating
POST requests, see HTTPConnectionPool.post_url for
more convenience).
:param headers:
Dictionary of custom headers to send, such as User-Agent,
If-None-Match, etc. If None, pool headers are used. If provided,
these headers completely replace any pool-specific headers.
:param retries:
Configure the number of retries to allow before raising a
:class:`~urllib3.exceptions.MaxRetryError` exception.
Pass ``None`` to retry until you receive a response. Pass a
:class:`~urllib3.util.retry.Retry` object for fine-grained control
over different types of retries.
Pass an integer number to retry connection errors that many times,
but no other types of errors. Pass zero to never retry.
If ``False``, then retries are disabled and any exception is raised
immediately. Also, instead of raising a MaxRetryError on redirects,
the redirect response will be returned.
:type retries: :class:`~urllib3.util.retry.Retry`, False, or an int.
:param redirect:
If True, automatically handle redirects (status codes 301, 302,
303, 307, 308). Each redirect counts as a retry. Disabling retries
will disable redirect, too.
:param assert_same_host:
If ``True``, will make sure that the host of the pool requests is
consistent else will raise HostChangedError. When False, you can
use the pool on an HTTP proxy and request foreign hosts.
:param timeout:
If specified, overrides the default timeout for this one
request. It may be a float (in seconds) or an instance of
:class:`urllib3.util.Timeout`.
:param pool_timeout:
If set and the pool is set to block=True, then this method will
block for ``pool_timeout`` seconds and raise EmptyPoolError if no
connection is available within the time period.
:param release_conn:
If False, then the urlopen call will not release the connection
back into the pool once a response is received (but will release if
you read the entire contents of the response such as when
`preload_content=True`). This is useful if you're not preloading
the response's content immediately. You will need to call
``r.release_conn()`` on the response ``r`` to return the connection
back into the pool. If None, it takes the value of
``response_kw.get('preload_content', True)``.
:param chunked:
If True, urllib3 will send the body using chunked transfer
encoding. Otherwise, urllib3 will send the body using the standard
content-length form. Defaults to False.
:param int body_pos:
Position to seek to in file-like body in the event of a retry or
redirect. Typically this won't need to be set because urllib3 will
auto-populate the value when needed.
:param \\**response_kw:
Additional parameters are passed to
:meth:`urllib3.response.HTTPResponse.from_httplib`
"""
if headers is None:
headers = self.headers
if not isinstance(retries, Retry):
retries = Retry.from_int(retries, redirect=redirect, default=self.retries)
if release_conn is None:
release_conn = response_kw.get("preload_content", True)
# Check host
if assert_same_host and not self.is_same_host(url):
raise HostChangedError(self, url, retries)
# Ensure that the URL we're connecting to is properly encoded
if url.startswith("/"):
url = six.ensure_str(_encode_target(url))
else:
url = six.ensure_str(parse_url(url).url)
conn = None
# Track whether `conn` needs to be released before
# returning/raising/recursing. Update this variable if necessary, and
# leave `release_conn` constant throughout the function. That way, if
# the function recurses, the original value of `release_conn` will be
# passed down into the recursive call, and its value will be respected.
#
# See issue #651 [1] for details.
#
# [1] <https://github.com/urllib3/urllib3/issues/651>
release_this_conn = release_conn
# Merge the proxy headers. Only do this in HTTP. We have to copy the
# headers dict so we can safely change it without those changes being
# reflected in anyone else's copy.
if self.scheme == "http":
headers = headers.copy()
headers.update(self.proxy_headers)
# Must keep the exception bound to a separate variable or else Python 3
# complains about UnboundLocalError.
err = None
# Keep track of whether we cleanly exited the except block. This
# ensures we do proper cleanup in finally.
clean_exit = False
# Rewind body position, if needed. Record current position
# for future rewinds in the event of a redirect/retry.
body_pos = set_file_position(body, body_pos)
try:
# Request a connection from the queue.
timeout_obj = self._get_timeout(timeout)
conn = self._get_conn(timeout=pool_timeout)
conn.timeout = timeout_obj.connect_timeout
is_new_proxy_conn = self.proxy is not None and not getattr(
conn, "sock", None
)
if is_new_proxy_conn:
self._prepare_proxy(conn)
# Make the request on the httplib connection object.
httplib_response = self._make_request(
conn,
method,
url,
timeout=timeout_obj,
body=body,
headers=headers,
chunked=chunked,
)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:665:
self = <botocore.awsrequest.AWSHTTPConnectionPool object at 0x7fbb0f6f3a60>
conn = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
method = 'PUT', url = '/csv'
timeout = <urllib3.util.timeout.Timeout object at 0x7fbadb6f28e0>
chunked = False
httplib_request_kw = {'body': None, 'headers': {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}}
timeout_obj = <urllib3.util.timeout.Timeout object at 0x7fbad68bdcd0>
def _make_request(
self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
"""
Perform a request on a given urllib connection object taken from our
pool.
:param conn:
a connection from one of our connection pools
:param timeout:
Socket timeout in seconds for the request. This can be a
float or integer, which will set the same timeout value for
the socket connect and the socket read, or an instance of
:class:`urllib3.util.Timeout`, which gives you more fine-grained
control over your timeouts.
"""
self.num_requests += 1
timeout_obj = self._get_timeout(timeout)
timeout_obj.start_connect()
conn.timeout = timeout_obj.connect_timeout
# Trigger any extra validation we need to do.
try:
self._validate_conn(conn)
except (SocketTimeout, BaseSSLError) as e:
# Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
raise
# conn.request() calls httplib.*.request, not the method in
# urllib3.request. It also calls makefile (recv) on the socket.
if chunked:
conn.request_chunked(method, url, **httplib_request_kw)
else:
conn.request(method, url, **httplib_request_kw)
/usr/lib/python3/dist-packages/urllib3/connectionpool.py:387:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
method = 'PUT', url = '/csv', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
def request(self, method, url, body=None, headers={}, *,
encode_chunked=False):
"""Send a complete request to the server."""
self._send_request(method, url, body, headers, encode_chunked)
/usr/lib/python3.8/http/client.py:1256:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
method = 'PUT', url = '/csv', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
args = (False,), kwargs = {}
def _send_request(self, method, url, body, headers, *args, **kwargs):
self._response_received = False
if headers.get('Expect', b'') == b'100-continue':
self._expect_header_set = True
else:
self._expect_header_set = False
self.response_class = self._original_response_cls
rval = super()._send_request(
method, url, body, headers, *args, **kwargs
)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:94:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
method = 'PUT', url = '/csv', body = None
headers = {'x-amz-acl': b'public-read-write', 'User-Agent': b'Boto3/1.17.0 Python/3.8.10 Linux/4.15.0-108-generic Botocore/1.27....invocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}
encode_chunked = False
def _send_request(self, method, url, body, headers, encode_chunked):
# Honor explicitly requested Host: and Accept-Encoding: headers.
header_names = frozenset(k.lower() for k in headers)
skips = {}
if 'host' in header_names:
skips['skip_host'] = 1
if 'accept-encoding' in header_names:
skips['skip_accept_encoding'] = 1
self.putrequest(method, url, **skips)
# chunked encoding will happen if HTTP/1.1 is used and either
# the caller passes encode_chunked=True or the following
# conditions hold:
# 1. content-length has not been explicitly set
# 2. the body is a file or iterable, but not a str or bytes-like
# 3. Transfer-Encoding has NOT been explicitly set by the caller
if 'content-length' not in header_names:
# only chunk body if not explicitly set for backwards
# compatibility, assuming the client code is already handling the
# chunking
if 'transfer-encoding' not in header_names:
# if content-length cannot be automatically determined, fall
# back to chunked encoding
encode_chunked = False
content_length = self._get_content_length(body, method)
if content_length is None:
if body is not None:
if self.debuglevel > 0:
print('Unable to determine size of %r' % body)
encode_chunked = True
self.putheader('Transfer-Encoding', 'chunked')
else:
self.putheader('Content-Length', str(content_length))
else:
encode_chunked = False
for hdr, value in headers.items():
self.putheader(hdr, value)
if isinstance(body, str):
# RFC 2616 Section 3.7.1 says that text default has a
# default charset of iso-8859-1.
body = _encode(body, 'body')
self.endheaders(body, encode_chunked=encode_chunked)
/usr/lib/python3.8/http/client.py:1302:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
message_body = None
def endheaders(self, message_body=None, *, encode_chunked=False):
"""Indicate that the last header line has been sent to the server.
This method sends the request to the server. The optional message_body
argument can be used to pass a message body associated with the
request.
"""
if self.__state == _CS_REQ_STARTED:
self.__state = _CS_REQ_SENT
else:
raise CannotSendHeader()
self._send_output(message_body, encode_chunked=encode_chunked)
/usr/lib/python3.8/http/client.py:1251:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
message_body = None, args = (), kwargs = {'encode_chunked': False}
msg = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def _send_output(self, message_body=None, *args, **kwargs):
self._buffer.extend((b"", b""))
msg = self._convert_to_bytes(self._buffer)
del self._buffer[:]
# If msg and message_body are sent in a single send() call,
# it will avoid performance problems caused by the interaction
# between delayed ack and the Nagle algorithm.
if isinstance(message_body, bytes):
msg += message_body
message_body = None
self.send(msg)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:123:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
str = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def send(self, str):
if self._response_received:
logger.debug(
"send() called, but reseponse already received. "
"Not sending data."
)
return
return super().send(str)
/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py:218:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
data = b'PUT /csv HTTP/1.1\r\nHost: 127.0.0.1:5000\r\nAccept-Encoding: identity\r\nx-amz-acl: public-read-write\r\nUser-Agent...-invocation-id: 14612633-35f0-4489-9ff4-5f34e64a6dcb\r\namz-sdk-request: attempt=5; max=5\r\nContent-Length: 0\r\n\r\n'
def send(self, data):
"""Send `data' to the server.
``data`` can be a string object, a bytes object, an array object, a
file-like object that supports a .read() method, or an iterable object.
"""
if self.sock is None:
if self.auto_open:
self.connect()
/usr/lib/python3.8/http/client.py:951:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
def connect(self):
conn = self._new_conn()
/usr/lib/python3/dist-packages/urllib3/connection.py:187:
self = <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>
def _new_conn(self):
""" Establish a socket connection and set nodelay settings on it.
:return: New socket connection.
"""
extra_kw = {}
if self.source_address:
extra_kw["source_address"] = self.source_address
if self.socket_options:
extra_kw["socket_options"] = self.socket_options
try:
conn = connection.create_connection(
(self._dns_host, self.port), self.timeout, **extra_kw
)
except SocketTimeout:
raise ConnectTimeoutError(
self,
"Connection to %s timed out. (connect timeout=%s)"
% (self.host, self.timeout),
)
except SocketError as e:
raise NewConnectionError(
self, "Failed to establish a new connection: %s" % e
)
E urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPConnection object at 0x7fbad68bd580>: Failed to establish a new connection: [Errno 111] Connection refused
/usr/lib/python3/dist-packages/urllib3/connection.py:171: NewConnectionError
During handling of the above exception, another exception occurred:
s3_base = 'http://127.0.0.1:5000/'
s3so = {'client_kwargs': {'endpoint_url': 'http://127.0.0.1:5000/'}}
paths = ['/tmp/pytest-of-jenkins/pytest-14/csv0/dataset-0.csv', '/tmp/pytest-of-jenkins/pytest-14/csv0/dataset-1.csv']
datasets = {'cats': local('/tmp/pytest-of-jenkins/pytest-14/cats0'), 'csv': local('/tmp/pytest-of-jenkins/pytest-14/csv0'), 'csv-...ocal('/tmp/pytest-of-jenkins/pytest-14/csv-no-header0'), 'parquet': local('/tmp/pytest-of-jenkins/pytest-14/parquet0')}
engine = 'csv'
df = name-string id label x y
0 Laura 1054 964 -0.792165 0.069362
1 Laura ... Zelda 976 964 -0.270133 0.839677
2160 Ray 967 977 0.033737 -0.727091
[4321 rows x 5 columns]
patch_aiobotocore = None
@pytest.mark.parametrize("engine", ["parquet", "csv"])
def test_s3_dataset(s3_base, s3so, paths, datasets, engine, df, patch_aiobotocore):
# Copy files to mock s3 bucket
files = {}
for i, path in enumerate(paths):
with open(path, "rb") as f:
fbytes = f.read()
fn = path.split(os.path.sep)[-1]
files[fn] = BytesIO()
files[fn].write(fbytes)
files[fn].seek(0)
if engine == "parquet":
# Workaround for nvt#539. In order to avoid the
# bug in Dask's `create_metadata_file`, we need
# to manually generate a "_metadata" file here.
# This can be removed after dask#7295 is merged
# (see https://github.com/dask/dask/pull/7295)
fn = "_metadata"
files[fn] = BytesIO()
meta = create_metadata_file(
paths,
engine="pyarrow",
out_dir=False,
)
meta.write_metadata_file(files[fn])
files[fn].seek(0)
with s3_context(s3_base=s3_base, bucket=engine, files=files) as s3fs:
tests/unit/test_s3.py:97:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:96: in s3_context
client.create_bucket(Bucket=bucket, ACL="public-read-write")
/usr/local/lib/python3.8/dist-packages/botocore/client.py:508: in _api_call
return self._make_api_call(operation_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/client.py:898: in _make_api_call
http, parsed_response = self._make_request(
/usr/local/lib/python3.8/dist-packages/botocore/client.py:921: in _make_request
return self._endpoint.make_request(operation_model, request_dict)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:119: in make_request
return self._send_request(request_dict, operation_model)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:202: in _send_request
while self._needs_retry(
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:354: in _needs_retry
responses = self._event_emitter.emit(
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:412: in emit
return self._emitter.emit(aliased_event_name, **kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:256: in emit
return self._emit(event_name, kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/hooks.py:239: in _emit
response = handler(**kwargs)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:207: in __call__
if self._checker(**checker_kwargs):
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:284: in __call__
should_retry = self._should_retry(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:320: in _should_retry
return self._checker(attempt_number, response, caught_exception)
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:363: in __call__
checker_response = checker(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:247: in __call__
return self._check_caught_exception(
/usr/local/lib/python3.8/dist-packages/botocore/retryhandler.py:416: in _check_caught_exception
raise caught_exception
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:281: in _do_get_response
http_response = self._send(request)
/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py:377: in _send
return self.http_session.send(request)
self = <botocore.httpsession.URLLib3Session object at 0x7fbad458f070>
request = <AWSPreparedRequest stream_output=False, method=PUT, url=http://127.0.0.1:5000/csv, headers={'x-amz-acl': b'public-rea...nvocation-id': b'14612633-35f0-4489-9ff4-5f34e64a6dcb', 'amz-sdk-request': b'attempt=5; max=5', 'Content-Length': '0'}>
def send(self, request):
try:
proxy_url = self._proxy_config.proxy_url_for(request.url)
manager = self._get_connection_manager(request.url, proxy_url)
conn = manager.connection_from_url(request.url)
self._setup_ssl_cert(conn, request.url, self._verify)
if ensure_boolean(
os.environ.get('BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER', '')
):
# This is currently an "experimental" feature which provides
# no guarantees of backwards compatibility. It may be subject
# to change or removal in any patch version. Anyone opting in
# to this feature should strictly pin botocore.
host = urlparse(request.url).hostname
conn.proxy_headers['host'] = host
request_target = self._get_request_target(request.url, proxy_url)
urllib_response = conn.urlopen(
method=request.method,
url=request_target,
body=request.body,
headers=request.headers,
retries=Retry(False),
assert_same_host=False,
preload_content=False,
decode_content=False,
chunked=self._chunked(request.headers),
)
http_response = botocore.awsrequest.AWSResponse(
request.url,
urllib_response.status,
urllib_response.headers,
urllib_response,
)
if not request.stream_output:
# Cause the raw stream to be exhausted immediately. We do it
# this way instead of using preload_content because
# preload_content will never buffer chunked responses
http_response.content
return http_response
except URLLib3SSLError as e:
raise SSLError(endpoint_url=request.url, error=e)
except (NewConnectionError, socket.gaierror) as e:
raise EndpointConnectionError(endpoint_url=request.url, error=e)
E botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5000/csv"
/usr/local/lib/python3.8/dist-packages/botocore/httpsession.py:477: EndpointConnectionError
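Both `test_s3_dataset` parametrizations fail the same way: botocore retries `PUT http://127.0.0.1:5000/<bucket>` five times and every attempt is refused, because nothing is listening on the mock endpoint. A quick reachability check (a debugging sketch, not part of the test suite) separates "the moto server never started" from a genuine dataset/IO problem:

```python
# Reachability sketch for the mock S3 endpoint these tests expect
# (assumes moto_server should be listening on 127.0.0.1:5000, per s3_base above).
import socket


def endpoint_is_up(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and timeouts both mean nothing is listening.
        return False


print(endpoint_is_up())  # False in this run, matching the EndpointConnectionError above
```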
_____________________ test_cpu_workflow[True-True-parquet] _____________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0')
df = name-cat name-string id label x y
0 Edith Laura 1054 964 -0.792165 0.069362
...da 976 964 -0.270133 0.839677
4320 Alice Ray 967 977 0.033737 -0.727091
[4321 rows x 6 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba407cecd0>, cpu = True
engine = 'parquet', dump = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
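The `ArrowInvalid` above means the `part_0.parquet` produced by `workflow.transform(dataset).to_parquet(...)` is not a readable Parquet file: a valid file both begins and ends with the 4-byte magic `PAR1`, and pyarrow reports it missing from the footer. A small sketch for checking that while debugging (the path in the comment is just the one reported in this run):

```python
# Debugging sketch (not part of the test): a valid Parquet file both starts and
# ends with the 4-byte magic b"PAR1", which pyarrow reports as missing here.
import os


def looks_like_parquet(path: str) -> bool:
    if os.path.getsize(path) < 8:  # too small to hold header + footer magic
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # last 4 bytes of the file
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"


# e.g. looks_like_parquet(
#     "/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_pa0/part_0.parquet")
```

If the footer magic is missing, the writer most likely crashed or was interrupted before the file was finalized; the three `test_cpu_workflow` failures below report the same condition for their own output directories.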
_______________________ test_cpu_workflow[True-True-csv] _______________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0')
df = name-string id label x y
0 Laura 1054 964 -0.792165 0.069362
1 Laura ... Zelda 976 964 -0.270133 0.839677
2160 Ray 967 977 0.033737 -0.727091
[4321 rows x 5 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba406e8610>, cpu = True
engine = 'csv', dump = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
__________________ test_cpu_workflow[True-True-csv-no-header] __________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1')
df = name-string id label x y
0 Laura 1054 964 -0.792165 0.069362
1 Laura ... Zelda 976 964 -0.270133 0.839677
2160 Ray 967 977 0.033737 -0.727091
[4321 rows x 5 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba406e6250>, cpu = True
engine = 'csv-no-header', dump = True
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_True_cs1/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
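The ArrowInvalid above says that the part_0.parquet written into the pytest tmpdir does not end with the Parquet footer magic, i.e. whatever the workflow wrote is not readable as Parquet. A minimal inspection sketch, assuming only pyarrow is installed; the path below is a placeholder for the file named in the error, not the actual CI tmpdir:

    import pathlib
    import pyarrow.parquet as pq

    def looks_like_parquet(path):
        # Valid Parquet files start and end with the 4-byte magic b"PAR1".
        data = pathlib.Path(path).read_bytes()
        return len(data) >= 12 and data[:4] == b"PAR1" and data[-4:] == b"PAR1"

    path = "part_0.parquet"  # placeholder path
    if looks_like_parquet(path):
        print(pq.read_schema(path))                  # readable Parquet: show its schema
    else:
        print(pathlib.Path(path).read_bytes()[:64])  # peek at what was actually written

This only narrows down whether the writer produced something other than Parquet or the reader is misinterpreting a valid file.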
____________________ test_cpu_workflow[True-False-parquet] _____________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0')
df = name-cat name-string id label x y
0 Edith Laura 1054 964 -0.792165 0.069362
...da 976 964 -0.270133 0.839677
4320 Alice Ray 967 977 0.033737 -0.727091
[4321 rows x 6 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba285b28b0>, cpu = True
engine = 'parquet', dump = False
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_p0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
______________________ test_cpu_workflow[True-False-csv] _______________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0')
df = name-string id label x y
0 Laura 1054 964 -0.792165 0.069362
1 Laura ... Zelda 976 964 -0.270133 0.839677
2160 Ray 967 977 0.033737 -0.727091
[4321 rows x 5 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba40730370>, cpu = True
engine = 'csv', dump = False
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c0/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
_________________ test_cpu_workflow[True-False-csv-no-header] __________________
tmpdir = local('/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1')
df = name-string id label x y
0 Laura 1054 964 -0.792165 0.069362
1 Laura ... Zelda 976 964 -0.270133 0.839677
2160 Ray 967 977 0.033737 -0.727091
[4321 rows x 5 columns]
dataset = <merlin.io.dataset.Dataset object at 0x7fba586e2eb0>, cpu = True
engine = 'csv-no-header', dump = False
@pytest.mark.parametrize("engine", ["parquet", "csv", "csv-no-header"])
@pytest.mark.parametrize("dump", [True, False])
@pytest.mark.parametrize("cpu", [True])
def test_cpu_workflow(tmpdir, df, dataset, cpu, engine, dump):
# Make sure we are in cpu formats
if cudf and isinstance(df, cudf.DataFrame):
df = df.to_pandas()
if cpu:
dataset.to_cpu()
cat_names = ["name-cat", "name-string"] if engine == "parquet" else ["name-string"]
cont_names = ["x", "y", "id"]
label_name = ["label"]
norms = ops.Normalize()
conts = cont_names >> ops.FillMissing() >> ops.Clip(min_value=0) >> norms
cats = cat_names >> ops.Categorify()
workflow = nvt.Workflow(conts + cats + label_name)
workflow.fit(dataset)
if dump:
workflow_dir = os.path.join(tmpdir, "workflow")
workflow.save(workflow_dir)
workflow = None
workflow = Workflow.load(workflow_dir)
def get_norms(tar: pd.Series):
df = tar.fillna(0)
df = df * (df >= 0).astype("int")
return df
assert math.isclose(get_norms(df.x).mean(), norms.means["x"], rel_tol=1e-4)
assert math.isclose(get_norms(df.y).mean(), norms.means["y"], rel_tol=1e-4)
assert math.isclose(get_norms(df.x).std(), norms.stds["x"], rel_tol=1e-3)
assert math.isclose(get_norms(df.y).std(), norms.stds["y"], rel_tol=1e-3)
# Check that categories match
if engine == "parquet":
cats_expected0 = df["name-cat"].unique()
cats0 = get_cats(workflow, "name-cat", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected0.tolist()) for cat in cats0.tolist())
assert len(cats0.tolist()) == len(cats_expected0.tolist() + [None])
cats_expected1 = df["name-string"].unique()
cats1 = get_cats(workflow, "name-string", cpu=True)
# adding the None entry as a string because of move from gpu
assert all(cat in [None] + sorted(cats_expected1.tolist()) for cat in cats1.tolist())
assert len(cats1.tolist()) == len(cats_expected1.tolist() + [None])
# Write to new "shuffled" and "processed" dataset
workflow.transform(dataset).to_parquet(
output_path=tmpdir, out_files_per_proc=10, shuffle=nvt.io.Shuffle.PER_PARTITION
)
dataset_2 = Dataset(glob.glob(str(tmpdir) + "/*.parquet"), cpu=cpu)
tests/unit/workflow/test_cpu_workflow.py:76:
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:303: in __init__
self.engine = ParquetDatasetEngine(
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:313: in __init__
self._path0,
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:338: in _path0
return next(self._dataset.get_fragments()).path
/usr/local/lib/python3.8/dist-packages/merlin/io/parquet.py:365: in _dataset
dataset = pa_ds.dataset(paths, filesystem=fs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???
???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-14/test_cpu_workflow_True_False_c1/part_0.parquet': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.. Is this a 'parquet' file?
pyarrow/error.pxi:99: ArrowInvalid
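These test_cpu_workflow failures all share the same shape: a CPU-only workflow is fitted, its transformed output is written with to_parquet, and re-opening the written files as a Dataset then raises ArrowInvalid. A stripped-down sketch of that round trip, built only from the calls visible in the quoted test (the toy frame and output directory are placeholders, not the CI fixtures), could be used to reproduce it outside the suite:

    import glob
    import pandas as pd
    import nvtabular as nvt
    from nvtabular import ops

    # Toy stand-in for the test fixture; values are illustrative only.
    df = pd.DataFrame({
        "name-string": ["Laura", "Ray", "Zelda"],
        "id": [1054, 967, 976],
        "label": [964, 977, 964],
        "x": [-0.79, 0.03, -0.27],
        "y": [0.07, -0.73, 0.84],
    })

    conts = ["x", "y", "id"] >> ops.FillMissing() >> ops.Clip(min_value=0) >> ops.Normalize()
    cats = ["name-string"] >> ops.Categorify()
    workflow = nvt.Workflow(conts + cats + ["label"])

    dataset = nvt.Dataset(df, cpu=True)
    workflow.fit(dataset)

    out_dir = "/tmp/nvt_cpu_repro"  # placeholder output directory
    workflow.transform(dataset).to_parquet(output_path=out_dir)

    # This is the step that raises ArrowInvalid in the CI runs above.
    reread = nvt.Dataset(glob.glob(out_dir + "/*.parquet"), cpu=True)
    print(reread.to_ddf().head())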
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-True-device-0-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-True-device-150-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-0-csv-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-0-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-150-csv-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-None-False-None-150-csv-no-header-0.1]
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-parquet]
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-csv] - py...
FAILED tests/unit/test_dask_nvt.py::test_dask_preproc_cpu[True-None-csv-no-header]
FAILED tests/unit/test_s3.py::test_s3_dataset[parquet] - botocore.exceptions....
FAILED tests/unit/test_s3.py::test_s3_dataset[csv] - botocore.exceptions.Endp...
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-parquet]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-csv]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-True-csv-no-header]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-parquet]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-csv]
FAILED tests/unit/workflow/test_cpu_workflow.py::test_cpu_workflow[True-False-csv-no-header]
===== 17 failed, 1414 passed, 1 skipped, 617 warnings in 713.17s (0:11:53) =====
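For iterating on this locally, the failing CPU-workflow subset can be re-run on its own instead of the full unit-test job. A rough sketch, assuming a checkout of the repository with the test dependencies installed (the selection is illustrative rather than the exact parametrized ids from the summary above):

    import pytest

    # Run only the CPU workflow tests, verbosely, stopping at the first failure.
    raise SystemExit(pytest.main([
        "tests/unit/workflow/test_cpu_workflow.py",
        "-vv",
        "-x",
    ]))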
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9505466125844008201.sh
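The post-build step runs test_res_push.py to push this log back to the pull request. The script itself is not shown in the log; a plausible sketch of that kind of helper, using the GitHub issues-comments API (the GITHUB_TOKEN variable and the truncation limit are assumptions, not taken from the real script), looks like:

    import os
    import sys

    import requests

    # Jenkins passes the comments-API URL and the build-log path as arguments.
    url, log_path = sys.argv[1], sys.argv[2]

    with open(log_path, errors="replace") as f:
        body = f.read()[-60000:]  # keep the tail; GitHub comments have a size limit

    resp = requests.post(
        url,
        headers={"Authorization": "token " + os.environ["GITHUB_TOKEN"]},  # assumed token env var
        json={"body": body},
    )
    resp.raise_for_status()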
Click to view CI Results
GitHub pull request #1547 of commit 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd, no merge conflicts.
Running as SYSTEM
Setting status of 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4737/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd^{commit} # timeout=10
Checking out Revision 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 4fe1280dd723e58fa32bc5579eadce7148e1d42a # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins3615814383666057739.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+10.g2e73d5bc5.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.85,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.84,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@2e73d5bc5decc20505ee9d9e78990689b8e8c2dd#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='692881346'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-r5jilq00
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-r5jilq00
Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98cd36067d5ad9bb952aa2dbfac55eb059bb7bc4
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: betterproto=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (21.3)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (7.0.0)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (1.3.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (4.64.1)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.10.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (3.19.5)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (0.55.1)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterprotomerlin-core==0.7.0+1.g98cd360) (0.4.3)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.4.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (3.1.2)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (6.1)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.4.0)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.0.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (8.1.3)
Requirement already satisfied: llvmlite=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (0.38.1)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (65.3.0)
Requirement already satisfied: numpy=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+1.g98cd360) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (2.8.2)
Requirement already satisfied: googleapis-common-protos=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.52.0)
Requirement already satisfied: absl-py=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.2)
Requirement already satisfied: h2=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.1.1)
Requirement already satisfied: hpack=4.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (4.0.0)
Requirement already satisfied: hyperframe=6.0 in /usr/local/lib/python3.8/dist-packages (from h2=3.1.0->grpclib->betterprotomerlin-core==0.7.0+1.g98cd360) (6.0.1)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.7.0+1.g98cd360-py3-none-any.whl size=114014 sha256=df12df6ac1e572406abd8c885387a4211ea2c127768fde979586bc8fa70a4c12
Stored in directory: /tmp/pip-ephem-wheel-cache-dxtipq6p/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
Attempting uninstall: merlin-core
Found existing installation: merlin-core 0.3.0+12.g78ecddd
Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+1.g98cd360
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1433 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
............................................. [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                Stmts   Miss  Cover
merlin/transforms/__init__.py           1      1     0%
merlin/transforms/ops/__init__.py       1      1     0%
TOTAL                                   2      2     0%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1432 passed, 2 skipped, 258 warnings in 1123.71s (0:18:43) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins8802569408947653966.sh
Click to view CI Results
GitHub pull request #1547 of commit 0f10b88cb6b55014f6e4359e3caf20881fc70b13, no merge conflicts.
Running as SYSTEM
Setting status of 0f10b88cb6b55014f6e4359e3caf20881fc70b13 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4738/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse 0f10b88cb6b55014f6e4359e3caf20881fc70b13^{commit} # timeout=10
Checking out Revision 0f10b88cb6b55014f6e4359e3caf20881fc70b13 (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 0f10b88cb6b55014f6e4359e3caf20881fc70b13 # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 2e73d5bc5decc20505ee9d9e78990689b8e8c2dd # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7147864933175555979.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+12.g0f10b88cb.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.85,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.84,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@0f10b88cb6b55014f6e4359e3caf20881fc70b13#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3722574718'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-f1gr5ow8
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-f1gr5ow8
Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98cd36067d5ad9bb952aa2dbfac55eb059bb7bc4
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (0.55.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (21.3)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (4.64.1)
Requirement already satisfied: betterproto=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0)
Requirement already satisfied: pandas=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (1.3.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (7.0.0)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.10.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (3.19.5)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto->merlin-core==0.7.0+1.g98cd360) (0.4.3)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.4.1)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.12.0)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.8.0)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.7.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.4)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (8.1.3)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.4.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (6.1)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.0.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (3.1.2)
Requirement already satisfied: llvmlite>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (0.38.1)
Requirement already satisfied: numpy>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (1.20.3)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (65.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+1.g98cd360) (3.0.9)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas>=1.2.0->merlin-core==0.7.0+1.g98cd360) (2022.2.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas>=1.2.0->merlin-core==0.7.0+1.g98cd360) (2.8.2)
Requirement already satisfied: googleapis-common-protos>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.52.0)
Requirement already satisfied: absl-py>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.2.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.1)
Requirement already satisfied: h2>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto->merlin-core==0.7.0+1.g98cd360) (4.1.0)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto->merlin-core==0.7.0+1.g98cd360) (6.0.2)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.1.1)
Requirement already satisfied: hpack>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2>=3.1.0->grpclib->betterproto->merlin-core==0.7.0+1.g98cd360) (4.0.0)
Requirement already satisfied: hyperframe>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2>=3.1.0->grpclib->betterproto->merlin-core==0.7.0+1.g98cd360) (6.0.1)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.7.0+1.g98cd360-py3-none-any.whl size=114014 sha256=3582ec663f3112230cfba9d03ff29e647ff00639cd69ec88ecd8fc6b3d325d85
Stored in directory: /tmp/pip-ephem-wheel-cache-rjwglxax/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
Attempting uninstall: merlin-core
Found existing installation: merlin-core 0.3.0+12.g78ecddd
Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+1.g98cd360
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
..................................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 46%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
merlin/transforms/__init__.py 1 1 0%
merlin/transforms/ops/__init__.py 1 1 0%
TOTAL 2 2 0%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 258 warnings in 1150.12s (0:19:10) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins17135248576413537766.sh
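For anyone who wants to re-run the unit-test step outside Jenkins, the two commands tox executes in the test-gpu environment are visible in the log above; a minimal sketch, assuming a GPU machine with NVTabular and its test dependencies already installed:
# commands taken verbatim from the tox test-gpu log above
python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
python -m pytest --cov-report term --cov merlin -rxs tests/unit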
@rnyak @karlhigley can you review this pull request?
Click to view CI Results
GitHub pull request #1547 of commit 56fb11a3ab27fe850c3374c1f344f81f614f667f, no merge conflicts.
Running as SYSTEM
Setting status of 56fb11a3ab27fe850c3374c1f344f81f614f667f to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4746/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
> git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1547/*:refs/remotes/origin/pr/1547/* # timeout=10
> git rev-parse 56fb11a3ab27fe850c3374c1f344f81f614f667f^{commit} # timeout=10
Checking out Revision 56fb11a3ab27fe850c3374c1f344f81f614f667f (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 56fb11a3ab27fe850c3374c1f344f81f614f667f # timeout=10
Commit message: "Merge branch 'main' into main"
> git rev-list --no-walk 3fb7db360ca92f7800f31f666b4d3ab56118fde9 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins611663645699638183.sh
GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py
test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+14.g56fb11a3a.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.90,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.89,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e 
git+https://github.com/NVIDIA-Merlin/NVTabular.git@56fb11a3ab27fe850c3374c1f344f81f614f667f#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.3,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='728129109'
test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/core.git
Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-s91s9ym0
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-s91s9ym0
Resolved https://github.com/NVIDIA-Merlin/core.git to commit 14a18dc0de5d5fd7737ecbadf9f6d7fa5d801b67
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (7.0.0)
Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (0.55.1)
Requirement already satisfied: pandas>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (1.3.5)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (4.64.1)
Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (21.3)
Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.3.0)
Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+9.g14a18dc) (2022.5.0)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (3.19.5)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.2.5)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+9.g14a18dc) (1.10.0)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto->merlin-core==0.7.0+9.g14a18dc) (0.4.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.4.1)
Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.12.0)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.4)
Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (5.8.0)
Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (6.1)
Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.7.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (8.1.3)
Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.0.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (3.1.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.4.0)
Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (65.3.0)
Requirement already satisfied: llvmlite>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (0.38.1)
Requirement already satisfied: numpy>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+9.g14a18dc) (1.20.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+9.g14a18dc) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (2022.2.1)
Requirement already satisfied: absl-py>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.2.0)
Requirement already satisfied: googleapis-common-protos>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.52.0)
Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (0.2.1)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas>=1.2.0->merlin-core==0.7.0+9.g14a18dc) (1.15.0)
Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto->merlin-core==0.7.0+9.g14a18dc) (6.0.2)
Requirement already satisfied: h2>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto->merlin-core==0.7.0+9.g14a18dc) (4.1.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+9.g14a18dc) (2.1.1)
Requirement already satisfied: hpack>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2>=3.1.0->grpclib->betterproto->merlin-core==0.7.0+9.g14a18dc) (4.0.0)
Requirement already satisfied: hyperframe>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2>=3.1.0->grpclib->betterproto->merlin-core==0.7.0+9.g14a18dc) (6.0.1)
Building wheels for collected packages: merlin-core
Building wheel for merlin-core (pyproject.toml): started
Building wheel for merlin-core (pyproject.toml): finished with status 'done'
Created wheel for merlin-core: filename=merlin_core-0.7.0+9.g14a18dc-py3-none-any.whl size=118253 sha256=8c004aad8cf77e2c39dc85818c1b6127b5fed865e68cf3f3e6344b094172c079
Stored in directory: /tmp/pip-ephem-wheel-cache-mm0otxlo/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84
Successfully built merlin-core
Installing collected packages: merlin-core
Attempting uninstall: merlin-core
Found existing installation: merlin-core 0.3.0+12.g78ecddd
Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu
Can't uninstall 'merlin-core'. No files were found to uninstall.
Successfully installed merlin-core-0.7.0+9.g14a18dc
test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0
collected 1441 items / 1 skipped
tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py .... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/examples/test_01-Getting-started.py . [ 12%]
tests/unit/examples/test_02-Advanced-NVTabular-workflow.py . [ 12%]
tests/unit/examples/test_03-Running-on-multiple-GPUs-or-on-CPU.py . [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
..................................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 46%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)
.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
tests/unit/test_dask_nvt.py: 6 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(
tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(
tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(
tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/ops/test_ops_schema.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(
tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(
tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
merlin/transforms/__init__.py 1 1 0%
merlin/transforms/ops/__init__.py 1 1 0%
TOTAL 2 2 0%
=========================== short test summary info ============================
SKIPPED [1] ../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/io/tests/test_s3.py:14: could not import 'moto': No module named 'moto'
SKIPPED [1] tests/unit/loader/test_tf_dataloader.py:529: needs horovod
========== 1440 passed, 2 skipped, 258 warnings in 1185.85s (0:19:45) ==========
/usr/local/lib/python3.8/dist-packages/coverage/control.py:801: CoverageWarning: No data was collected. (no-data-collected)
self._warn("No data was collected.", slug="no-data-collected")
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.6542.537287' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
data._warn(str(exc))
/usr/local/lib/python3.8/dist-packages/coverage/data.py:130: CoverageWarning: Data file '/var/jenkins_home/workspace/nvtabular_tests/nvtabular/.coverage.10.20.17.231.6540.646720' doesn't seem to be a coverage data file: cannot unpack non-iterable NoneType object
data._warn(str(exc))
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins15406989966855063072.sh
Closing, since this notebook has moved to another repo after this PR was opened.