openshift-ml-workflows-workshop
Trouble loading training.parquet dataset
When attempting to run the 01-vectors-and-visualization notebook, I ran into an issue in my environment where the parquet engine did not appear to load correctly.
Steps to reproduce:
- Using macOS 10.15.5, Zsh
- Followed the pre-install instructions in the readme: brew install python, which installed python-3.7.7.catalina.bottle.tar.gz, and brew install pipenv, which installed pipenv-2020.6.2.catalina.bottle.tar.gz
- Cloned this repo, and changed current directory to repo root folder
- Ran the initialization, pipenv install --skip-lock:
~/Development/ml-workflows-notebook develop ✔ 21d22h
▶ pipenv install --skip-lock
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
Creating a virtualenv for this project…
Pipfile: /Users/carl/Development/ml-workflows-notebook/Pipfile
Using /usr/local/bin/python3.7m (3.7.7) to create virtualenv…
⠇ Creating virtual environment...created virtual environment CPython3.7.7.final.0-64 in 571ms
creator CPython3Posix(dest=/Users/carl/.local/share/virtualenvs/ml-workflows-notebook-B_cU6XbN, clear=False, global=False)
seeder FromAppData(download=False, pip=latest, setuptools=latest, wheel=latest, via=copy, app_data_dir=/Users/carl/Library/Application Support/virtualenv/seed-app-data/v1.0.1)
activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
✔ Successfully created virtual environment!
Virtualenv location: /Users/carl/.local/share/virtualenvs/ml-workflows-notebook-B_cU6XbN
Installing dependencies from Pipfile…
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 10/10 — 00:00:24
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
- Ran Jupyter:
▶ pipenv run jupyter notebook
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
[I 16:40:40.780 NotebookApp] Serving notebooks from local directory: /Users/carl/Development/ml-workflows-notebook
[I 16:40:40.780 NotebookApp] The Jupyter Notebook is running at:
[I 16:40:40.780 NotebookApp] http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:40:40.786 NotebookApp]
To access the notebook, open this file in a browser:
file:///Users/carl/Library/Jupyter/runtime/nbserver-93704-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
- Opened the 01-vectors-and-visualization notebook, selected the first code block and ran it (shift + enter). The cell failed to load the training dataset.
I was able to work around the problem by manually installing pyarrow:
- Stop the notebook app (ctrl + c)
- Manually install pyarrow:
▶ pipenv install pyarrow
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
Installing pyarrow…
Adding pyarrow to Pipfile's [packages]…
✔ Installation Succeeded
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success!
Updated Pipfile.lock (e8ac95)!
Installing dependencies from Pipfile.lock (e8ac95)…
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 4/4 — 00:00:00
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
- Start the notebook again:
▶ pipenv run jupyter notebook
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
[I 16:40:40.780 NotebookApp] Serving notebooks from local directory: /Users/carl/Development/ml-workflows-notebook
[I 16:40:40.780 NotebookApp] The Jupyter Notebook is running at:
[I 16:40:40.780 NotebookApp] http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:40:40.786 NotebookApp]
To access the notebook, open this file in a browser:
file:///Users/carl/Library/Jupyter/runtime/nbserver-93704-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:49.927 NotebookApp] Kernel started: 628eb2e7-4370-4e7f-b0be-96089fc84b4a
- Now it seems to load the training dataset correctly:
[image]
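For anyone hitting the same thing, a quick way to confirm whether a parquet engine is visible to the notebook kernel is sketched below. This is my own check, not something from the workshop repo: it assumes the failure came from pandas not finding an engine (pandas.read_parquet needs either pyarrow or fastparquet), and the helper name available_parquet_engines is made up for illustration.

```python
import importlib.util

# Engines that pandas.read_parquet can use; at least one must be installed.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def available_parquet_engines(candidates=PARQUET_ENGINES):
    """Return the candidate modules that can be imported in this environment.

    Hypothetical helper for this issue; it only checks importability and
    does not read any parquet file.
    """
    return [
        name
        for name in candidates
        if importlib.util.find_spec(name) is not None
    ]


engines = available_parquet_engines()
if engines:
    print("Parquet engine(s) found:", ", ".join(engines))
else:
    print("No parquet engine found; try: pipenv install pyarrow")
```

Running this in a notebook cell (or with pipenv run python) before and after the pipenv install pyarrow step should show whether the virtualenv the kernel uses actually picked up the package.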