Trouble loading training.parquet dataset

Open carlmes opened this issue 4 years ago • 1 comments

When attempting to run the 01-vectors-and-visualization notebook, I ran into an issue in my environment where the parquet engine did not appear to load correctly.

Steps to reproduce:

  1. Using macOS 10.15.5, Zsh
  2. Followed the pre-install instructions in the readme: brew install python which installed python-3.7.7.catalina.bottle.tar.gz and brew install pipenv which installed pipenv-2020.6.2.catalina.bottle.tar.gz
  3. Cloned this repo, and changed current directory to repo root folder
  4. Ran the initialization command pipenv install --skip-lock:
~/Development/ml-workflows-notebook  develop ✔
▶ pipenv install --skip-lock
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
Creating a virtualenv for this project…
Pipfile: /Users/carl/Development/ml-workflows-notebook/Pipfile
Using /usr/local/bin/python3.7m (3.7.7) to create virtualenv…
⠇ Creating virtual environment...created virtual environment CPython3.7.7.final.0-64 in 571ms
  creator CPython3Posix(dest=/Users/carl/.local/share/virtualenvs/ml-workflows-notebook-B_cU6XbN, clear=False, global=False)
  seeder FromAppData(download=False, pip=latest, setuptools=latest, wheel=latest, via=copy, app_data_dir=/Users/carl/Library/Application Support/virtualenv/seed-app-data/v1.0.1)
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator

✔ Successfully created virtual environment! 
Virtualenv location: /Users/carl/.local/share/virtualenvs/ml-workflows-notebook-B_cU6XbN
Installing dependencies from Pipfile…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 10/10 — 00:00:24
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
  5. Ran Jupyter:
▶ pipenv run jupyter notebook
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
[I 16:40:40.780 NotebookApp] Serving notebooks from local directory: /Users/carl/Development/ml-workflows-notebook
[I 16:40:40.780 NotebookApp] The Jupyter Notebook is running at:
[I 16:40:40.780 NotebookApp] http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp]  or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:40:40.786 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///Users/carl/Library/Jupyter/runtime/nbserver-93704-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
     or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
  6. Opened the 01-vectors-and-visualization notebook, selected the first code block, and ran it (shift + enter): image
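
Given the Courtesy Notice in the output above, one thing worth ruling out is that the notebook kernel runs on a different interpreter than the one pipenv installed into. The following is a minimal diagnostic sketch to paste into a notebook cell, not part of the workshop itself; `describe_environment` is a hypothetical helper name, and the module list is an assumption based on the optional parquet engines pandas can use (pyarrow and fastparquet).

```python
import importlib.util
import sys


def describe_environment(modules=("pandas", "pyarrow", "fastparquet")):
    """Report the interpreter in use and which modules it can import."""
    report = {
        "executable": sys.executable,  # interpreter running this kernel
        "prefix": sys.prefix,          # root of the active environment
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `pandas` shows `True` while both `pyarrow` and `fastparquet` show `False`, the kernel has no parquet engine available, which would be consistent with the failure described here.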

carlmes, Jun 17 '20 22:06

I was able to work around the problem by manually installing pyarrow:

  1. Stop the notebook app (ctrl + c)
  2. Manually install pyarrow:
▶ pipenv install pyarrow
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
Installing pyarrow…
Adding pyarrow to Pipfile's [packages]…
✔ Installation Succeeded 
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success! 
Updated Pipfile.lock (e8ac95)!
Installing dependencies from Pipfile.lock (e8ac95)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 4/4 — 00:00:00
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
  3. Start the notebook again:
▶ pipenv run jupyter notebook
Courtesy Notice: Pipenv found itself running within a virtual environment, so it will automatically use that environment, instead of creating its own for any project. You can set PIPENV_IGNORE_VIRTUALENVS=1 to force pipenv to ignore that environment and create its own instead. You can set PIPENV_VERBOSITY=-1 to suppress this warning.
[I 16:40:40.780 NotebookApp] Serving notebooks from local directory: /Users/carl/Development/ml-workflows-notebook
[I 16:40:40.780 NotebookApp] The Jupyter Notebook is running at:
[I 16:40:40.780 NotebookApp] http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp]  or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:40.780 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:40:40.786 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///Users/carl/Library/Jupyter/runtime/nbserver-93704-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
     or http://127.0.0.1:8888/?token=174bd2b70cb82569facbbd57a47bbae470967f29952e63a1
[I 16:40:49.927 NotebookApp] Kernel started: 628eb2e7-4370-4e7f-b0be-96089fc84b4a
  4. Now it seems to load the training dataset correctly:
image
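
This workaround fits how pandas reads parquet files: `pandas.read_parquet` delegates to an optional engine (pyarrow or fastparquet) and raises an ImportError when neither is installed. As a rough sketch only, a notebook could check for an engine up front and fail with an actionable message. The names `pick_parquet_engine` and `load_parquet` are hypothetical helpers, and the path in the usage comment is an assumption rather than the notebook's actual code.

```python
import importlib.util


def pick_parquet_engine(candidates=("pyarrow", "fastparquet")):
    """Return the first importable parquet engine name, or raise ImportError."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    raise ImportError(
        "No parquet engine found; try `pipenv install pyarrow` "
        "and restart the notebook kernel."
    )


def load_parquet(path):
    """Read a parquet file with an explicitly chosen engine."""
    engine = pick_parquet_engine()  # fails early with a clear hint
    import pandas as pd  # imported lazily so the engine check runs first
    return pd.read_parquet(path, engine=engine)


# Usage (the path is an assumption; adjust to where training.parquet lives):
# df = load_parquet("data/training.parquet")
```

Passing `engine=` explicitly makes the dependency visible in the notebook instead of relying on the default `"auto"` lookup.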

carlmes, Jun 17 '20 22:06