mslearn-introduction-to-machine-learning

Learning module has OUTDATED `dask` package version and fails during first run

Open richlysakowski opened this issue 1 year ago • 2 comments

One of the training modules where the problem exists:

https://learn.microsoft.com/en-us/training/modules/introduction-to-data-for-machine-learning/3-exercise-detect-visualize-missing-data

Please fix the sandbox creation script, test the training course notebooks and post updated code.
The installation script for the Microsoft VM sandbox needs to be updated to reflect updates in plotly and xarray. BELOW is a HACK to fix the notebook and get it running.

Microsoft's custom module 'graphing.py' extracts code snippets from Plotly, Dash, and XArray. These packages have since been updated and throw errors. It took a bit of work to find a fix and test it in the environment.

I had this problem with another training module in the same course. Lost 30+ minutes fixing it the first time. Don't have time to debug and fix it again right now. Will post updated notebooks with VM changes that show how to fix notebook and environment.

Here is the cell that throws the error:

[7]
    import graphing

    # 'graphing' is custom code we use to make graphs quickly.
    # If you would like to read it in detail, it can be found
    # in our GitHub repository
    graphing.histogram(dataset, 'Pclass', title='Ticket Class (All Passengers)', show=True)
    graphing.histogram(unknown_age_and_cabin, 'Pclass', title='Ticket Class (Passengers Missing Cabin and Age Information)')

AttributeError: module 'dask.array' has no attribute 'lib'

AttributeError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 import graphing
      3 # 'graphing' is custom code we use to make graphs quickly.
      4 # If you would like to read it in detail, it can be found
      5 # in our GitHub repository
      6 graphing.histogram(dataset, 'Pclass', title='Ticket Class (All Passengers)', show=True)

File /learn/graphing.py:9, in <module>
      7 from numpy.core.fromnumeric import repeat, shape
      8 import pandas
----> 9 import plotly.express as px
     10 import plotly.io as pio
     11 import plotly.graph_objects as graph_objects

File /anaconda/envs/py38_default/lib/python3.8/site-packages/plotly/express/__init__.py:15, in <module>
      9 if pd is None:
     10     raise ImportError(
     11         """
     12 Plotly express requires pandas to be installed."""
     13     )
---> 15 from ._imshow import imshow
     16 from ._chart_types import (  # noqa: F401
     17     scatter,
     18     scatter_3d,
   (...)
     51     density_mapbox,
     52 )
     55 from ._core import (  # noqa: F401
     56     set_mapbox_access_token,
     57     defaults,
     58     get_trendline_results,
     59     NO_COLOR,
     60 )

File /anaconda/envs/py38_default/lib/python3.8/site-packages/plotly/express/_imshow.py:11, in <module>
      8 from plotly.utils import image_array_to_data_uri
     10 try:
---> 11     import xarray
     13     xarray_imported = True
     14 except ImportError:

File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/__init__.py:1, in <module>
----> 1 from . import testing, tutorial
      2 from .backends.api import (
      3     load_dataarray,
      4     load_dataset,
   (...)
      8     save_mfdataset,
      9 )
     10 from .backends.rasterio_ import open_rasterio

File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/testing.py:9, in <module>
      6 import numpy as np
      7 import pandas as pd
----> 9 from xarray.core import duck_array_ops, formatting, utils
     10 from xarray.core.dataarray import DataArray
     11 from xarray.core.dataset import Dataset

File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:26, in <module>
     23 from numpy import take, tensordot, transpose, unravel_index  # noqa
     24 from numpy import where as _where
---> 26 from . import dask_array_compat, dask_array_ops, dtypes, npcompat, nputils
     27 from .nputils import nanfirst, nanlast
     28 from .pycompat import cupy_array_type, dask_array_type, is_duck_dask_array

File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/core/dask_array_compat.py:60, in <module>
     56     return padded
     59 if da is not None:
---> 60     sliding_window_view = da.lib.stride_tricks.sliding_window_view
     61 else:
     62     sliding_window_view = None

AttributeError: module 'dask.array' has no attribute 'lib'

Attachment: Azure_Intro_ML_JupyterNBs_with-Graphing_Module-Errors.zip
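For anyone who wants to confirm they are hitting the same mismatch before changing the environment, here is a minimal diagnostic sketch. The helper names (`installed_versions`, `has_dotted_attr`, `diagnose`) are made up for illustration and are not part of graphing.py or the sandbox. It only reports the installed versions and whether `dask.array` exposes the attribute that the traceback shows xarray looking up.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def has_dotted_attr(obj, dotted_path):
    """Return True if every attribute in a path like 'a.b.c' resolves on obj."""
    current = obj
    for part in dotted_path.split("."):
        if not hasattr(current, part):
            return False
        current = getattr(current, part)
    return True


def diagnose():
    """Print package versions and whether dask.array has the attribute xarray expects."""
    print(installed_versions(["dask", "xarray", "plotly"]))
    try:
        dask_array = import_module("dask.array")
    except ImportError:
        print("dask.array is not importable")
        return
    # This is the lookup that fails at dask_array_compat.py line 60 in the traceback.
    path = "lib.stride_tricks.sliding_window_view"
    print(path, "available:", has_dotted_attr(dask_array, path))
```

Calling `diagnose()` in a fresh cell (before `import graphing`) should show whether the installed dask is the one missing `dask.array.lib`.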

richlysakowski, Jan 04 '23 18:01

Attachment: 3-5-exercise-normalize-data-predict-missing-values-WITH-XARRAY-UPDATE-HACK.zip

Here is a conda environment HACK (in the notebook) to fix the environment and get the notebook running.

The sandbox build script for the Microsoft environment needs to be updated to reflect recent updates in plotly and xarray.

richlysakowski, Jan 04 '23 19:01

Hey @richlysakowski, thanks a lot for taking the time to figure this out. For those that might be here looking for a quick and concise solution, I would just add that based on your solution above, all I had to do to get around this issue is add the line !pip install -U xarray!='2022.6.*' at the top of the first cell of any notebook that contains the error related to the graphing.py import.
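The same workaround can be sketched in plain Python for cases where the `!pip` notebook shortcut is not available. This is only an illustration: `pip_upgrade_command` and `upgrade` are hypothetical helper names, and the requirement string is the one from the comment above with the shell quoting removed. Note that a later comment in this thread reports the workaround no longer works, so treat it as something to try rather than a confirmed fix.

```python
import subprocess
import sys

# Requirement specifier from the comment above. The quotes in the notebook
# line are shell quoting only and are not part of the specifier itself.
XARRAY_SPEC = "xarray!=2022.6.*"


def pip_upgrade_command(requirement):
    """Build a pip command that targets the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", "-U", requirement]


def upgrade(requirement=XARRAY_SPEC):
    """Run the upgrade; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_upgrade_command(requirement))
```

Running `upgrade()` at the top of the first cell, before `import graphing`, mirrors the one-line fix. Using `sys.executable -m pip` is a deliberate choice so that the package lands in the same environment the kernel uses.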

courtneyum, Jan 18 '23 16:01

> Hey @richlysakowski, thanks a lot for taking the time to figure this out. For those that might be here looking for a quick and concise solution, I would just add that based on your solution above, all I had to do to get around this issue is add the line !pip install -U xarray!='2022.6.*' at the top of the first cell of any notebook that contains the error related to the graphing.py import.

That is not working anymore unfortunately.

Boruc04, May 19 '24 13:05