mslearn-introduction-to-machine-learning
Learning module has OUTDATED `dask` package version and fails during first run
One of the training modules where the problem exists:
https://learn.microsoft.com/en-us/training/modules/introduction-to-data-for-machine-learning/3-exercise-detect-visualize-missing-data
Please fix the sandbox creation script, test the training course notebooks and post updated code.
The installation script for the Microsoft VM sandbox needs to be updated to reflect updates in plotly and xarray.
BELOW is a HACK to fix the notebook and get it running.
Microsoft's custom module 'graphing.py' imports Plotly, which in turn pulls in xarray and Dask. These packages have since been updated and the installed versions no longer work together, so the import throws an error. It took a bit of work to find a fix and test it in the environment.
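To see which versions the sandbox actually has, something like the following can be run in a notebook cell. This is only a sketch: `installed_versions` is a hypothetical helper name, and the package list is simply the names that appear in the error (plotly, xarray, dask).

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(packages):
    """Return {package name: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from the error message and traceback (assumption:
# these are the only ones that matter for this failure).
report = installed_versions(["plotly", "xarray", "dask"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output along with the traceback makes it easier for others to tell whether they are hitting the same combination of versions.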
I had this problem with another training module in the same course. Lost 30+ minutes fixing it the first time. Don't have time to debug and fix it again right now. Will post updated notebooks with VM changes that show how to fix notebook and environment.
Here is the cell that throws the error.
[7]
import graphing

# 'graphing' is custom code we use to make graphs quickly.
# If you would like to read it in detail, it can be found
# in our GitHub repository
graphing.histogram(dataset, 'Pclass', title='Ticket Class (All Passengers)', show=True)
graphing.histogram(unknown_age_and_cabin, 'Pclass', title='Ticket Class (Passengers Missing Cabin and Age Information)')

AttributeError: module 'dask.array' has no attribute 'lib'

AttributeError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 import graphing
3 # 'graphing' is custom code we use to make graphs quickly.
4 # If you would like to read it in detail, it can be found
5 # in our GitHub repository
6 graphing.histogram(dataset, 'Pclass', title='Ticket Class (All Passengers)', show=True)
File /learn/graphing.py:9, in <module>
File /anaconda/envs/py38_default/lib/python3.8/site-packages/plotly/express/__init__.py:15, in <module>
12 Plotly express requires pandas to be installed."""
13 )
---> 15 from ._imshow import imshow
16 from ._chart_types import ( # noqa: F401
17 scatter,
18 scatter_3d,
(...)
51 density_mapbox,
52 )
55 from ._core import ( # noqa: F401
56 set_mapbox_access_token,
57 defaults,
58 get_trendline_results,
59 NO_COLOR,
60 )
File /anaconda/envs/py38_default/lib/python3.8/site-packages/plotly/express/_imshow.py:11, in <module>
File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/__init__.py:1, in <module>
File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/testing.py:9, in <module>
File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/core/duck_array_ops.py:26, in <module>
File /anaconda/envs/py38_default/lib/python3.8/site-packages/xarray/core/dask_array_compat.py:60, in <module>
AttributeError: module 'dask.array' has no attribute 'lib'
Azure_Intro_ML_JupyterNBs_with-Graphing_Module-Errors.zip
3-5-exercise-normalize-data-predict-missing-values-WITH-XARRAY-UPDATE-HACK.zip
Here is a conda environment HACK (in the notebook) to fix the environment and get the notebook running.
The sandbox build script for the Microsoft environment needs to be updated to reflect recent updates in plotly and xarray.
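The traceback above ends with xarray reaching for `dask.array.lib`. A quick way to confirm that this is what fails, without going through `import graphing`, might look like the sketch below. `module_has_attr` is a hypothetical helper name, and the check assumes that a plain attribute lookup on `dask.array` mirrors what xarray does at that line.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False for whether the module exposes attr, or None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError too (package missing entirely).
        return None
    return hasattr(module, attr)


# Module and attribute names copied from the error message.
status = module_has_attr("dask.array", "lib")
messages = {
    True: "dask.array.lib is present",
    False: "dask.array.lib is missing (matches the traceback)",
    None: "dask.array could not be imported",
}
print(messages[status])
```

If this reports the attribute as missing, the installed dask and xarray versions are mismatched in the way the traceback shows, and the problem is in the environment rather than in the notebook code.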
Hey @richlysakowski, thanks a lot for taking the time to figure this out. For those that might be here looking for a quick and concise solution, I would just add that based on your solution above, all I had to do to get around this issue is add the line
!pip install -U xarray!='2022.6.*'
at the top of the first cell of any notebook that contains the error related to the graphing.py import.
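After running that pip line, it can help to check whether the exclusion actually took effect. The sketch below is an assumption-laden illustration: `excluded_by_pin` is a hypothetical helper that only compares the first two version components against 2022.6, which is a simplification of how pip evaluates `!=2022.6.*`.

```python
from importlib import metadata  # standard library in Python 3.8+


def excluded_by_pin(version, excluded=(2022, 6)):
    """Return True if version falls in the excluded 'major.minor' release series."""
    parts = version.split(".")
    try:
        return tuple(int(p) for p in parts[:2]) == tuple(excluded)
    except ValueError:
        # Non-numeric leading components: treat as not excluded.
        return False


try:
    current = metadata.version("xarray")
except metadata.PackageNotFoundError:
    current = None

if current is None:
    print("xarray is not installed")
elif excluded_by_pin(current):
    print(f"xarray {current} is in the excluded 2022.6 series; the pin has not taken effect")
else:
    print(f"xarray {current} is outside the excluded 2022.6 series")
```

This only checks which xarray version is installed. It says nothing about whether the resulting combination of packages imports cleanly, and a kernel restart may be needed before a newly installed version is the one that gets imported.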
That is not working anymore unfortunately.