MachineLearningNotebooks
Python dataset support ubuntu 19/20
Issue: When attempting to download a dataset on Ubuntu 19.10, I get `NotImplementedError: Unsupported Linux distribution ubuntu 19.10`.
It seems like the problem is that the dotnetcore2 pip package actually only supports ubuntu 18. But ubuntu 20.04 is the new LTS, so it makes sense to support it (and also ubuntu 19).
Also, can we agree that it is a bit of an architecture smell when downloading some CSV files (the dataset) causes a dependency to go looking for a distro-specific tar file for a custom installation of a third dependency? I don't know what the best solution is, but this can't be it.
Related: https://github.com/Azure/MachineLearningNotebooks/issues/713
@epa095 we will review your feedback and get back to you shortly. Thanks.
Hi Erik,
Were you downloading an AML Dataset from an AML workspace, or were you downloading the CSV file? Could you provide more details about the interface you are using? Thanks!
Hi @SturgeonMi !
I was attempting to follow along with this tutorial on my Ubuntu 19.10 machine, but I hit the above-mentioned problem at the step "Download the MNIST dataset". It crashes at the call to `MNIST.get_file_dataset`, because that ends up calling `attempt_get_deps` in the file `runtime.py` in the package `dotnetcore2`.
My relevant versions: dotnetcore2==2.1.13, azureml-opendatasets==1.4.0, azureml-sdk==1.4.0, azure-core==1.4.0, on Ubuntu 19.10.
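For anyone trying to confirm which distribution name and version their machine reports, reading /etc/os-release is one option. The sketch below is illustrative only: the helper names `parse_os_release` and `read_distro` are hypothetical, and this is not the code dotnetcore2 uses for its own detection.

```python
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse the KEY=VALUE lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def read_distro(path: str = "/etc/os-release") -> tuple:
    """Return (ID, VERSION_ID) as reported by the os-release file."""
    info = parse_os_release(Path(path).read_text())
    return info.get("ID", ""), info.get("VERSION_ID", "")
```

Calling `read_distro()` on the affected machine should show the same distro name and version that appear in the error message, which can help when reporting the problem.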
Thanks a lot, Erik! I opened a bug to track this on the AzureML side and will get back to you with updates.
Hi Erik,
We fixed a related bug in the Open Datasets SDK.
Could you try the steps below?
Please ensure you are using the latest Azure Open Datasets SDK. You can install the latest SDK by running the following commands: `!pip uninstall -y azureml-opendatasets` and then `!pip install azureml-opendatasets`
Also here is the latest version of the tutorial notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb
Thanks!
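To confirm which versions are actually installed before and after reinstalling, a small check with the standard library can help. This is a minimal sketch; the function name `installed_versions` is hypothetical, and the package names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions
```

For example, `installed_versions(["dotnetcore2", "azureml-opendatasets", "azureml-sdk"])` returns a dict that can be pasted into a bug report.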
Hi @SturgeonMi, the latest version of azureml-opendatasets I see on pypi is 1.4.0, and as you can see from my previous comment that is the version I am already using.
Hi Erik, we opened a bug for the dotnetcore2 issue. Once it's fixed, we will update here.
@epa095 hope the above solution helped. I will now proceed to close this thread. Let us know if you continue to encounter issues downloading the dataset. Thanks.
Added a feature to support v19.
@SturgeonMi thanks for opening an issue for me in `dotnetcore2`. Is there any way I can track it (i.e. is it publicly available in any way)?
reopening per new policy - is this fixed?
apparently not, also having issues
Hi @gegnew, are you still getting NotImplementedError: Unsupported Linux distribution ubuntu 19.10 when downloading a dataset on Ubuntu 19.10? Or are you getting a different error message?
Hi @SturgeonMi, I'm getting the errors reported in this issue, but have been totally unable to get any workaround to function. It's not precisely the same error, but afaict it's related.
I'm on Arch, but installing the lttng modules doesn't resolve the missing dependency in the dotnet runtime
Would you mind sharing more about what you were doing (which command you were running) when you got the following error?
NotImplementedError: Linux distribution arch . does not have automatic support.
.NET Core 2.1 can still be used via `dotnetcore2` if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `2.1`.
Hi, I am getting the same error. Providing the details on that below:
When the error occurs: when I try to load an Azure dataset locally as a pandas DataFrame. df = azure_workspace.datasets.get(dataset_name).to_pandas_dataframe()
Error Message:
NotImplementedError: Linux distribution ubuntu 22.04 does not have automatic support.
Missing packages: {'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via `dotnetcore2` if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `3.1.23`.
My system details:
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
dotnetcore2==3.1.23
azureml-sdk==1.42.0
azureml-core==1.42.0.post1
azureml-opendatasets==1.42.0
What I have tried: installing the .NET runtime as suggested in the error message. Command: sudo apt-get install -y dotnet-runtime-3.1.23
Result:
E: Unable to locate package dotnet-runtime-3.1.23
E: Couldn't find any package by glob 'dotnet-runtime-3.1.23'
E: Couldn't find any package by regex 'dotnet-runtime-3.1.23'
Please suggest any solutions or alternatives. Ultimately, I want to load an Azure dataset locally, whichever way possible.
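Since the error names specific shared libraries as missing, it can be useful to check directly whether the dynamic loader can open them. The following is a minimal sketch using `ctypes`; the function name `missing_shared_libs` and its `loader` parameter are hypothetical, and this is an assumption about how to probe for the libraries, not how dotnetcore2 performs its own check.

```python
import ctypes


def missing_shared_libs(names, loader=ctypes.CDLL):
    """Return the subset of library names that the dynamic loader cannot open."""
    missing = set()
    for name in names:
        try:
            loader(name)
        except OSError:
            # ctypes raises OSError when the shared object cannot be loaded.
            missing.add(name)
    return missing
```

Running `missing_shared_libs(["liblttng-ust.so.0", "libcurl.so.4"])` on the affected machine shows whether installing a system package actually made the library loadable.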
I want to run a job on Azure ML (as a Docker container where I train my model). However, I keep getting this error when the job fails:
Traceback (most recent call last):
File "train.py", line 5, in <module>
train()
File "/usr/local/lib/python3.9/site-packages/mlops_i4t/machine_learning/model_utils.py", line 56, in train
df = dataset.to_pandas_dataframe()
File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/azureml/data/tabular_dataset.py", line 168, in to_pandas_dataframe
dataflow = get_dataflow_for_execution(self._dataflow, 'to_pandas_dataframe', 'TabularDataset')
File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/azureml/data/abstract_dataset.py", line 221, in _dataflow
dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/_datastore_helper.py", line 177, in _set_auth_type
get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
_engine_api = EngineAPI()
File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 102, in __init__
self._message_channel = launch_engine()
File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
dependencies_path = runtime.ensure_dependencies()
File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
if not attempt_get_deps(missing_pkgs):
File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support.
Missing packages: {'libcurl.so.4', 'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via `dotnetcore2` if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `3.1.23`.
I am lost...what can I do to solve this?
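When this error shows up in job logs, the list of missing libraries can be pulled out of the message programmatically. This is a rough sketch that assumes the message keeps the "Missing packages: {...}" format shown in the traceback above; the function name `missing_packages_from_error` is hypothetical.

```python
import ast
import re


def missing_packages_from_error(message: str) -> set:
    """Extract the set literal that follows 'Missing packages:' in the error text."""
    match = re.search(r"Missing packages:\s*(\{[^}]*\})", message)
    if match is None:
        return set()
    # The message prints a Python set literal, so literal_eval can parse it safely.
    return set(ast.literal_eval(match.group(1)))
```

The result can then be compared against what is installed in the container image to decide which system packages to add.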
Same issue here, with the same stack trace as @NielsHoogeveen1990. It can be reproduced with the latest Ubuntu 22.04 MS Runner Image: https://github.com/actions/runner-images
Experiencing same issue trying to consume a data asset registered in my AML workspace. Anyone able to resolve the "not supported... .NET Core" issue? Thanks
Hi, debian 11 is not supported automatically. Could you try to install your Linux distro specific .NET Core based on the guidance here: https://learn.microsoft.com/en-us/dotnet/core/install/linux?
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `3.1.23`.
Thanks!
I have followed the instructions you recommended, and get same as reported by @vighnesh-sablok with "Unable to locate package dotnet-runtime-3.1.23"
@SturgeonMi an easy way to replicate a test environment that produces this error is to set up a devcontainer within VS Code. Could you try following the dotnet installation instructions for Linux there? I have not been able to get them working. Thank you!
Example devcontainer.json
{
    "name": "my-aml-devcontainer",
    "build": {
        "dockerfile": "Dockerfile"
    }
}
Example Dockerfile
FROM mcr.microsoft.com/vscode/devcontainers/base:ubuntu-22.04
# Install packages from standard package manager
RUN apt-get update -qq && export DEBIAN_FRONTEND=noninteractive && \
    apt-get install -y --no-install-recommends \
    software-properties-common \
    apt-transport-https \
    wget \
    curl \
    tar \
    zip \
    unzip \
    sudo \
    apt-utils \
    file \
    git \
    python3 \
    python3-pip \
    python3-setuptools \
    nano
# Python packages
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
# Install Azure CLI and extensions
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash \
    && az extension add -n ml -y
# Cleanup cached apt data
RUN apt-get autoremove -y && apt-get clean && \
    rm -rf /var/lib/apt/lists/*
CMD ["/bin/bash"]
Your requirements.txt would list the Python packages, including azureml-core
Then a simple AML Python SDK v1 script
from azureml.core import Workspace, Dataset, Experiment, Model
import pandas as pd
import numpy as np
workspace = Workspace.from_config()
dataset_name = 'your dataset name here'
ds = Dataset.get_by_name(workspace=workspace, name=dataset_name)
@SturgeonMi @corticalstack I'm facing the same issue. Is there any update?
Hi,
Same issue here. I was using an Ubuntu 20.04 image with SDK 1.48 and it was working, but after bumping to 22.04 it no longer works.
My base image is:
mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04
Same issue here - all my azure ml cluster runs blow up because of this, when trying to use this as the base docker image of my environment:
https://github.com/Azure/AzureML-Containers/tree/master/base/gpu/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04
So Microsoft-provided Docker images won't work in Microsoft Azure ML clusters using Microsoft Azure ML APIs, which is a major incompatibility within Microsoft products.
Any news on this? I am having the same dotnet error in this Ubuntu version when trying to use the lib "azureml-dataset-runtime".
Ubuntu versions 14, 16, 18, and 20 are supported by "azureml-dataset-runtime". The package depends on dotnetcore, which imposes this restriction. We will publish version 5.0.0 without the dotnetcore dependency in the coming weeks, which should resolve this issue.
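Until that release is available, a quick guard can flag an unsupported base image early. This is a minimal sketch that hard-codes the Ubuntu major versions listed in the comment above; the names `SUPPORTED_UBUNTU_MAJORS` and `is_supported_ubuntu` are hypothetical, and the list may not match what any given package version actually checks.

```python
# Major versions taken from the comment above; treat this list as an assumption.
SUPPORTED_UBUNTU_MAJORS = {14, 16, 18, 20}


def is_supported_ubuntu(version_id: str) -> bool:
    """Return True if an Ubuntu VERSION_ID such as '20.04' has a supported major version."""
    major = version_id.split(".", 1)[0]
    return major.isdigit() and int(major) in SUPPORTED_UBUNTU_MAJORS
```

The `version_id` argument is the `VERSION_ID` value from /etc/os-release, for example "22.04" on the images discussed in this thread.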
What about support for Ubuntu 22, @SturgeonMi?
We plan to publish a newer package version without dotnetcore dependency in the coming weeks. This should resolve the "Unsupported Linux distribution ubuntu" issue. @anliakho2 can provide more details here.