Python dataset support ubuntu 19/20

Open epa095 opened this issue 4 years ago • 31 comments

Issue: When attempting to download a dataset on ubuntu 19.10 I get NotImplementedError: Unsupported Linux distribution ubuntu 19.10.

It seems like the problem is that the dotnetcore2 pip package actually only supports ubuntu 18. But ubuntu 20.04 is the new LTS, so it makes sense to support it (and also ubuntu 19).

Also, can we agree that it is a bit of an architecture smell when downloading some CSV files (the dataset) causes a dependency to go look for a distro-specific tar file for a custom installation of a third dependency? I don't know what the best solution is, but this can't be it.
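To see which distribution fields a check like this could be keying on, here is a minimal sketch that reads ID and VERSION_ID from text in the os-release format. It does not reproduce dotnetcore2's actual detection logic, which is not shown in this thread; the helper name parse_os_release and the sample text are made up for illustration.

```python
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


# Sample content in the /etc/os-release format (illustrative, not copied from a real machine).
SAMPLE = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="19.10"\n'

info = parse_os_release(SAMPLE)
print(info["ID"], info["VERSION_ID"])
```

On a real machine the same function could be fed the contents of /etc/os-release.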

Related: https://github.com/Azure/MachineLearningNotebooks/issues/713

epa095 avatar Apr 28 '20 12:04 epa095

@epa095 we will review your feedback and get back to you shortly. Thanks.

GiftA-MSFT avatar Apr 29 '20 00:04 GiftA-MSFT

Hi Erik,

Were you downloading an AML Dataset from an AML workspace, or were you downloading the CSV file? Could you provide more details about the interface you are using? Thanks!

SturgeonMi avatar Apr 29 '20 04:04 SturgeonMi

Hi @SturgeonMi! I was attempting to follow this tutorial on my Ubuntu 19.10 machine, but I hit the above-mentioned problem at the step "Download the MNIST dataset". It crashes at MNIST.get_file_dataset, because that ends up calling attempt_get_deps in the file runtime.py in the package dotnetcore2.

My relevant versions:

dotnetcore2==2.1.13
azureml-opendatasets==1.4.0
azureml-sdk==1.4.0
azure-core==1.4.0
ubuntu 19.10
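A version list like this can be collected programmatically. The following is a small sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package names are simply the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the given distribution names."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Package names taken from this thread.
PACKAGES = ["dotnetcore2", "azureml-opendatasets", "azureml-sdk", "azureml-core"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```

Pasting this output into a bug report makes it easy to tell whether an environment already has the latest published release.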

epa095 avatar Apr 29 '20 07:04 epa095

Thanks a lot, Erik! I opened a bug to track this on the AzureML side and will get back to you with updates.

SturgeonMi avatar Apr 30 '20 22:04 SturgeonMi

Hi Erik,

We fixed a related bug in the Open Datasets SDK.

Could you try the steps below?

Please ensure you are using the latest Azure Open Datasets SDK. You can install the latest SDK by running the following commands:

!pip uninstall -y azureml-opendatasets
!pip install azureml-opendatasets

Also here is the latest version of the tutorial notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb

Thanks!

SturgeonMi avatar May 01 '20 20:05 SturgeonMi

Hi @SturgeonMi, the latest version of azureml-opendatasets I see on pypi is 1.4.0, and as you can see from my previous comment that is the version I am already using.

epa095 avatar May 02 '20 06:05 epa095

Hi Erik, we opened a bug for the dotnetcore2 issue. Once it's fixed, we will update here.

SturgeonMi avatar May 04 '20 17:05 SturgeonMi

@epa095 hope the above solution helped. I will now proceed to close this thread. Let us know if you continue to encounter issues downloading the dataset. Thanks.

GiftA-MSFT avatar May 04 '20 23:05 GiftA-MSFT

Added a feature to support v19.

SturgeonMi avatar May 06 '20 23:05 SturgeonMi

@SturgeonMi thanks for opening an issue for me in dotnetcore2. Is there any way I can track it (i.e. is it publicly available in any way)?

epa095 avatar May 08 '20 13:05 epa095

reopening per new policy - is this fixed?

lostmygithubaccount avatar Feb 19 '21 13:02 lostmygithubaccount

apparently not, also having issues

gegnew avatar Jan 26 '22 22:01 gegnew

Hi @gegnew, are you still getting NotImplementedError: Unsupported Linux distribution ubuntu 19.10 when downloading a dataset on ubuntu 19.10? Or are you getting a different error message?

SturgeonMi avatar Jan 26 '22 23:01 SturgeonMi

Hi @SturgeonMi, I'm getting the errors reported in this issue, but have been totally unable to get any workaround to function. It's not precisely the same error, but afaict it's related.

gegnew avatar Jan 27 '22 11:01 gegnew

I'm on Arch, but installing the lttng modules doesn't resolve the missing dependency in the dotnet runtime

gegnew avatar Jan 27 '22 11:01 gegnew

Could you provide more detail about what you were doing (which command you were running) when you got "NotImplementedError: Linux distribution arch . does not have automatic support. .NET Core 2.1 can still be used via dotnetcore2 if the required dependencies are installed. Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions. Follow your distro specific instructions to install dotnet-runtime-* and replace * with 2.1."?

SturgeonMi avatar Jan 28 '22 23:01 SturgeonMi

Hi, I am getting the same error. Providing the details on that below:

When the error occurs: when I try to load an Azure dataset locally as a pandas DataFrame.

df = azure_workspace.datasets.get(dataset_name).to_pandas_dataframe()

Error message:

NotImplementedError: Linux distribution ubuntu 22.04 does not have automatic support.
Missing packages: {'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23.

My system details:

Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
dotnetcore2==3.1.23
azureml-sdk==1.42.0
azureml-core==1.42.0.post1
azureml-opendatasets==1.42.0

What I have tried as a solution: installing the .NET runtime as mentioned in the error.

Command: sudo apt-get install -y dotnet-runtime-3.1.23

Result:

E: Unable to locate package dotnet-runtime-3.1.23
E: Couldn't find any package by glob 'dotnet-runtime-3.1.23'
E: Couldn't find any package by regex 'dotnet-runtime-3.1.23'

Please provide any solutions or alternatives. Ultimately, I want to load an Azure dataset locally, whichever way possible.
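One way to confirm whether a shared library named in the error is really unavailable is to ask the dynamic loader directly. This is only a sketch: unloadable_libraries is a hypothetical helper, and it is an assumption that the loader's view matches whatever check dotnetcore2 performs internally.

```python
import ctypes


def unloadable_libraries(names):
    """Return the subset of shared-library names that the dynamic loader cannot load."""
    missing = set()
    for name in names:
        try:
            ctypes.CDLL(name)
        except OSError:
            # Raised when the loader cannot find or open the library.
            missing.add(name)
    return missing


# Library name taken from the error message above.
print(unloadable_libraries(["liblttng-ust.so.0"]))
```

An empty set means the loader can find the library, so the problem would lie elsewhere; a non-empty set names what still has to be installed through the distribution's package manager.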

vighnesh-sablok avatar Jun 19 '22 11:06 vighnesh-sablok

I want to run a job on Azure ML (as a Docker container where I train my model). However, I keep getting this error when the job fails:

Traceback (most recent call last):
  File "train.py", line 5, in <module>
    train()
  File "/usr/local/lib/python3.9/site-packages/mlops_i4t/machine_learning/model_utils.py", line 56, in train
    df = dataset.to_pandas_dataframe()
  File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/azureml/data/tabular_dataset.py", line 168, in to_pandas_dataframe
    dataflow = get_dataflow_for_execution(self._dataflow, 'to_pandas_dataframe', 'TabularDataset')
  File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/azureml/data/abstract_dataset.py", line 221, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/_datastore_helper.py", line 177, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
    _engine_api = EngineAPI()
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 102, in __init__
    self._message_channel = launch_engine()
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
    if not attempt_get_deps(missing_pkgs):
  File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
    raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support. 
Missing packages: {'libcurl.so.4', 'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via `dotnetcore2` if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `3.1.23`.

I am lost...what can I do to solve this?
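The error text itself names the shared libraries that could not be found. As a rough sketch, assuming the message keeps the "Missing packages: {...}" layout shown in the traceback above, the names can be pulled out like this (missing_packages is a hypothetical helper, not part of dotnetcore2):

```python
import ast
import re


def missing_packages(message):
    """Extract the set literal after 'Missing packages:' from an error message, if present."""
    match = re.search(r"Missing packages:\s*(\{[^}]*\})", message)
    if match is None:
        return set()
    # The message embeds a Python set literal, so literal_eval can read it safely.
    return set(ast.literal_eval(match.group(1)))


MESSAGE = (
    "Linux distribution debian 11. does not have automatic support. \n"
    "Missing packages: {'libcurl.so.4', 'liblttng-ust.so.0'}\n"
)
print(sorted(missing_packages(MESSAGE)))
```

Knowing the exact library file names helps when searching a distribution's package index for the package that provides them.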

NielsHoogeveen1990 avatar Oct 24 '22 13:10 NielsHoogeveen1990

Same issue here, with the same stack trace as @NielsHoogeveen1990. It can be reproduced with the latest Ubuntu 22.04 MS Runner Image: https://github.com/actions/runner-images

kato-m avatar Jan 11 '23 07:01 kato-m

Experiencing same issue trying to consume a data asset registered in my AML workspace. Anyone able to resolve the "not supported... .NET Core" issue? Thanks

corticalstack avatar Mar 02 '23 14:03 corticalstack

Hi, debian 11 is not supported automatically. Could you try to install your Linux distro specific .NET Core based on guidance here https://learn.microsoft.com/en-us/dotnet/core/install/linux? Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23. Thanks!

SturgeonMi avatar Mar 03 '23 01:03 SturgeonMi

> Hi, debian 11 is not supported automatically. Could you try to install your Linux distro specific .NET Core based on guidance here https://learn.microsoft.com/en-us/dotnet/core/install/linux? Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23. Thanks!

I have followed the instructions you recommended, and I get the same result reported by @vighnesh-sablok: "Unable to locate package dotnet-runtime-3.1.23".

corticalstack avatar Mar 03 '23 08:03 corticalstack

@SturgeonMi an easy way to replicate a test environment that produces this error is to set up a devcontainer within VS Code. Could you try following the dotnet installation instructions for Linux there? I have not been able to get them working. Thank you!

Example devcontainer.json

{
	"name": "my-aml-devcontainer",
	"build": {
		"dockerfile": "Dockerfile"
	}
}

Example Dockerfile

FROM mcr.microsoft.com/vscode/devcontainers/base:ubuntu-22.04

# Install packages from standard package manager
RUN apt-get update -qq && export DEBIAN_FRONTEND=noninteractive && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        apt-transport-https \
        wget \
        curl \
        tar \
        zip \
        unzip \
        sudo \
        apt-utils \
        file \
        git \
        python3 \
        python3-pip \
        python3-setuptools \
        nano

# Python packages
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

# Install Azure CLI and extensions
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash \
    && az extension add -n ml -y

# Cleanup cached apt data
RUN apt-get autoremove -y && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

CMD ["/bin/bash"]

Your requirements.txt would list the Python packages, including azureml-core

Then a simple AML Python SDK v1 script

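# Minimal AML Python SDK v1 script: load the workspace config and look up a registered dataset.
from azureml.core import Workspace, Dataset, Experiment, Model
import pandas as pd
import numpy as np

workspace = Workspace.from_config()  # reads the workspace details from config.json
dataset_name = 'your dataset name here'  # placeholder, replace with a registered dataset name
ds = Dataset.get_by_name(workspace=workspace, name=dataset_name)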
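When reproducing the problem in a script like this, it can help to catch the failure and print the message rather than let the run die with a long traceback. The sketch below is deliberately independent of azureml: load_or_explain and fake_load are hypothetical names, and fake_load only stands in for a real call such as ds.to_pandas_dataframe().

```python
def load_or_explain(load):
    """Call load(); on NotImplementedError return (None, message) instead of raising."""
    try:
        return load(), None
    except NotImplementedError as exc:
        # This is the exception type shown in the tracebacks in this thread.
        return None, str(exc)


def fake_load():
    # Stand-in for something like `lambda: ds.to_pandas_dataframe()`.
    raise NotImplementedError(
        "Linux distribution ubuntu 22.04 does not have automatic support."
    )


result, error = load_or_explain(fake_load)
if error is not None:
    print("Dataset engine could not start:", error)
```

This does not fix anything; it only makes the failure easier to report.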

corticalstack avatar Mar 04 '23 15:03 corticalstack

@SturgeonMi @corticalstack I'm facing the same issue. Is there any update?

ghost avatar Mar 29 '23 18:03 ghost

Hi,

Same issue here. I was using an Ubuntu 20.04 image with SDK 1.48 and it was working, but after bumping to 22.04 it doesn't work any longer.

My base image is:

mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04

edgBR avatar May 30 '23 20:05 edgBR

Same issue here - all my azure ml cluster runs blow up because of this, when trying to use this as the base docker image of my environment:

https://github.com/Azure/AzureML-Containers/tree/master/base/gpu/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04

So: microsoft provided docker images won't work in microsoft azure ml clusters using microsoft azure ml APIs --> a major incompatibility within microsoft products.

microsoft-sampsa avatar Sep 19 '23 07:09 microsoft-sampsa

Any news on this? I am having the same dotnet error in this Ubuntu version when trying to use the lib "azureml-dataset-runtime".

henrydleao avatar Oct 24 '23 14:10 henrydleao

Ubuntu versions 14, 16, 18, and 20 are supported by "azureml-dataset-runtime". The package has a dependency on dotnetcore, and that brings the restriction. We will publish a version 5.0.0 without the dotnetcore dependency in the coming weeks, and that should resolve this issue.
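The supported list in the comment above can be turned into a quick pre-flight check. This is a sketch only: the set of major versions is copied from that comment, and is_listed_ubuntu is a hypothetical helper rather than anything shipped in azureml-dataset-runtime.

```python
# Major versions listed as supported in the comment above.
SUPPORTED_UBUNTU_MAJORS = {14, 16, 18, 20}


def is_listed_ubuntu(version_id):
    """Return True if a VERSION_ID such as '20.04' has a listed major version."""
    major = version_id.split(".", 1)[0]
    return major.isdigit() and int(major) in SUPPORTED_UBUNTU_MAJORS


for version in ("18.04", "19.10", "20.04", "22.04"):
    print(version, is_listed_ubuntu(version))
```

A check like this could run at container build time so that an unsupported base image is flagged before a training job is submitted.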

SturgeonMi avatar Nov 03 '23 17:11 SturgeonMi

What about support for Ubuntu 22, @SturgeonMi?

jmwoloso avatar Nov 13 '23 23:11 jmwoloso

We plan to publish a newer package version without dotnetcore dependency in the coming weeks. This should resolve the "Unsupported Linux distribution ubuntu" issue. @anliakho2 can provide more details here.

SturgeonMi avatar Nov 14 '23 06:11 SturgeonMi