MachineLearningNotebooks
ML Pipelines DatabricksStep doesn't support Run.get_context()
While using a DatabricksStep I want to get the appropriate Run context so that I can log information to Azure ML.
I thought this might work:
- Authenticate with AzureMLTokenAuthentication
- Get an authenticated Workspace -> Experiment -> Run
- Use the Run.get_context()
However, I can't use AzureMLTokenAuthentication to authenticate the Workspace.
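In code, the flow I tried looks roughly like this (a sketch; I'm guessing at the parameters of AzureMLTokenAuthentication.create, and the AZUREML_* names stand for the values a DatabricksStep passes to the script):

```python
from azureml.core import Experiment, Run, Workspace
from azureml.core.authentication import AzureMLTokenAuthentication

# Build token-based auth from the values the DatabricksStep hands the script
auth = AzureMLTokenAuthentication.create(
    azureml_access_token=AZUREML_RUN_TOKEN,
    expiry_time=AZUREML_RUN_TOKEN_EXPIRY,
    host=AZUREML_SERVICE_ENDPOINT,
    subscription_id=AZUREML_ARM_SUBSCRIPTION,
    resource_group_name=AZUREML_ARM_RESOURCEGROUP,
    workspace_name=AZUREML_ARM_WORKSPACE_NAME,
    experiment_name=AZUREML_ARM_PROJECT_NAME,
    run_id=AZUREML_RUN_ID,
)

# This is the step that fails: Workspace won't accept this auth type
ws = Workspace(AZUREML_ARM_SUBSCRIPTION, AZUREML_ARM_RESOURCEGROUP,
               AZUREML_ARM_WORKSPACE_NAME, auth=auth)
experiment = Experiment(ws, AZUREML_ARM_PROJECT_NAME)
run = Run(experiment, AZUREML_RUN_ID)
```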
Can you please provide more information on how Azure intends AzureMLTokenAuthentication to be used?
The only way I've been able to get a run_context from within Databricks during a DatabricksStep is:
```python
import os
from azureml.core import Run

# The AZUREML_* values below are the ones Azure ML passes to the script
os.environ['AZUREML_RUN_TOKEN'] = AZUREML_RUN_TOKEN
os.environ['AZUREML_RUN_TOKEN_EXPIRY'] = AZUREML_RUN_TOKEN_EXPIRY
os.environ['AZUREML_RUN_ID'] = AZUREML_RUN_ID
os.environ['AZUREML_ARM_SUBSCRIPTION'] = AZUREML_ARM_SUBSCRIPTION
os.environ['AZUREML_ARM_RESOURCEGROUP'] = AZUREML_ARM_RESOURCEGROUP
os.environ['AZUREML_ARM_WORKSPACE_NAME'] = AZUREML_ARM_WORKSPACE_NAME
os.environ['AZUREML_ARM_PROJECT_NAME'] = AZUREML_ARM_PROJECT_NAME
os.environ['AZUREML_SERVICE_ENDPOINT'] = AZUREML_SERVICE_ENDPOINT

run = Run.get_context(allow_offline=False)
```
This feels like a hack. How does Azure suggest someone do this?
Any help is appreciated.
Noel
@RothNRK A sample notebook covering the available authentication options is available here. CLI or service principal authentication can be used to get your workspace details.
@aashishb Could you please advise?
@RohitMungi-MSFT Thank you for your response.
CLI Auth doesn't look like it meets my needs.
Using a Service Principal would work but it requires:
- A Service Principal.
- A Key Vault mounted as a secret scope in Databricks. This would be fine if it were my only option, but Azure ML is kind enough to send the information needed to make a secure connection and log to Azure ML using the method I outlined above (which is a bit of a hack, but certainly less effort than the Service Principal route sketched below).
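For reference, the Service Principal route would look roughly like this (a sketch; the secret scope and key names are placeholders, and dbutils only exists inside Databricks):

```python
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Pull the service principal secret from a Key Vault-backed secret scope
# ("azureml-scope" and "sp-secret" are placeholder names)
sp_secret = dbutils.secrets.get(scope="azureml-scope", key="sp-secret")

auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>",
    service_principal_id="<client-id>",
    service_principal_password=sp_secret,
)

ws = Workspace(
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    workspace_name="<workspace-name>",
    auth=auth,
)
```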
How do we meaningfully use AzureMLTokenAuthentication? (The notebook you linked doesn't include an example of it.)
Thank you
@RothNRK AzureMLTokenAuthentication is intended as an internal Azure ML API.
Run.get_context() should simply work when you're within the context of an Azure ML run, without any extra authentication required.
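For example, inside a script that Azure ML itself launched (say, on Azure ML compute), this minimal sketch is all that's needed; no workspace or authentication objects are required because the run context carries them:

```python
from azureml.core import Run

run = Run.get_context()      # resolves the submitted run from the environment
run.log("accuracy", 0.91)    # the metric lands on the run in Azure ML
```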
@rastala Thank you. That is what I expected, but within a DatabricksStep it doesn't seem to simply work.
`Run.get_context(allow_offline=False)` produces:
```
KeyError                                  Traceback (most recent call last)
/databricks/python/lib/python3.7/site-packages/azureml/core/run.py in _load_scope(cls)
    215         # Load authentication scope environment variables
--> 216         subscription_id = os.environ['AZUREML_ARM_SUBSCRIPTION']
    217         run_id = os.environ["AZUREML_RUN_ID"]

/local_disk0/pythonVirtualEnvDirs/virtualEnv-6c0c6c97-5068-4f34-91f1-ded12db18057/lib/python3.7/os.py in __getitem__(self, key)
    677             # raise KeyError with the original key value
--> 678             raise KeyError(key) from None
    679         return self.decodevalue(value)

KeyError: 'AZUREML_ARM_SUBSCRIPTION'

The above exception was the direct cause of the following exception:

RunEnvironmentException                   Traceback (most recent call last)
/databricks/python/lib/python3.7/site-packages/azureml/core/run.py in get_context(cls, allow_offline, used_for_context_manager, **kwargs)
    291         try:
--> 292             experiment, run_id = cls._load_scope()
    293

/databricks/python/lib/python3.7/site-packages/azureml/core/run.py in _load_scope(cls)
    232         except KeyError as key_error:
--> 233             raise_from(RunEnvironmentException(), key_error)
    234         else:

/databricks/python/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

RunEnvironmentException: RunEnvironmentException:
	Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run.
	InnerException None
	ErrorResponse
{
    "error": {
        "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run."
    }
}

During handling of the above exception, another exception occurred:

RunEnvironmentException                   Traceback (most recent call last)
<command--1> in <module>
     13
     14 with open(filename, "rb") as f:
---> 15     exec(f.read())
     16

<string> in <module>
<string> in main()
<string> in _main(args)
<string> in _get_run_context(args)
<string> in _get_run()

/databricks/python/lib/python3.7/site-packages/azureml/core/run.py in get_context(cls, allow_offline, used_for_context_manager, **kwargs)
    303         else:
    304             module_logger.debug("Could not load the run context and allow_offline set to False")
--> 305             raise RunEnvironmentException(inner_exception=ex)
    306
    307     @classmethod

RunEnvironmentException: RunEnvironmentException:
	Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run.
	InnerException RunEnvironmentException:
	Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run.
	InnerException None
	ErrorResponse
{
    "error": {
        "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run."
    }
}
	ErrorResponse
{
    "error": {
        "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run."
    }
}
```
So it looks like it's looking for environment variables that aren't set. If I run it with `Run.get_context()` I get an error related to the `_OfflineRun` not working (which makes sense, since I need a `Run`, not an `_OfflineRun`).
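A quick way to see which kind of run you actually got (a small sketch):

```python
from azureml.core import Run

run = Run.get_context()       # allow_offline defaults to True
print(type(run).__name__)     # "_OfflineRun" when the AZUREML_* variables are missing

run = Run.get_context(allow_offline=False)  # raises RunEnvironmentException instead
```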
I am facing the same issue as above, mentioned by @RothNRK. Further, I need the parent pipeline run ID from the `Run`. If I run it with `Run.get_context()` I am getting an error with the `_OfflineRun`. Could you please point to an example of the same?
Thanks!
Just a friendly reminder that this ticket still exists.
@rastala , please check and create ICM for pipelines team if needed. Thanks!
I'm facing the exact same problem as the original poster of this issue. Still couldn't get it to work.
@RothNRK Thank you for pointing this out. I have created a work item to investigate the issue. We will update you shortly.
@RothNRK you will need to set up environment variables in your code, using the parameters passed to your script, before you can call Run.get_context(). This is not a hack but the way this is designed to work.
@fahdkmsft It's unclear that this is the intended usage, given that the only documentation related to these additional variables is under DatabricksStep -> python_script_name, and it makes no mention of setting them in order to get a Run through `Run.get_context()`.
I apologize if I missed some docs that explain this, but if I have, can you please post a link to them?
I'd also like to point out that no one has answered my question about the intended use of AzureMLTokenAuthentication. A DatabricksStep provides all of the information needed to instantiate this class, but you can't authenticate a Workspace with it.
@RothNRK The documentation for the Run class is common to all types of runs, including pipeline runs, HyperDrive runs, and AutoML runs. Please refer to: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.azuremltokenauthentication?view=azure-ml-py
For AzureMLTokenAuthentication, please refer to https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.azuremltokenauthentication?view=azure-ml-py
I am having the same issue as @RothNRK. I've created a pipeline with a DatabricksStep and I am unable to use Run.get_context() to even instantiate the run.
I'm using the following snippet of code, similar to RothNRK's, and I can verify that each parameter is being set.
```python
import os

# This argument parsing is necessary for getting the Run context
parameters = ['AZUREML_RUN_TOKEN', 'AZUREML_RUN_TOKEN_EXPIRY', 'AZUREML_RUN_ID',
              'AZUREML_ARM_SUBSCRIPTION', 'AZUREML_ARM_RESOURCEGROUP',
              'AZUREML_ARM_WORKSPACE_NAME', 'AZUREML_ARM_PROJECT_NAME',
              'AZUREML_SERVICE_ENDPOINT']

for param in parameters:
    temp_value = dbutils.widgets.get(f"--{param}")
    print(f"Working on: {param} with value {temp_value}")
    os.environ[param] = temp_value
    print(f"Checking that it's set: {os.environ[param]}")
```
However, I'm noticing that the _load_scope method of the Run class seems to need the experiment ID. Is this required, and should it be passed in as part of the parameters/widgets provided by Azure ML?
```python
@classmethod
def _load_scope(cls):
    """Load the current context from the environment.

    :return: experiment, run_id, url
    :rtype: azureml.core.Experiment, str, str
    """
    from .authentication import AzureMLTokenAuthentication
    from .experiment import Experiment
    from .workspace import Workspace

    try:
        # Load authentication scope environment variables
        subscription_id = os.environ['AZUREML_ARM_SUBSCRIPTION']
        run_id = os.environ["AZUREML_RUN_ID"]
        resource_group = os.environ["AZUREML_ARM_RESOURCEGROUP"]
        workspace_name = os.environ["AZUREML_ARM_WORKSPACE_NAME"]
        experiment_name = os.environ["AZUREML_ARM_PROJECT_NAME"]
        experiment_id = os.environ.get("AZUREML_EXPERIMENT_ID")
        workspace_id = os.environ.get("AZUREML_WORKSPACE_ID")
        if experiment_id is None:
            module_logger.warning("experiment_id cannot be found in env variable.")
        # Initialize an AMLToken auth, authorized for the current run
        token, token_expiry_time = AzureMLTokenAuthentication._get_initial_token_and_expiry()
```
Unfortunately, this is a poorly documented feature :-(
Is there any update on this issue, @lostmygithubaccount?
@shbijlan from the Pipelines team is discussing with engineering and will update this thread soon.
Longer term, we are looking to have consistent support for Spark jobs across Databricks, Synapse, and others, but this will not be available for a while.
Sorry for the delayed response. The solution is to parse the script arguments and set the corresponding environment variables to access the run context from within Databricks. Here is a code sample:
```python
from azureml.core import Run
import argparse
import os

def populate_environ():
    parser = argparse.ArgumentParser(description='Process arguments passed to script')
    parser.add_argument('--AZUREML_SCRIPT_DIRECTORY_NAME')
    parser.add_argument('--AZUREML_RUN_TOKEN')
    parser.add_argument('--AZUREML_RUN_TOKEN_EXPIRY')
    parser.add_argument('--AZUREML_RUN_ID')
    parser.add_argument('--AZUREML_ARM_SUBSCRIPTION')
    parser.add_argument('--AZUREML_ARM_RESOURCEGROUP')
    parser.add_argument('--AZUREML_ARM_WORKSPACE_NAME')
    parser.add_argument('--AZUREML_ARM_PROJECT_NAME')
    parser.add_argument('--AZUREML_SERVICE_ENDPOINT')

    args = parser.parse_args()
    os.environ['AZUREML_SCRIPT_DIRECTORY_NAME'] = args.AZUREML_SCRIPT_DIRECTORY_NAME
    os.environ['AZUREML_RUN_TOKEN'] = args.AZUREML_RUN_TOKEN
    os.environ['AZUREML_RUN_TOKEN_EXPIRY'] = args.AZUREML_RUN_TOKEN_EXPIRY
    os.environ['AZUREML_RUN_ID'] = args.AZUREML_RUN_ID
    os.environ['AZUREML_ARM_SUBSCRIPTION'] = args.AZUREML_ARM_SUBSCRIPTION
    os.environ['AZUREML_ARM_RESOURCEGROUP'] = args.AZUREML_ARM_RESOURCEGROUP
    os.environ['AZUREML_ARM_WORKSPACE_NAME'] = args.AZUREML_ARM_WORKSPACE_NAME
    os.environ['AZUREML_ARM_PROJECT_NAME'] = args.AZUREML_ARM_PROJECT_NAME
    os.environ['AZUREML_SERVICE_ENDPOINT'] = args.AZUREML_SERVICE_ENDPOINT

populate_environ()
run = Run.get_context(allow_offline=False)
print(run._run_dto["parent_run_id"])
```
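For completeness, a rough sketch of the pipeline side that submits such a script (the compute name, source directory, script name, and cluster settings below are placeholders; per the DatabricksStep docs for python_script_name, Azure ML appends the --AZUREML_* arguments to the script automatically):

```python
from azureml.core import Experiment, Workspace
from azureml.core.compute import DatabricksCompute
from azureml.core.databricks import PyPiLibrary
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep

ws = Workspace.from_config()
databricks_compute = DatabricksCompute(workspace=ws, name="my-databricks")  # attached compute

db_step = DatabricksStep(
    name="databricks-run-context",
    python_script_name="train.py",     # the script containing populate_environ()
    source_directory="./scripts",
    num_workers=1,                     # new job cluster; sizing left to defaults
    pypi_libraries=[PyPiLibrary(package="azureml-sdk")],  # so azureml.core imports on the cluster
    compute_target=databricks_compute,
    allow_reuse=False,
)

pipeline = Pipeline(workspace=ws, steps=[db_step])
pipeline_run = Experiment(ws, "databricks-run-context").submit(pipeline)
```

As an aside, the public `run.parent` property should expose the parent pipeline run, so `run.parent.id` may be a cleaner way to get the parent run ID than the private `_run_dto`.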
Hi to all,
I have just followed your suggestion using the populate_environ() block, running on an interactive cluster:

```python
run = Run.get_context(allow_offline=False)
```

but Databricks is not happy with this line; it fails with the following error:
```
/tmp/tmpk0eno6d8.py in <module>
     52 populate_environ()
     53
---> 54 run = Run.get_context(allow_offline=False)
     55
     56 #print(run._run_dto["parent_run_id"])

/databricks/python/lib/python3.8/site-packages/azureml/core/run.py in get_context(cls, allow_offline, used_for_context_manager, **kwargs)
    365             if used_for_context_manager:
    366                 return _SubmittedRun(experiment, run_id, **kwargs)
--> 367             return _SubmittedRun._get_instance(experiment, run_id, **kwargs)
    368         except RunEnvironmentException as ex:
    369             module_logger.debug("Could not load run context %s, switching offline: %s", ex, allow_offline)

/databricks/python/lib/python3.8/site-packages/azureml/core/run.py in _get_instance(experiment, run_id, **kwargs)
   2290         run = _SubmittedRun.__instances.get(arm_scope_with_run_id)
   2291         if run is None:
-> 2292             run = _SubmittedRun(experiment, run_id, **kwargs)
   2293             _SubmittedRun.__instances[arm_scope_with_run_id] = run
   2294         return run

/databricks/python/lib/python3.8/site-packages/azureml/core/run.py in __init__(self, *args, **kwargs)
   2296     def __init__(self, *args, **kwargs):
-> 2297         super(_SubmittedRun, self).__init__(*args, **kwargs)
   2298         self._input_datasets = None
   2299         self._output_datasets = None

/databricks/python/lib/python3.8/site-packages/azureml/core/run.py in __init__(self, experiment, run_id, outputs, **kwargs)
    171
    172         """
--> 173         super(Run, self).__init__(experiment, run_id, outputs=outputs, **kwargs)
    174         self._parent_run = None
    175

/databricks/python/lib/python3.8/site-packages/azureml/_run_impl/run_base.py in __init__(self, experiment, run_id, outputs, logs, _run_dto, _worker_pool, _user_agent, _ident, _batch_upload_metrics, py_wd, deny_list, flush_eager, redirect_output_stream, **kwargs)
     81             raise
     82
---> 83         py_wd = get_py_wd() if py_wd is None else py_wd
     84
     85         self._client = RunHistoryFacade(self._experiment, self._run_id, RUN_ORIGIN, run_dto=_run_dto,

/databricks/python/lib/python3.8/site-packages/azureml/history/_tracking.py in get_py_wd()
    302
    303 def get_py_wd():
--> 304     return PythonWorkingDirectory.get()
    305
    306

/databricks/python/lib/python3.8/site-packages/azureml/history/_tracking.py in get(cls)
    284                 logger.debug("Adding SparkDFS")
    285                 from azureml._history.utils.filesystem import SparkDFS
--> 286                 spark_dfs = SparkDFS("spark_dfs", logger)
    287                 fs_list.append(spark_dfs)
    288                 logger.debug("Added SparkDFS")

/databricks/python/lib/python3.8/site-packages/azureml/_history/utils/filesystem.py in __init__(self, ident, logger)
    112
    113         self.spark = SparkSession.builder.getOrCreate()
--> 114         config = self.spark._sc._jsc.hadoopConfiguration()
    115
    116         dfs_cwd = self.spark._sc._gateway.jvm.org.apache.hadoop.fs.Path(".")

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    108     def deco(*a, **kw):
    109         try:
--> 110             return f(*a, **kw)
    111         except py4j.protocol.Py4JJavaError as e:
    112             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    328                 format(target_id, ".", name), value)
    329         else:
--> 330             raise Py4JError(
    331                 "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332                 format(target_id, ".", name, value))

Py4JError: An error occurred while calling o238.hadoopConfiguration. Trace:
py4j.security.Py4JSecurityException: Method public org.apache.hadoop.conf.Configuration org.apache.spark.api.java.JavaSparkContext.hadoopConfiguration() is not whitelisted on class class org.apache.spark.api.java.JavaSparkContext
	at py4j.security.WhitelistingPy4JSecurityManager.checkCall(WhitelistingPy4JSecurityManager.java:473)
	at py4j.Gateway.invoke(Gateway.java:294)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:251)
	at java.lang.Thread.run(Thread.java:748)
```
Any idea? I am working with a DatabricksStep that runs a local Python script.
Hi all, is there any update on this issue? I followed the suggestion, but it does not work from a Databricks job cluster launched by a DatabricksStep running a local Python script.
@kamakay or @RothNRK, did you guys find any way to use the Run class from a Python script in Databricks?
Thanks in advance. BR.