Alexandre Gattiker

8 comments by Alexandre Gattiker

The information from the different sources is confusing. https://docs.microsoft.com/en-gb/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject#requirements states that with /26 subnets, you can have a maximum of 64 nodes active at a time. This is already wrong...
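For context (my own back-of-the-envelope check, not taken from the linked page): a /26 subnet contains 2^(32-26) = 64 addresses, and Azure reserves 5 of them in every subnet, leaving at most 59 usable IPs, so 64 simultaneously active nodes is impossible even before counting anything else.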

The idea here is to serialise the state. We have items processed by N workers from a work queue: each worker picks up one work item, attempts I/O, and then on...
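Roughly the pattern I mean, as a minimal Python sketch (the names and the I/O call are illustrative, not taken from the actual code):

```
import queue
import threading

NUM_WORKERS = 4                      # N workers draining one shared queue
work_queue: "queue.Queue[str]" = queue.Queue()
state_lock = threading.Lock()        # serialises updates to shared state

def attempt_io(item: str) -> None:
    # Stand-in for the real I/O (network call, file write, ...).
    print(f"processed {item}")

def worker() -> None:
    while True:
        item = work_queue.get()      # pick up one work item
        try:
            attempt_io(item)
        finally:
            with state_lock:
                pass                 # state mutation would go here, serialised
            work_queue.task_done()

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

for i in range(10):
    work_queue.put(f"item-{i}")
work_queue.join()                    # wait until every item is processed
```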

Here's the bash command line I use:

```
export BUILD_BUILDID=$(uuidgen); python ml_service/pipelines/build_train_pipeline.py && python ml_service/pipelines/run_train_pipeline.py
```

All in one line. Not user friendly, but it does 2-4 in this list...

Perhaps adding the psutil and/or gcc conda packages to your env should work:

```
name: mlopspython_ci
dependencies:
  - gcc
```

There are existing, well-tested scripts in the solution for generating service principals (SPs) and storing them in Azure Key Vault (AKV); consider reusing them.

You need to put the body in a column called 'body': https://github.com/Azure/azure-event-hubs-spark/blob/master/docs/structured-streaming-eventhubs-integration.md#creating-an-eventhubs-sink-for-streaming-queries. To generate JSON from a struct, see to_json: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column-
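A minimal PySpark sketch of the two steps together (df here is a stand-in for your actual stream, and the write options are omitted; see the linked guide for the exact sink configuration):

```
from pyspark.sql import SparkSession
from pyspark.sql.functions import struct, to_json

spark = SparkSession.builder.getOrCreate()
# Stand-in batch DataFrame; in your case this would be the streaming frame.
df = spark.createDataFrame([(1, "a")], ["id", "value"])

# Pack all columns into a struct, render it as JSON, and alias the result
# as 'body', the column name the Event Hubs sink expects.
out = df.select(to_json(struct(*df.columns)).alias("body"))

# For a stream you would then write it out, e.g.:
# out.writeStream.format("eventhubs").options(**ehConf).start()
```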

Please reopen after addressing comments.

Please reopen after addressing comments.