Results 52 comments of Zach

> * create a docu site for data quality? -> done
> * check integration of original spark metrics. -> looks good

@pgruetter: PR ok for you?

Yes, the docu site already exists, but the description of expectations still needs to be completed: http://smartdatalake.ch/docs/reference/dataQuality

I was able to run a model prepared by @raproth with SmartDataLake:

```
transformer.pythonCode = """
  |import mlflow
  |model_uri = "file://c:/Entwicklung/test_mlmodel"
  |predict_udf = mlflow.pyfunc.spark_udf(session, model_uri)
  |from pyspark.sql.functions import *
  |inputDf.printSchema...
```
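For context, here is a minimal standalone PySpark sketch of the same approach outside SDL; the model path, input data, and column handling are assumptions for illustration, not taken from the snippet above:

```python
# Hypothetical standalone sketch: score a DataFrame with an MLflow model.
import mlflow.pyfunc
from pyspark.sql import SparkSession
from pyspark.sql.functions import struct

spark = SparkSession.builder.getOrCreate()

model_uri = "file:///tmp/test_mlmodel"  # assumed path to a saved MLflow model
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri)

input_df = spark.read.parquet("/tmp/input")  # assumed input data
# pass all columns to the UDF; MLflow maps them to the model's expected inputs
scored_df = input_df.withColumn("prediction", predict_udf(struct(*input_df.columns)))
scored_df.show()
```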

@pgruetter: can we exclude the unit tests from the code to be inspected? See "sonar.test.exclusions" on https://docs.sonarqube.org/latest/project-administration/narrowing-the-focus/.
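For illustration, a minimal sonar-project.properties sketch using that property; the source and test paths are assumptions about a typical sbt layout, not taken from the project:

```
# hypothetical sonar-project.properties excerpt
sonar.sources=src/main/scala
sonar.tests=src/test/scala
# exclude all test sources from analysis
sonar.test.exclusions=src/test/**/*
```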

With this mechanism we should also collect metric numRecords for KafkaTopicDataObject and JdbcTableDataObject.

Interesting product: https://owl-analytics.com/

It's now possible to define Constraints on DataObjects that are validated for each row when writing. See also http://smartdatalake.ch/docs/reference/dataQuality#constraints
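For illustration, a minimal HOCON sketch of what such a constraint definition could look like; the data object name, type, and exact keys are assumptions here, so the linked page is authoritative:

```
dataObjects {
  myCsvTable {                      # hypothetical data object
    type = CsvFileDataObject        # assumed type
    path = "data/my_csv_table"
    constraints = [{
      name = idNotNull
      # Spark SQL expression, evaluated for each row when writing
      expression = "id is not null"
    }]
  }
}
```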

The solution would be to clone the Spark session (session.cloneSession()) in order to have an isolated Spark configuration per action, see https://stackoverflow.com/questions/47723761/how-many-sparksessions-can-a-single-application-have. An Action with specific spark-options should clone the spark...
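A minimal PySpark sketch of the isolation idea; it uses the public newSession() API, which shares the SparkContext but isolates the SQL configuration (unlike cloneSession(), it does not copy the current session state, and cloneSession() is internal to Spark):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# newSession() shares the SparkContext but gets an isolated SQL configuration
action_session = spark.newSession()
action_session.conf.set("spark.sql.shuffle.partitions", "10")  # action-specific option

# the original session keeps its own value, unaffected by the action's setting
print(spark.conf.get("spark.sql.shuffle.partitions"))           # e.g. the default "200"
print(action_session.conf.get("spark.sql.shuffle.partitions"))  # "10"
```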

Points 3 & 4 are solved, but 1 and 2 remain; see WebserviceFileDataObject.

Hi @MarkusRothSBB, glad you found a workaround! I have one improvement to it: at the end of the line it should be `.toInt` instead of `.toDouble`: `case (d: DecimalType, _:...