spark-expectations
[FEATURE] Use SparkConf or RuntimeConf for the se_conf.
Describe the solution you'd like
Rather than having a Dict[str, Union[str, bool, int]], as shown below:
se_conf = {
    "se_notifications_enable_email": False,
    "se_notifications_email_smtp_host": "mailhost.example.com",
    "se_notifications_email_smtp_port": 25,
    "se_notifications_email_from": "[email protected]",
    "se_notifications_email_subject": "spark expectations - data quality - notifications",
    "se_notifications_on_fail": True,
    "se_notifications_on_error_drop_exceeds_threshold_breach": True,
    "se_notifications_on_error_drop_threshold": 15,
}
What if we provided the ability to configure the expectations directly with the SparkSession configuration?
The end user would only need to set the following, since they have to provide their own SparkSession in the first place.
from pyspark.sql import SparkSession

# getActiveSession() returns None when no session is active on the current thread.
spark = SparkSession.getActiveSession()
spark.conf.set("se.notifications.email.enabled", "false")
spark.conf.set("se.notifications.email.smtp.host", "mailhost.example.com")
spark.conf.set("se.notifications.email.smtp.port", "25")
spark.conf.set("se.notifications.email_from", "[email protected]")
spark.conf.set("se.notifications.email_subject", "spark expectations - data quality - notifications")
spark.conf.set("se.notifications.on_fail", "true")
spark.conf.set("se.notifications.on_error_drop_exceeds_threshold_breach", "true")
spark.conf.set("se.notifications.on_error_drop_threshold", "15")
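On the library side, the string values stored in the Spark configuration would have to be converted back into the typed entries that se_conf uses today. The following is only a rough sketch of how that could look, not the project's implementation: the function name se_conf_from_spark, the _KEY_MAP table, and the choice of which keys to map are all assumptions made for illustration. It takes a getter with the shape of spark.conf.get(key, default) so that it can be exercised without a running Spark session.

```python
from typing import Callable, Dict, Optional, Union

ConfValue = Union[str, bool, int]

# Hypothetical mapping: proposed Spark key -> (existing se_conf key, target type).
# Only a subset of the keys from this issue is listed.
_KEY_MAP = {
    "se.notifications.email.enabled": ("se_notifications_enable_email", bool),
    "se.notifications.email.smtp.host": ("se_notifications_email_smtp_host", str),
    "se.notifications.email.smtp.port": ("se_notifications_email_smtp_port", int),
    "se.notifications.on_fail": ("se_notifications_on_fail", bool),
    "se.notifications.on_error_drop_threshold": (
        "se_notifications_on_error_drop_threshold",
        int,
    ),
}


def _parse(raw: str, kind: type) -> ConfValue:
    """Convert a Spark config string into the type se_conf expects."""
    if kind is bool:
        return raw.strip().lower() == "true"
    if kind is int:
        return int(raw)
    return raw


def se_conf_from_spark(
    get: Callable[[str, Optional[str]], Optional[str]],
) -> Dict[str, ConfValue]:
    """Build an se_conf-style dict from a `get(key, default)` style getter.

    Keys that are not set (the getter returns None) are left out, so the
    caller can still fall back to library defaults.
    """
    result: Dict[str, ConfValue] = {}
    for spark_key, (se_key, kind) in _KEY_MAP.items():
        raw = get(spark_key, None)
        if raw is not None:
            result[se_key] = _parse(raw, kind)
    return result
```

With a real session this would presumably be called as se_conf_from_spark(spark.conf.get); a plain dict's get method works the same way for testing.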
Describe alternatives you've considered
The alternative is to keep constructing a configuration dictionary, which is already implemented.
Additional context
Using spark.newSession() makes it easier to manage the configuration bound to a given SparkSession instance.
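To illustrate the isolation this relies on: SparkSession.newSession() returns a session that shares the SparkContext but has its own SQL configuration, so each session can carry different settings. The sketch below assumes a local PySpark installation; the function name, the app name, and the threshold values are made up for the example, and the se.* key is the one proposed in this issue rather than something the library reads today.

```python
def configure_isolated_sessions():
    """Set a different threshold on two sibling sessions and read them back."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    key = "se.notifications.on_error_drop_threshold"  # key proposed in this issue

    base = (
        SparkSession.builder.master("local[1]")
        .appName("se-conf-demo")  # hypothetical app name
        .getOrCreate()
    )

    # Each new session has a separate runtime configuration.
    strict = base.newSession()
    lenient = base.newSession()
    strict.conf.set(key, "5")
    lenient.conf.set(key, "50")

    # Return what each session sees; the base session was never given a value,
    # so a default of None is passed for it.
    return (
        strict.conf.get(key),
        lenient.conf.get(key),
        base.conf.get(key, None),
    )
```

Under these assumptions, each pipeline could be handed its own session and pick up its own notification settings without passing a separate dictionary around.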
Am I willing to work on this? Yes.