spark-expectations
A Python library for running data quality rules while the Spark job is running⚡
**Is your feature request related to a problem? Please describe.** There is an opportunity to improve performance so that pipeline run times are not impacted much. The "write error...
**Describe the solution you'd like** It would be helpful to have the `meta_dq_run_id` returned when executing the expectations. That way we can easily query the...
**Describe the bug** The values observed in `output_count` are sometimes greater than the value of `input_count`. This has happened only a few times for the many times...
**Is your feature request related to a problem? Please describe.** During our testing/adoption, 1. We feel that a framework-level flag to determine how the stats should be stored (relational vs...
**Describe the bug** When trying to use the email notifications module, it seems that there is no parameter for passing the credentials needed to log in to the SMTP server. Only the...
**Problem statement** As a user implementing data quality in Databricks, besides implementing DQ validations, I want to perform data profiling, which is directly supported in Databricks, but also save the...
**Describe the solution you'd like** Rather than having a `Dict[str, Union[str, bool, int]]`, as shown below. ``` se_conf = { "se_notifications_enable_email": False, "se_notifications_email_smtp_host": "mailhost.example.com", "se_notifications_email_smtp_port": 25, "se_notifications_email_from": "[email protected]", "se_notifications_email_subject": "spark...
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Run ```python from pyspark import SparkFiles from pyspark.sql import * from spark_expectations.core.expectations import SparkExpectations, WrappedDataFrameWriter...
**Is your feature request related to a problem? Please describe.** As of now, `spark-expectations` supports the `row_dq`, `agg_dq` and `query_dq` rule types. If users want to add their own rule types...
**Is your feature request related to a problem? Please describe.** Support `spark-expectations` for pipelines written with the Scala/Java Spark SDK **Describe the solution you'd like** TBD **Describe alternatives you've considered**...
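The request above that contrasts a flat `Dict[str, Union[str, bool, int]]` with something more structured is truncated, so the proposed alternative is not known. As a minimal sketch of one possible direction, assuming a typed wrapper is what is wanted, the settings could be held in a dataclass and converted back to the flat dictionary. The class name `EmailNotificationConfig`, its field names, and the sender address are hypothetical and not part of `spark-expectations`; only the `se_notifications_*` keys come from the issue text.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class EmailNotificationConfig:
    """Hypothetical typed wrapper; not part of the spark-expectations API."""

    enable_email: bool = False
    smtp_host: str = "mailhost.example.com"
    smtp_port: int = 25
    email_from: str = "sender@example.com"  # placeholder address (assumption)

    def __post_init__(self) -> None:
        # Fail early on an impossible port instead of at send time.
        if not 0 < self.smtp_port < 65536:
            raise ValueError(f"invalid SMTP port: {self.smtp_port}")

    def to_se_conf(self) -> Dict[str, Union[str, bool, int]]:
        # Map typed fields back to the flat keys quoted in the issue.
        return {
            "se_notifications_enable_email": self.enable_email,
            "se_notifications_email_smtp_host": self.smtp_host,
            "se_notifications_email_smtp_port": self.smtp_port,
            "se_notifications_email_from": self.email_from,
        }


se_conf = EmailNotificationConfig(enable_email=True, smtp_port=587).to_se_conf()
```

With this shape, a misspelled field name or an out-of-range port fails when the object is constructed, rather than surfacing later as a missing or wrong dictionary key.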