
s3 validation html results not getting updated

Open emptyr1 opened this issue 2 years ago • 3 comments

Describe the bug: The Data Docs site HTML hosted on S3 is not updated with any validations (it shows no validations at all).

To Reproduce: see the great_expectation.yml file below.

Expected behavior: The HTML hosted on S3 is updated.

Environment (please complete the following information):

  • osx
  • great_expectations==0.15.8
  • airflow 2.2.2

Additional context: great_expectation.yml

stores:

  # local stores
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore

  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  # prod stores
  expectations_s3_prod_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my-data-expectations
      prefix: expectations

  validations_s3_prod_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my-data-expectations
      prefix: validations

  evaluation_s3_prod_parameter_store:
    class_name: EvaluationParameterStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my-data-expectations
      prefix: evaluation

  checkpoint_s3_prod_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my-data-expectations
      prefix: checkpoints

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    # set to false to hide how-to buttons in Data Docs
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

  #prod
  s3_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my-data-expectations
      prefix: site
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
      show_cta_footer: true

And my checkpoint, default_checkpoint_prod.yml:

name: default_checkpoint_prod
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y-%m-%d %H:%M:%S - prod'
expectation_suite_name:
batch_request:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
      target_store_name: validations_s3_prod_store
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
      target_store_name: evaluation_s3_prod_parameter_store
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: [s3_site]
evaluation_parameters: {}
validations:
profilers: []

So when I go to my index HTML on S3, I see no validations in the output at all.

[Screenshot: Screen Shot 2022-06-07 at 12 35 05 PM]

Additional commands tried: I also ran great_expectations docs build --site-name s3_site. It prints errors but finishes:

  File "/Users/mudit.uppal/code/gluflow-dags/virtual-environment/lib/python3.8/site-packages/great_expectations/render/renderer/renderer.py", line 13, in inner_func
    return renderer_fn(*args, **kwargs)
  File "/Users/mudit.uppal/code/gluflow-dags/virtual-environment/lib/python3.8/site-packages/great_expectations/expectations/expectation.py", line 571, in _diagnostic_unexpected_table_renderer
    value = unexpected_count_dict.get("value")
AttributeError: 'str' object has no attribute 'get'
.....
...
Done building Data Docs

Now I do see validations in the output HTML on S3. However, it shows all the validations from the "local" site, that is, everything I ran locally with the "local" checkpoint, not the "prod" checkpoint shown above. And when I run the prod checkpoint default_checkpoint_prod.yml (below), the output HTML on S3 does not get updated. I hope that makes sense.

name: default_checkpoint_prod
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y-%m-%d %H:%M:%S - prod'
expectation_suite_name:
batch_request:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
      target_store_name: validations_s3_prod_store
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
      target_store_name: evaluation_s3_prod_parameter_store
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: [s3_site]
evaluation_parameters: {}
validations:
profilers: []

default_checkpoint_local.yml

name: default_checkpoint_local
config_version: 1.0
template_name:
module_name: great_expectations.checkpoint
class_name: Checkpoint
run_name_template: '%Y-%m-%d %H:%M:%S - local'
expectation_suite_name:
batch_request:
action_list:
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  - name: store_evaluation_params
    action:
      class_name: StoreEvaluationParametersAction
  - name: update_data_docs
    action:
      class_name: UpdateDataDocsAction
      site_names: [local_site]
evaluation_parameters: {}
validations:
profilers: []
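
One detail can be checked directly from the files above: the prod checkpoint stores its results in validations_s3_prod_store, while validations_store_name in great_expectation.yml still names the filesystem-backed validations_store. That would be consistent with docs build listing only local runs if the site reads validations from the default store, but that is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch of the comparison follows; the dictionaries are transcribed by hand from the posted configs, and the helper stores_off_default is hypothetical, not part of the Great Expectations API.

```python
# Values transcribed by hand from the posted great_expectation.yml and checkpoints.
project_config = {
    "validations_store_name": "validations_store",
    "stores": {
        "validations_store": "TupleFilesystemStoreBackend",
        "validations_s3_prod_store": "TupleS3StoreBackend",
    },
}

prod_checkpoint_actions = [
    {"class_name": "StoreValidationResultAction",
     "target_store_name": "validations_s3_prod_store"},
    {"class_name": "StoreEvaluationParametersAction",
     "target_store_name": "evaluation_s3_prod_parameter_store"},
    {"class_name": "UpdateDataDocsAction", "site_names": ["s3_site"]},
]

local_checkpoint_actions = [
    {"class_name": "StoreValidationResultAction"},
    {"class_name": "StoreEvaluationParametersAction"},
    {"class_name": "UpdateDataDocsAction", "site_names": ["local_site"]},
]


def stores_off_default(config, actions):
    """Hypothetical helper: list validation-store actions whose target
    differs from the project's default validations store."""
    default = config["validations_store_name"]
    return [
        (action["class_name"], action["target_store_name"])
        for action in actions
        if action["class_name"] == "StoreValidationResultAction"
        and action.get("target_store_name", default) != default
    ]


prod_mismatches = stores_off_default(project_config, prod_checkpoint_actions)
local_mismatches = stores_off_default(project_config, local_checkpoint_actions)

for class_name, target in prod_mismatches:
    print(
        f"{class_name} writes to {target!r}; "
        f"default validations store is {project_config['validations_store_name']!r}"
    )
```

With the values above, only the prod checkpoint is reported, since the local checkpoint sets no target_store_name and so falls back to the default store.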

Finally, the way I'm running the checkpoint is via the notebook:

validation = {
    # datasource(...) returns a run_time_batch_request querying a Snowflake table
    "batch_request": datasource("2022-01-01"),
    "expectation_suite_name": expectation_suite_name,
}
checkpoint_result = context.run_checkpoint(
    checkpoint_name="default_checkpoint_prod",
    validations=[validation],
)

emptyr1, Jun 14 '22 20:06

Hey @emptyr1 ! Thanks for surfacing this; we'll review internally and be in touch.

austiezr, Jun 15 '22 15:06

Wondering if there's any update here?

emptyr1, Jul 18 '22 01:07

Hi @emptyr1, can you try removing the line: run_name_template: '%Y-%m-%d %H:%M:%S - prod' from your checkpoint file? It solved the problem for me.

robbe1999, Aug 18 '22 13:08
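
For anyone trying the suggestion above, here is a minimal Python sketch of what the run_name_template value looks like once expanded, and of dropping that line from a checkpoint file. It assumes the template is expanded with strftime-style formatting, which the % codes suggest but which is an assumption about the library's internals. The thread does not say why removing the line helps. The helpers render_run_name and drop_key are hypothetical, and the YAML text is a shortened copy of the posted checkpoint.

```python
from datetime import datetime

RUN_NAME_TEMPLATE = "%Y-%m-%d %H:%M:%S - prod"

# Shortened copy of default_checkpoint_prod.yml from the issue.
checkpoint_yaml = """\
name: default_checkpoint_prod
config_version: 1.0
class_name: Checkpoint
run_name_template: '%Y-%m-%d %H:%M:%S - prod'
evaluation_parameters: {}
"""


def render_run_name(template, run_time):
    """Expand a strftime-style template (assumed behaviour, see lead-in)."""
    return run_time.strftime(template)


def drop_key(yaml_text, key):
    """Remove top-level `key: value` lines from YAML text.

    Line-based on purpose: handles only single-line, top-level scalar keys,
    which is all this checkpoint file needs.
    """
    kept = [
        line for line in yaml_text.splitlines()
        if not line.startswith(f"{key}:")
    ]
    return "\n".join(kept) + "\n"


example_run_name = render_run_name(
    RUN_NAME_TEMPLATE, datetime(2022, 6, 7, 12, 35, 5)
)
cleaned_yaml = drop_key(checkpoint_yaml, "run_name_template")

print(example_run_name)  # the rendered name contains spaces and colons
print(cleaned_yaml)
```

The rendered name contains spaces and colons. Whether those characters are what interferes with the S3 site is not established here; the sketch only makes the suggested change easy to reproduce.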