great_expectations
expect_column_values_to_be_of_type does not work
Opening an issue similar to this https://github.com/great-expectations/great_expectations/issues/5565
I'm also running into this problem while using a Snowflake datasource to check the type of the columns.
Here are the specs,
- python==3.10.13
- great_expectations==0.18.10
- SQLAlchemy==1.4.51
- snowflake-connector-python==2.8.3
- snowflake-sqlalchemy==1.4.7
While debugging this, I found that the dict in the metrics object only contains the name key
stack trace
Originally posted by @sfan0704 in https://github.com/great-expectations/great_expectations/issues/5565#issuecomment-1998654976
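A minimal pure-Python sketch of the failure mode described above (the dict layout and variable names here are hypothetical, not GX internals): if the per-column metadata dict only carries name, any lookup of type raises KeyError: 'type', which matches the bare 'type' printed after the metrics progress bar later in this thread.

```python
# Hypothetical sketch of the reported failure, not GX internals:
# the per-column metadata dict only carries "name", so looking up
# "type" raises KeyError: 'type'.
columns = [{"name": "reference_currency_price"}]  # "type" key is missing

missing_key = None
try:
    actual_column_type = [
        col["type"] for col in columns
        if col["name"] == "reference_currency_price"
    ]
except KeyError as err:
    missing_key = err.args[0]

print(missing_key)  # type
```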
Hey @sfan0704 -- I'm so far unable to reproduce this. Are you able to share your configuration for this expectation as well?
Hey @austiezr thanks for responding to this. My expectation suite looks like this. Not sure what else I can provide to make this clearer.
{
"expectation_type": "expect_column_values_to_be_of_type",
"kwargs": {
"column": "reference_currency_price",
"type_": "FLOAT"
},
"meta": {}
},
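For context, the Expectation conceptually compares the column type reported by the SQL dialect against the requested type_. A rough illustrative sketch (function name and dict layout are hypothetical, not GX internals):

```python
# Rough conceptual sketch (hypothetical names and dict layout, not GX
# internals): the Expectation compares the column type reported by the
# SQL dialect against the requested type_.
def check_column_type(columns, column, type_):
    actual = next(c["type"] for c in columns if c["name"] == column)
    return {"success": actual.upper() == type_.upper(),
            "result": {"observed_value": actual}}

# Example: the dialect reports reference_currency_price as FLOAT.
result = check_column_type(
    [{"name": "reference_currency_price", "type": "FLOAT"}],
    column="reference_currency_price",
    type_="FLOAT",
)
print(result["success"])  # True
```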
How did you initially create this datasource? Is this error arising after running a checkpoint?
I create the datasource like this,
self.data_context.sources.add_snowflake(name="source_name", connection_string="STRING")
And correct, the exception was thrown after we run the checkpoint.
Hi Team,
Any update on this issue? I had been using this expectation and it was working until March 14, 2024. Today, running the same validation with the same arguments, it gives me the error below:
Calculating Metrics: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.16it/s] 'type'
Can you share your python dependency versions?
I am using python==3.9.6, great_expectations==0.18.7, SQLAlchemy==1.4.51, snowflake-connector-python==3.6.0, snowflake-sqlalchemy==1.5.1
Hi Team,
Any update on this issue? Is there an ETA for a fix?
Hey @NehaNRane @sfan0704, Austin handed this issue over to me - I'm going to investigate and try to reproduce the error you're getting.
@sfan0704 @NehaNRane Ok, I've tested in two different virtual environments matching your dependencies and have not been able to replicate the error - the expect_column_values_to_be_of_type
Expectation runs without issue on a Snowflake Table Asset.
As a next troubleshooting step, could you please install great-expectations[snowflake]
(using the version currently giving you the error) in a fresh virtual environment and retry your code?
If you still get the same error after a fresh install, we'll take a closer look at your code and Snowflake data next.
@rachhouse, I tried again in a fresh virtual environment, but I am getting the same error. python==3.9.6, SQLAlchemy==1.4.52, great-expectations==0.18.12, snowflake-connector-python==3.8.1, snowflake-sqlalchemy==1.5.1
Error:
Calculating Metrics: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.27it/s]
Traceback (most recent call last):
File "c:\Users\10700792\OneDrive - LTIMindtree\Backup - 01-03-2023\Backup on 1st Mar\Documents\Extras\poc\all_expectation_testing.py", line 154, in
actual_column_type = [
File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py", line 497, in
Hi @NehaNRane, thanks for retrying with a fresh install and including the stacktrace. Can you please also share your GX workflow code from all_expectation_testing.py that generates this error?
I've tested this Expectation (expect_column_values_to_be_of_type) against a collection of differently cased Snowflake table columns, but still have not reproduced an error.
At this point, I am suspicious that this behavior might be a Windows issue. I see you are using Windows from your stacktrace, @NehaNRane, and when this same behavior was reported in #5565, it also occurred for a Windows user.
@sfan0704, are you running your code on Windows or WSL?
I'm running on a Mac
Hi @rachhouse
The code below is in all_expectation_testing.py. It's a simple script.
import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.core.expectation_configuration import ExpectationConfiguration
import yaml
from configs.datasource_config import snowflake_datasource
context = gx.get_context()
snowflake_datasource = {
"name": "my_snowflake_datasource",
"class_name": "Datasource",
"execution_engine": {
"class_name": "SqlAlchemyExecutionEngine",
"connection_string": connection_string,
"create_temp_table": False,
},
"data_connectors": {
"default_runtime_data_connector_name": {
"class_name": "RuntimeDataConnector",
"batch_identifiers": ["default_identifier_name"],
},
},
}
db_datasource = context.add_or_update_datasource(**snowflake_datasource)
table_name = "PAYMENT"
schema_name = "STRIPE"
db_name = "RAW_DATA"
batch_request = RuntimeBatchRequest(
datasource_name="my_snowflake_datasource",
data_connector_name="default_runtime_data_connector_name",
data_asset_name=table_name,
runtime_parameters={
"query": f'SELECT * FROM "{db_name}"."{schema_name}"."{table_name}"'
},
batch_identifiers={"default_identifier_name": "default_identifier"}
)
validator = context.get_validator(batch_request=batch_request)
validation_result = validator.expect_column_values_to_be_of_type(column="status", type_= "VARCHAR")
print(validation_result)
@NehaNRane, looks like the issue is that you're using outdated code - the use of RuntimeBatchRequest
is deprecated. Can you try updating your code to use Fluent Data Sources? Here is a simple example of how you can connect to a Snowflake table and run the expect_column_values_to_be_of_type
Expectation:
import great_expectations as gx
DATA_SOURCE_NAME = "<your data source name>"
DATA_ASSET_NAME = "<your data asset name>"
EXPECTATION_SUITE_NAME = "<your expectation suite name>"
CHECKPOINT_NAME = "<your checkpoint name>"
context = gx.get_context()
data_source = context.sources.add_snowflake(
name=DATA_SOURCE_NAME,
connection_string="<snowflake connection string>"
)
data_asset = data_source.add_table_asset(
name=DATA_ASSET_NAME,
table_name="<snowflake table name>"
)
batch_request = data_asset.build_batch_request()
context.add_or_update_expectation_suite(expectation_suite_name=EXPECTATION_SUITE_NAME)
validator = context.get_validator(
batch_request=batch_request,
expectation_suite_name=EXPECTATION_SUITE_NAME,
)
validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="FLOAT")
validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="VARCHAR")
validator.save_expectation_suite(discard_failed_expectations=False)
checkpoint = context.add_or_update_checkpoint(
name=CHECKPOINT_NAME,
validator=validator,
)
checkpoint_result = checkpoint.run()
@sfan0704 Thanks for confirming your environment. Can you share the full GX workflow code that generates the error?
Hi @rachhouse,
I observed that this issue occurs only when the data_asset is created as a query asset. Our table_name is mixed case, so we use a query asset to handle it. In our use case, table_name and column_name can be in any case. How should we handle this scenario?
Thanks @NehaNRane, those details are very helpful - I'm now able to consistently generate an error when using a Snowflake table with a mixed case name, a QueryAsset
, and expect_column_values_to_be_of_type
. I'll dig into this with Engineering next week.
Sorry for the delayed response, we're also doing exactly what @rachhouse is doing. Looking forward to the potential solution here. Thanks.
Hi @NehaNRane @sfan0704, Engineering and I dug further into this error, and the problem is caused by Snowflake tables whose MixedCaseName identifier needs to be quoted (for example, DATABASE.SCHEMA."MyTable"). Fixing the underlying issue is ticketed and on our backlog, but we don't have a timeline yet for when we'll be able to fix it.
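The root cause comes down to Snowflake's identifier resolution rules: unquoted identifiers are folded to uppercase, while quoted identifiers keep their exact case. A small illustrative helper (this is a sketch of the rule, not GX or Snowflake code):

```python
def resolve_identifier(name: str) -> str:
    """Illustrative sketch of Snowflake-style identifier resolution:
    quoted identifiers keep their case; unquoted ones fold to upper."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]   # quoted: case preserved
    return name.upper()     # unquoted: folded to uppercase

# An unquoted mixed-case name will NOT match a quoted mixed-case table:
print(resolve_identifier("MyTable"))    # MYTABLE
print(resolve_identifier('"MyTable"'))  # MyTable
```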
The current workaround is to either:
a) create a view of your MixedCaseTableName table that uses an UPPERCASE_NAME, or
b) rename your MixedCaseTableName table to use an UPPERCASE_NAME,
and use a GX Table Asset with your renamed table or created view. From there, you should be able to run expect_column_values_to_be_of_type on your asset and have it behave as expected.
In testing, I created a view, UCCTABLENAME_VIEW, from a table with a quoted MixedCaseName:
create view UCCTABLENAME_VIEW as select * from "UccTableName";
used the view to create a Table Asset:
data_asset = data_source.add_table_asset(
name="snowflake-table",
table_name="UCCTABLENAME_VIEW"
)
and was able to run expect_column_values_to_be_of_type using the subsequent validator. In testing, MixedCaseColumnNames worked fine for the Expectation, as long as the table/view name was uppercase.
validator.expect_column_values_to_be_of_type("ColumnOne", type_="FLOAT")
validator.expect_column_values_to_be_of_type("ColumnTwo", type_="VARCHAR")
Hello @sfan0704. With the launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.
To get started on your transition to GX Core, check out the GX Core quickstart (click “Full example code” tab to see a code example).
You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for a comprehensive rundown of everything GX Core, plus Q&A as time permits. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.
Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗