soda-core
Issue with failed rows for checks like duplicates
This is causing the sync with Soda Cloud to fail.
SODA-559
I was able to replicate this issue using the reference data check:
checks for retail_customers:
  - values in country_code must exist in ref_countries iso:
      name: Ensure valid country codes
Gives this output:
(soda-cl-v3.0.0b15) $soda scan -d aws_postgres_retail uploaderror.yml
Soda Core 3.0.0b15
Empty file upload detected, not sending Content-Length header
No fileId received in response: {'code': 'invalid_empty_upload', 'message': 'File uploads may not be empty'}
Soda cloud error: Could not upload sample failed_rows
| 'fileId'
Scan summary:
1/1 check PASSED:
retail_customers in aws_postgres_retail
values in country_code must exist in ref_countries iso [PASSED]
1 errors.
Oops! 1 error. 0 failures. 0 warnings. 1 pass.
ERRORS:
Soda cloud error: Could not upload sample failed_rows
| 'fileId'
Sending results to Soda Cloud
Another example:
checks for dim_gift:
  - row_count = 551573989
  - duplicate_count(transaction_id) = 0
Results in this:
soda scan -V -d dwh_2020 -c C:\Users\dasher\.soda\configuration.yml C:\Users\dasher\soda_bigquery\checks.yml
Soda Core 3.0.0b15
Reading configuration file "C:\Users\dasher\.soda\configuration.yml"
Reading SodaCL file "C:\Users\dasher\soda_bigquery\checks.yml"
Scan execution starts
C:\Users\dasher\.venv\lib\site-packages\google\cloud\bigquery\client.py:535: UserWarning: Cannot create BigQuery Storage client, the dependency google-cloud-bigquery-storage is not installed.
warnings.warn(
Query dwh_2020.dim_gift.aggregation[0]:
SELECT
COUNT(*)
FROM dim_gift
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Making request: POST https://oauth2.googleapis.com/token
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Query dwh_2020.dim_gift.transaction_id.duplicate_count:
WITH frequencies AS (
SELECT transaction_id, COUNT(*) AS frequency
FROM dim_gift
WHERE transaction_id IS NOT NULL
GROUP BY transaction_id)
SELECT *
FROM frequencies
WHERE frequency > 1;
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Empty file upload detected, not sending Content-Length header
No fileId received in response: {'code': 'invalid_empty_upload', 'message': 'File uploads may not be empty'}
Soda cloud error: Could not upload sample dim_gift_transaction_id_failed_rows
| 'fileId'
| Stacktrace:
| Traceback (most recent call last):
| File "C:\Users\dasher\.venv\lib\site-packages\soda\soda_cloud\soda_cloud.py", line 113, in upload_sample
| file_id = self._upload_sample_http(scan_definition_name, file_path, temp_file, temp_file_size_in_bytes)
| File "C:\Users\dasher\.venv\lib\site-packages\soda\soda_cloud\soda_cloud.py", line 143, in _upload_sample_http
| return upload_response_json["fileId"]
| KeyError: 'fileId'
Scan summary:
2/2 queries OK
dwh_2020.dim_gift.aggregation[0] [OK] 0:00:01.630912
dwh_2020.dim_gift.transaction_id.duplicate_count [OK] 0:00:01.610153
1/2 checks PASSED:
dim_gift in dwh_2020
duplicate_count(transaction_id) = 0 [PASSED]
check_value: 0
failed_rows_sample_ref: soda_cloud 2x(0/0)
1/2 checks FAILED:
dim_gift in dwh_2020
row_count = 551573989 [FAILED]
check_value: 438952718
1 errors.
Oops! 1 error. 1 failures. 0 warnings. 1 pass.
ERRORS:
Soda cloud error: Could not upload sample dim_gift_transaction_id_failed_rows
| 'fileId'
| Stacktrace:
| Traceback (most recent call last):
| File "C:\Users\dasher\.venv\lib\site-packages\soda\soda_cloud\soda_cloud.py", line 113, in upload_sample
| file_id = self._upload_sample_http(scan_definition_name, file_path, temp_file, temp_file_size_in_bytes)
| File "C:\Users\dasher\.venv\lib\site-packages\soda\soda_cloud\soda_cloud.py", line 143, in _upload_sample_http
| return upload_response_json["fileId"]
| KeyError: 'fileId'
Sending results to Soda Cloud
Error while executing Soda Cloud command response code: 400
{
"code": "invalid_request",
"message": "Failed request validation on the following properties:\nchecks[1].diagnostics.failedRowsFile.reference: may not be null\nchecks[1].diagnostics.failedRowsFile.reference: may not be null"
}
Open Telemetry: Skipping non-soda span 'BigQuery.job.begin'.
Open Telemetry: Skipping non-soda span 'BigQuery.getQueryResults'.
Open Telemetry: Skipping non-soda span 'BigQuery.job.begin'.
Open Telemetry: Skipping non-soda span 'BigQuery.getQueryResults'.
I already have a PR for:
No fileId received in response: {'code': 'invalid_empty_upload', 'message': 'File uploads may not be empty'}
I'm not sure the fix also fixes:
Error while executing Soda Cloud command response code: 400
{
"code": "invalid_request",
"message": "Failed request validation on the following properties:\nchecks[1].diagnostics.failedRowsFile.reference: may not be null\nchecks[1].diagnostics.failedRowsFile.reference: may not be null"
}
I'll check that later
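For anyone following along, here is a minimal sketch of the kind of guard that would avoid the KeyError: 'fileId' from the traceback. It is not the code from the PR. The function name upload_sample_safely and the do_upload callback are hypothetical and stand in for the real HTTP call; only the 'fileId' key and the empty-upload condition come from the logs above.

```python
from typing import Callable, Optional


def upload_sample_safely(
    file_size_in_bytes: int,
    do_upload: Callable[[], dict],
) -> Optional[str]:
    """Return the uploaded file id, or None when there is nothing to upload.

    Hypothetical helper: do_upload stands in for the real HTTP request
    and is assumed to return the parsed JSON response as a dict.
    """
    # A passing check produces zero failed rows, so the sample file is empty.
    # Skipping the request means the server never sees an empty upload.
    if file_size_in_bytes == 0:
        return None

    response_json = do_upload()

    # .get() returns None instead of raising KeyError when the response
    # is an error body without a 'fileId' key.
    return response_json.get("fileId")
```

With this shape the caller has to handle a None file id, which is the case the second error (the 400 on failedRowsFile.reference) is about.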
@albinkjellin Are these also things I should look into?
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Open Telemetry: Skipping non-soda span 'BigQuery.job.begin'.
Open Telemetry: Skipping non-soda span 'BigQuery.getQueryResults'.
Open Telemetry: Skipping non-soda span 'BigQuery.job.begin'.
Open Telemetry: Skipping non-soda span 'BigQuery.getQueryResults'.
Note to self: use tests/integration/test_samples_integration.py
to try and reproduce the
Error while executing Soda Cloud command response code: 400
{
"code": "invalid_request",
"message": "Failed request validation on the following properties:\nchecks[1].diagnostics.failedRowsFile.reference: may not be null\nchecks[1].diagnostics.failedRowsFile.reference: may not be null"
}
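A rough sketch of what the 400 suggests: the payload contains a failedRowsFile entry whose reference is null. One way to avoid that, assuming the diagnostics are built as a plain dict, is to leave the entry out when no sample was uploaded. The function build_diagnostics and the "value" key are hypothetical; only failedRowsFile and reference are taken from the error message.

```python
from typing import Any, Dict, Optional


def build_diagnostics(
    check_value: Any,
    failed_rows_reference: Optional[str],
) -> Dict[str, Any]:
    """Build a diagnostics dict, omitting failedRowsFile when there is no sample.

    Hypothetical helper: the real payload layout may differ.
    """
    diagnostics: Dict[str, Any] = {"value": check_value}

    # Only attach the file entry when a reference actually exists, so the
    # payload never carries "reference": null.
    if failed_rows_reference is not None:
        diagnostics["failedRowsFile"] = {"reference": failed_rows_reference}

    return diagnostics
```

For the duplicate_count check above (check_value 0, no uploaded sample), this would send only the value and no failedRowsFile block.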
Thanks for the quick response on this! The:
Invalid type NoneType for attribute value. Expected one of ['bool', 'str', 'bytes', 'int', 'float'] or a sequence of those types
Is not as urgent.
@albinkjellin I couldn't reproduce "Invalid type NoneType for attribute value". Is there a way you can provide the configuration.yml?
@albinkjellin can you please test/check with 3.0.12 and re-open if this is still not working?