Cache BigQuery: Invalid value for type: DECIMAL(38, 9) is not a valid value
I am running the latest version (0.22) but I am getting errors when trying to use the BigQuery cache. On source-facebook-marketing, source-quickbooks, and source-xero I get the following trace:
```
Traceback (most recent call last):
  File "/Users/andrei/Work/cedara/pyairbyte/export_quickbooks.py", line 59, in <module>
    result = source.read(cache=cache)
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 659, in read
    result = self._read_to_cache(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 743, in _read_to_cache
    cache._write_airbyte_message_stream(  # noqa: SLF001 # Non-public API
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/caches/base.py", line 325, in _write_airbyte_message_stream
    cache_processor.process_airbyte_messages(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 315, in process_airbyte_messages
    self._write_all_stream_data(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 329, in _write_all_stream_data
    self.write_stream_data(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 748, in write_stream_data
    temp_table_name = self._write_files_to_new_table(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/_processors/sql/bigquery.py", line 178, in _write_files_to_new_table
    load_job = client.load_table_from_file(  # Make an API request
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 2601, in load_table_from_file
    raise exceptions.from_http_response(exc.response)
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/REDACTED/jobs?uploadType=resumable: Invalid value for type: DECIMAL(38, 9) is not a valid value
```
It's working correctly when I use source-faker (with both users and purchases streams).
Any idea what's going on?
@aaronsteers I've done a bit more testing and identified that this broke in 0.17.9; it works fine with 0.17.8 and lower. Hopefully this intel gives some clues as to what the problem could be.
@andreibaragan - Thanks for raising! We'll take a look. I'm also adding the accepting_pull_requests label in case anyone else has time to take a look at this.
@aaronsteers asked me to research this. Here are my findings:
After analyzing the changes between versions 0.17.8 and 0.17.9, I've identified the root cause of the DECIMAL(38, 9) type issue:
The breaking change was introduced in the BigQuery type converter (bigquery.py) as part of the SQLAlchemy 2.0 upgrade in PR #396:
Before (v0.17.8):

```python
def to_sql_type(self, sql_type):
    return sql_type.__class__.__name__  # Returns string type names
```

After (v0.17.9):

```python
def to_sql_type(self, sql_type):
    return sql_type  # Returns raw SQLAlchemy type objects
```
This change affects how decimal types are handled:
- In v0.17.8, DECIMAL types were converted to string names that BigQuery could understand
- In v0.17.9, the raw SQLAlchemy DECIMAL type object is passed through without conversion
- BigQuery expects specific type names for decimal/numeric types, which is why we're seeing the 'Invalid value for type: DECIMAL(38, 9)' error
The issue occurs because BigQuery doesn't directly support SQLAlchemy DECIMAL types in this format - they need to be properly mapped to BigQuery-compatible NUMERIC types.
A fix would involve updating the BigQueryTypeConverter to properly handle DECIMAL types by converting them to BigQuery's NUMERIC type with appropriate precision and scale parameters.
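Such a mapping might look like the following sketch. This is a hypothetical standalone helper, not PyAirbyte's actual converter code; the function name and the fallback of stringifying other types are my own assumptions.

```python
# Hypothetical sketch: map SQLAlchemy DECIMAL/NUMERIC types to a
# BigQuery-compatible type name instead of passing the raw type object through.
import sqlalchemy

def to_bigquery_numeric(sql_type: sqlalchemy.types.TypeEngine) -> str:
    """Return a BigQuery type name for a SQLAlchemy type (illustrative only)."""
    if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
        precision = sql_type.precision or 38
        scale = sql_type.scale or 9
        # BigQuery's NUMERIC supports precision <= 38 and scale <= 9;
        # anything larger needs BIGNUMERIC.
        if precision <= 38 and scale <= 9:
            return "NUMERIC"
        return "BIGNUMERIC"
    # Other types: render via SQLAlchemy's default compiler (assumption).
    return str(sql_type)

print(to_bigquery_numeric(sqlalchemy.types.DECIMAL(38, 9)))  # NUMERIC
```

With this, a column declared as `DECIMAL(38, 9)` would be emitted as `NUMERIC`, which BigQuery accepts, rather than the SQLAlchemy rendering `DECIMAL(38, 9)` from the error above.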
@andreibaragan - I'm asking Devin.ai to follow up with a PR. While I don't have a lot of time to pick this up myself right now, I think the AI may do a decent job, since this seems like a very discrete, testable workflow.
I have started an attempt to fix here:
- https://github.com/airbytehq/PyAirbyte/pull/582
Hi @aaronsteers, I'm having the same issue. Will this problem be solved soon?
Here are my env versions:
```
.venv➜ Airbyte pip3 freeze
airbyte==0.23.0
airbyte-api==0.52.2
airbyte-cdk==6.28.0
airbyte_protocol_models_dataclasses==0.14.2
airbyte_protocol_models_pdv2==0.13.1
```
Hi @julioyildo, @aaronsteers,
I found a quick fix (though I don't know whether it will be a permanent one).
In this file, you can add the following...

```python
# .venv/lib/python3.10/site-packages/airbyte/_processors/sql/bigquery.py

# Comprehensive handling for numeric types
if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
    # Force to BigQuery's FLOAT64
    return "FLOAT64"
```

... so that it forces BigQuery to create a FLOAT64 column for DECIMAL and NUMERIC types.
You end up with this:
```python
class BigQueryTypeConverter(SQLTypeConverter):
    """A class to convert types for BigQuery with enhanced type handling."""

    @classmethod
    def get_string_type(cls) -> sqlalchemy.types.TypeEngine:
        """Return the string type for BigQuery."""
        return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses STRING for all strings

    @overrides
    def to_sql_type(
        self,
        json_schema_property_def: dict[str, str | dict | list],
    ) -> sqlalchemy.types.TypeEngine:
        """Convert a value to a SQL type with comprehensive type handling."""
        sql_type = super().to_sql_type(json_schema_property_def)

        # Handle specific type conversions for BigQuery
        if isinstance(sql_type, sqlalchemy.types.VARCHAR):
            return self.get_string_type()
        if isinstance(sql_type, sqlalchemy.types.BIGINT):
            return sqlalchemy_types.Integer()  # All integers are 64-bit in BigQuery

        # Comprehensive handling for numeric types
        if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
            # Force to BigQuery's FLOAT64 to avoid precision errors
            return "FLOAT64"

        return sql_type
```
It fixed the error for me.
bump, also having this issue with these packages
```
"airbyte>=0.24.2",
"airbyte-source-hubspot>=4.6.0rc1",
"google-cloud-bigquery-storage>=2.30.0",
```
@MartinPrejean your fix worked, thanks!
@MartinPrejean fix works. Thanks.
I am also experiencing this.
~When I applied the fix from above, I got the following error when I attempted to load:~
```
DatabaseError(google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/stardust-prod-5a598/queries?prettyPrint=false: Value of type FLOAT64 cannot be assigned to ae_total_app_sessions, which has type NUMERIC at [28:31]
```
~which seems pretty circular to the original error.~
edit: This was due to the table already having been created initially, expecting the NUMERIC type. I had to delete the empty table and re-run, and this solution then worked. 🤦🏼
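For anyone hitting the same follow-on error: the cleanup step amounts to dropping the stale cache table so the next sync recreates it with the new column type. A minimal sketch, where the dataset and table names are placeholders and the live client call is shown commented out because it requires google-cloud-bigquery and configured credentials:

```python
# Build the DROP statement for the stale cache table that was created with the
# old NUMERIC schema; re-running the sync afterwards recreates it as FLOAT64.
dataset, table = "pyairbyte_cache", "ae_total_app_sessions_stream"  # placeholders
drop_sql = f"DROP TABLE IF EXISTS `{dataset}`.`{table}`"
print(drop_sql)

# With google-cloud-bigquery installed and credentials configured:
# from google.cloud import bigquery
# bigquery.Client().query(drop_sql).result()
```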