
Cache BigQuery: Invalid value for type: DECIMAL(38, 9) is not a valid value

Open andreibaragan opened this issue 11 months ago • 11 comments

I am running the latest version (0.22), but I am getting errors when trying to use the BigQuery cache. With source-facebook-marketing, source-quickbooks, and source-xero I get the following trace:

Traceback (most recent call last):
  File "/Users/andrei/Work/cedara/pyairbyte/export_quickbooks.py", line 59, in <module>
    result = source.read(cache=cache)
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 659, in read
    result = self._read_to_cache(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/sources/base.py", line 743, in _read_to_cache
    cache._write_airbyte_message_stream(  # noqa: SLF001 # Non-public API
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/caches/base.py", line 325, in _write_airbyte_message_stream
    cache_processor.process_airbyte_messages(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 315, in process_airbyte_messages
    self._write_all_stream_data(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 329, in _write_all_stream_data
    self.write_stream_data(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/shared/sql_processor.py", line 748, in write_stream_data
    temp_table_name = self._write_files_to_new_table(
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/airbyte/_processors/sql/bigquery.py", line 178, in _write_files_to_new_table
    load_job = client.load_table_from_file(  # Make an API request
  File "/Users/andrei/Work/cedara/pyairbyte/python310/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 2601, in load_table_from_file
    raise exceptions.from_http_response(exc.response)
google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/REDACTED/jobs?uploadType=resumable: Invalid value for type: DECIMAL(38, 9) is not a valid value

It's working correctly when I use source-faker (with both users and purchases streams).

Any idea what's going on?
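For reference, this is roughly the pattern I'm running (the connector config, project, dataset, and credentials path below are placeholders; the real values are redacted):

import airbyte as ab
from airbyte.caches import BigQueryCache

quickbooks_config: dict = {}  # placeholder; real connector config redacted

source = ab.get_source("source-quickbooks", config=quickbooks_config)
source.select_all_streams()

cache = BigQueryCache(
    project_name="my-gcp-project",        # placeholder
    dataset_name="pyairbyte_cache",       # placeholder
    credentials_path="/path/to/sa.json",  # placeholder
)

result = source.read(cache=cache)  # fails here with the BadRequest above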

andreibaragan avatar Jan 14 '25 14:01 andreibaragan

@aaronsteers I've done a bit more testing and identified that this broke in 0.17.9; it works fine with 0.17.8 and lower. Hopefully this intel gives some clues as to what the problem could be.

andreibaragan avatar Jan 16 '25 17:01 andreibaragan

@andreibaragan - Thanks for raising! We'll take a look. I'm also adding the accepting_pull_requests label in case anyone else has time to take a look at this.

aaronsteers avatar Jan 24 '25 20:01 aaronsteers

@aaronsteers asked me to research this. Here are my findings:

After analyzing the changes between versions 0.17.8 and 0.17.9, I've identified the root cause of the DECIMAL(38, 9) type issue:

  1. The breaking change was introduced in the BigQuery type converter (bigquery.py) as part of the SQLAlchemy 2.0 upgrade in PR #396:

    Before (v0.17.8):

    def to_sql_type(self, sql_type):
        return sql_type.__class__.__name__  # Returns string type names
    

    After (v0.17.9):

    def to_sql_type(self, sql_type):
        return sql_type  # Returns raw SQLAlchemy type objects
    
  2. This change affects how decimal types are handled:

    • In v0.17.8, DECIMAL types were converted to string names that BigQuery could understand
    • In v0.17.9, the raw SQLAlchemy DECIMAL type object is passed through without conversion
    • BigQuery expects specific type names for decimal/numeric types, which is why we're seeing the 'Invalid value for type: DECIMAL(38, 9)' error

The issue occurs because BigQuery doesn't directly support SQLAlchemy DECIMAL types in this format - they need to be properly mapped to BigQuery-compatible NUMERIC types.

A fix would involve updating the BigQueryTypeConverter to properly handle DECIMAL types by converting them to BigQuery's NUMERIC type with appropriate precision and scale parameters.
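As an illustration only (not the actual fix, and not code that exists in PyAirbyte), here is a minimal sketch of the kind of mapping that appears to be missing; to_bigquery_type_name is a hypothetical helper:

import sqlalchemy

def to_bigquery_type_name(sql_type: sqlalchemy.types.TypeEngine) -> str:
    """Map a SQLAlchemy type object to a type name BigQuery's load API accepts."""
    if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
        # BigQuery calls this type NUMERIC (38 digits of precision, scale 9);
        # passing "DECIMAL(38, 9)" through verbatim gets rejected.
        return "NUMERIC"
    if isinstance(sql_type, sqlalchemy.types.BIGINT):
        return "INT64"  # all BigQuery integers are 64-bit
    if isinstance(sql_type, sqlalchemy.types.VARCHAR):
        return "STRING"
    # Fall back to the bare SQLAlchemy type name for everything else.
    return type(sql_type).__name__.upper()

Mapping DECIMAL to NUMERIC would preserve exact decimal semantics; coercing to FLOAT64 would also avoid the error, but at the cost of precision.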

@andreibaragan - I'm asking Devin.ai to follow up with a PR. While I don't have a lot of time to pick this up myself right now, I think the AI may do a decent job, since this seems like a very discrete/testable workflow.

aaronsteers avatar Jan 24 '25 22:01 aaronsteers

I have started an attempt at a fix here:

  • https://github.com/airbytehq/PyAirbyte/pull/582

Hi @aaronsteers, I'm having the same issue. Will this problem be solved soon?

Here are my env versions:

.venv➜  Airbyte pip3 freeze
airbyte==0.23.0
airbyte-api==0.52.2
airbyte-cdk==6.28.0
airbyte_protocol_models_dataclasses==0.14.2
airbyte_protocol_models_pdv2==0.13.1

julioyildo avatar Feb 13 '25 15:02 julioyildo

Hi @julioyildo, @aaronsteers,

I found a quick fix (but I don't know if it will be a permanent one).

In this file, you can add...

# .venv/lib/python3.10/site-packages/airbyte/_processors/sql/bigquery.py

# Comprehensive handling for numeric types
if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
    # Force to BigQuery's FLOAT64
    return "FLOAT64"

... so that it forces BigQuery to create a FLOAT64 column for DECIMAL and NUMERIC types.

You end up with this:

class BigQueryTypeConverter(SQLTypeConverter):
    """A class to convert types for BigQuery with enhanced type handling."""

    @classmethod
    def get_string_type(cls) -> sqlalchemy.types.TypeEngine:
        """Return the string type for BigQuery."""
        return cast("sqlalchemy.types.TypeEngine", "String")  # BigQuery uses STRING for all strings

    @overrides
    def to_sql_type(
        self,
        json_schema_property_def: dict[str, str | dict | list],
    ) -> sqlalchemy.types.TypeEngine:
        """Convert a value to a SQL type with comprehensive type handling."""
        sql_type = super().to_sql_type(json_schema_property_def)
        
        # Handle specific type conversions for BigQuery
        if isinstance(sql_type, sqlalchemy.types.VARCHAR):
            return self.get_string_type()
        
        if isinstance(sql_type, sqlalchemy.types.BIGINT):
            return sqlalchemy_types.Integer()  # All integers are 64-bit in BigQuery
        
        # Comprehensive handling for numeric types
        if isinstance(sql_type, (sqlalchemy.types.DECIMAL, sqlalchemy.types.NUMERIC)):
            # Force to BigQuery's FLOAT64 to avoid precision errors
            return "FLOAT64"
        
        return sql_type

It fixed the error for me.

MartinPrejean avatar Mar 20 '25 13:03 MartinPrejean

Bump, also having this issue with these packages:

"airbyte>=0.24.2",
"airbyte-source-hubspot>=4.6.0rc1",
"google-cloud-bigquery-storage>=2.30.0",

@MartinPrejean your fix worked, thanks!

aryzle avatar Apr 09 '25 21:04 aryzle

@MartinPrejean's fix works. Thanks.

dsolito avatar Apr 29 '25 22:04 dsolito

I am also experiencing this.

nickolasclarke avatar Jul 16 '25 04:07 nickolasclarke

~When I applied the fix from above, I got the following error when I attempted to load:~

DatabaseError(google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/stardust-prod-5a598/queries?prettyPrint=false: Value of type FLOAT64 cannot be assigned to ae_total_app_sessions, which has type NUMERIC at [28:31]

~which seems pretty circular to the original error.~

edit: This was due to the table already having been created initially, expecting the NUMERIC type. I had to delete the empty table and re-run, and this solution then worked. 🤦🏼
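In case it helps anyone else, "delete the empty table" amounted to something like the following (project, dataset, and table names here are just examples, not the real ones):

# Drop the stale cache table that was created with the old NUMERIC schema,
# so the next read recreates it with the patched (FLOAT64) column types.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")  # example project
client.delete_table(
    "my-gcp-project.pyairbyte_cache.my_stream",  # example table reference
    not_found_ok=True,  # no error if it's already gone
)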

nickolasclarke avatar Jul 16 '25 05:07 nickolasclarke