airbyte
airbyte copied to clipboard
🐛 Google Ads to Snowflake: Multi-byte character issue
Environment
- Airbyte version: 0.40.10
- OS Version / Instance: AWS EC2 and Mac OS
- Deployment: Docker
- Source Connector and version: Google Ads 0.2.5
- Destination Connector and version: Snowflake
- Step where error happened: Setup new connection
Current Behavior
ad.responsive_search_ad.headlines and ad.responsive_search_ad.descriptions seem to be encoded as ASCII characters, so they have problems with multi-byte characters. As a result, these values are not readable when we check the Snowflake to see the synced data.
Original text in google ads
テレビCMで話題
Text being returned
\343\203\206\343\203\254\343\203\223CM\343\201\247\350\251\261\351\241\214
I confirmed that this happens if we use v11 google-ads client library. When I use v12 (latest), no issue happened.
Expected Behavior
The data should be encoded properly so that humans can read. I think update google-ads client library to v12 will go well. (However there were some schema changes so it's not that easy to update.)
Logs
No log to share.
Are you willing to submit a PR?
No
Thanks for reporting the issue @ryuhei-okuno and already giving the path to solution.
Created an oncall issue: https://github.com/airbytehq/oncall/issues/1223
Moving this to Destinations team backlog based on the comments in OC issue. I am not entirely convinced this is a destinations issue at this time. There are several places where Character encoding could have been changed:
- when source connector encodes AirbyteMessages
- when platform passes the messages to destination
- when writing data to staging buffers
- when loading data into snowflake
- when moving data from tmp table to _raw table (least likely)
@grishick This is not a destination issue, and has nothing to do with Snowflake. Please update google-ads client library to v12. The current version used in the connector has a bug encoding some streams with the wrong character set. We have been successful in creating a custom source connector based on the Google Ads connector with the new library and running it without garbled text.
@sherifnada based on the above comment - who should take this issue?
@amrael thanks for the helpful context.
@YowanR FYI this is assigned to the API sources team.
Thanks, everyone! Adding this to our backlog and will tackle it as soon as we can get it prioritize. cc @lazebnyi @YuliiaNahoha -- this seems like a relatively straight-forward issue to fix. Let's add this to our next sprint
@amrael thanks for context. Can you provide us for which characters you found wrong encoding in Google Ads, because for テレビCMで話題
all working good.
cc @YowanR
Please take a look at the Current behavior section of the PR. Particular streams such as ad.responsive_search_ad.headlines
are only affected by this bug and it does not depend on specific characters. It happens all the time.
@artem1205 Can you take a look, please?
Taking a look
fixed and deployed in https://github.com/airbytehq/airbyte/pull/21705
Thanks! But the PR doesn't look enough to fix the issue. As @ryuhei-okuno wrote in the "Current Behavior" section, we need to update the API version to v12. The diff below is what we use temporarily.
diff --git a/airbyte-integrations/connectors/source-google-ads/setup.py b/airbyte-integrations/connectors/source-google-ads/setup.py
index fa702d2f09..93c5a7beac 100644
--- a/airbyte-integrations/connectors/source-google-ads/setup.py
+++ b/airbyte-integrations/connectors/source-google-ads/setup.py
@@ -7,7 +7,7 @@ from setuptools import find_packages, setup
# pin protobuf==3.20.0 as other versions may cause problems on different architectures
# (see https://github.com/airbytehq/airbyte/issues/13580)
-MAIN_REQUIREMENTS = ["airbyte-cdk>=0.2.2", "google-ads==17.0.0", "protobuf==3.20.0", "pendulum"]
+MAIN_REQUIREMENTS = ["airbyte-cdk>=0.2.2", "google-ads==19.0.0", "protobuf==4.21.5", "pendulum"]
TEST_REQUIREMENTS = ["pytest~=6.1", "pytest-mock", "freezegun", "requests-mock"]
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py b/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
index 7441708906..88b861907a 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
@@ -8,7 +8,7 @@ from typing import Any, Iterator, List, Mapping, MutableMapping
import pendulum
from google.ads.googleads.client import GoogleAdsClient
-from google.ads.googleads.v11.services.types.google_ads_service import GoogleAdsRow, SearchGoogleAdsResponse
+from google.ads.googleads.v12.services.types.google_ads_service import GoogleAdsRow, SearchGoogleAdsResponse
from proto.marshal.collections import Repeated, RepeatedComposite
REPORT_MAPPING = {
@@ -30,7 +30,7 @@ REPORT_MAPPING = {
"geographic_report": "geographic_view",
"keyword_report": "keyword_view",
}
-API_VERSION = "v11"
+API_VERSION = "v12"
class GoogleAds:
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
index 6ab5d2c66e..65b8218331 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
@@ -222,15 +222,6 @@
"ad_group_ad.ad.legacy_responsive_display_ad.format_setting": {
"type": ["null", "string"]
},
- "ad_group_ad.ad.gmail_ad.header_image": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.logo_image": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image": {
- "type": ["null", "string"]
- },
"metrics.gmail_forwards": {
"type": ["null", "integer"]
},
@@ -240,15 +231,6 @@
"metrics.gmail_secondary_clicks": {
"type": ["null", "integer"]
},
- "ad_group_ad.ad.gmail_ad.teaser.business_name": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.description": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.headline": {
- "type": ["null", "string"]
- },
"ad_group_ad.ad.text_ad.headline": {
"type": ["null", "string"]
},
@@ -294,18 +276,6 @@
"ad_group_ad.ad.legacy_responsive_display_ad.main_color": {
"type": ["null", "string"]
},
- "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text_color": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_headline": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_description": {
- "type": ["null", "string"]
- },
"segments.month": {
"type": ["null", "string"]
},
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
index a06960e32e..58b73e318e 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
@@ -164,51 +164,6 @@
"type": "string"
}
},
- "ad_group_ad.ad.gmail_ad.header_image": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_description": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text_color": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.url_collection_id": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.marketing_image_headline": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.product_images": {
- "type": ["null", "array"],
- "items": {
- "type": "string"
- }
- },
- "ad_group_ad.ad.gmail_ad.product_videos": {
- "type": ["null", "array"],
- "items": {
- "type": "string"
- }
- },
- "ad_group_ad.ad.gmail_ad.teaser.business_name": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.description": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.headline": {
- "type": ["null", "string"]
- },
- "ad_group_ad.ad.gmail_ad.teaser.logo_image": {
- "type": ["null", "string"]
- },
"ad_group_ad.ad.hotel_ad": {
"type": ["null", "string"]
},
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py b/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
index 998519e9bd..72e92447e1 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
@@ -10,9 +10,9 @@ from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources.streams import IncrementalMixin, Stream
from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer
from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.authorization_error import AuthorizationErrorEnum
-from google.ads.googleads.v11.errors.types.request_error import RequestErrorEnum
-from google.ads.googleads.v11.services.services.google_ads_service.pagers import SearchPager
+from google.ads.googleads.v12.errors.types.authorization_error import AuthorizationErrorEnum
+from google.ads.googleads.v12.errors.types.request_error import RequestErrorEnum
+from google.ads.googleads.v12.services.services.google_ads_service.pagers import SearchPager
from .google_ads import GoogleAds
from .models import Customer
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
index 66824c90d6..f7174f2b52 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
@@ -5,7 +5,7 @@
import json
from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11 import GoogleAdsFailure
+from google.ads.googleads.v12 import GoogleAdsFailure
class MockSearchRequest:
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
index 072e2a0626..161dca86c9 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
@@ -9,7 +9,7 @@ import pytest
from airbyte_cdk import AirbyteLogger
from freezegun import freeze_time
from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.authorization_error import AuthorizationErrorEnum
+from google.ads.googleads.v12.errors.types.authorization_error import AuthorizationErrorEnum
from pendulum import today
from source_google_ads.custom_query_stream import CustomQuery
from source_google_ads.google_ads import GoogleAds
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
index d20ac79a62..17afb6c9d2 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
@@ -7,8 +7,8 @@ from unittest.mock import Mock
import pytest
from airbyte_cdk.models import SyncMode
from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.errors import ErrorCode, GoogleAdsError, GoogleAdsFailure
-from google.ads.googleads.v11.errors.types.request_error import RequestErrorEnum
+from google.ads.googleads.v12.errors.types.errors import ErrorCode, GoogleAdsError, GoogleAdsFailure
+from google.ads.googleads.v12.errors.types.request_error import RequestErrorEnum
from grpc import RpcError
from source_google_ads.google_ads import GoogleAds
from source_google_ads.streams import ClickView
@amrael Thanks for response. This is decoding problem has been fixed in new version of google-ads
package. We don't need change API version to fix this issue.
@ryuhei-okuno You reproduced the encoding issue with the v11 API requests directly as far as I can remember. Can you confirm whether the sdk upgrade is enough or not?
Okay sure, let me check the current behavior.
@amrael I've confirmed that using new version already deployed (version 0.2.9) solved this problem.
@lazebnyi @artem1205 Could you please let me ask one query?
In https://github.com/airbytehq/airbyte/pull/21705, you didn't change the version of google ads API directly being specified to use the old version v11 in the connector. For example this line is specifying the v11. I assumed that we also need to change this dependency as well but don't we need to do that? If not, could you please tell me why this problem was solved only by updating the version in setup.py? Anyways, really thank you for your fix.
@ryuhei-okuno,
Issue was solved by updating the google-ads
python package and protobuf
as dependent package, as the problem was inside transfer layer
(probably somewhere in protobuf
decoding part of the code).
We don' t need to update the google API version to v12, as it introduces many changes to existing schemas and may lead to some problems with backward compatibility.
Thank you for the explanation, understood. No concern from our side, so it's okay to close this issue. Thanks.