airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

🐛 Google Ads to Snowflake: Multi-byte character issue

Open ryuhei-okuno opened this issue 2 years ago • 17 comments

Environment

  • Airbyte version: 0.40.10
  • OS Version / Instance: AWS EC2 and Mac OS
  • Deployment: Docker
  • Source Connector and version: Google Ads 0.2.5
  • Destination Connector and version: Snowflake
  • Step where error happened: Setup new connection

Current Behavior

ad.responsive_search_ad.headlines and ad.responsive_search_ad.descriptions seem to be encoded as ASCII characters, so they have problems with multi-byte characters. As a result, these values are not readable when we check the Snowflake to see the synced data.

Original text in google ads

テレビCMで話題

Text being returned

\343\203\206\343\203\254\343\203\223CM\343\201\247\350\251\261\351\241\214

I confirmed that this happens if we use v11 google-ads client library. When I use v12 (latest), no issue happened.

Expected Behavior

The data should be encoded properly so that humans can read. I think update google-ads client library to v12 will go well. (However there were some schema changes so it's not that easy to update.)

Logs

No log to share.

Are you willing to submit a PR?

No

ryuhei-okuno avatar Dec 20 '22 12:12 ryuhei-okuno

Thanks for reporting the issue @ryuhei-okuno and already giving the path to solution.

marcosmarxm avatar Dec 20 '22 15:12 marcosmarxm

Created an oncall issue: https://github.com/airbytehq/oncall/issues/1223

marcosmarxm avatar Dec 20 '22 15:12 marcosmarxm

Moving this to Destinations team backlog based on the comments in OC issue. I am not entirely convinced this is a destinations issue at this time. There are several places where Character encoding could have been changed:

  • when source connector encodes AirbyteMessages
  • when platform passes the messages to destination
  • when writing data to staging buffers
  • when loading data into snowflake
  • when moving data from tmp table to _raw table (least likely)

grishick avatar Jan 17 '23 18:01 grishick

@grishick This is not a destination issue, and has nothing to do with Snowflake. Please update google-ads client library to v12. The current version used in the connector has a bug encoding some streams with the wrong character set. We have been successful in creating a custom source connector based on the Google Ads connector with the new library and running it without garbled text.

amrael avatar Jan 17 '23 23:01 amrael

@sherifnada based on the above comment - who should take this issue?

grishick avatar Jan 18 '23 20:01 grishick

@amrael thanks for the helpful context.

@YowanR FYI this is assigned to the API sources team.

sherifnada avatar Jan 18 '23 22:01 sherifnada

Thanks, everyone! Adding this to our backlog and will tackle it as soon as we can get it prioritize. cc @lazebnyi @YuliiaNahoha -- this seems like a relatively straight-forward issue to fix. Let's add this to our next sprint

YowanR avatar Jan 18 '23 22:01 YowanR

@amrael thanks for context. Can you provide us for which characters you found wrong encoding in Google Ads, because for テレビCMで話題 all working good.

cc @YowanR

lazebnyi avatar Jan 19 '23 19:01 lazebnyi

Please take a look at the Current behavior section of the PR. Particular streams such as ad.responsive_search_ad.headlines are only affected by this bug and it does not depend on specific characters. It happens all the time.

amrael avatar Jan 19 '23 23:01 amrael

@artem1205 Can you take a look, please?

lazebnyi avatar Jan 20 '23 13:01 lazebnyi

Taking a look

artem1205 avatar Jan 23 '23 12:01 artem1205

fixed and deployed in https://github.com/airbytehq/airbyte/pull/21705

artem1205 avatar Jan 23 '23 23:01 artem1205

Thanks! But the PR doesn't look enough to fix the issue. As @ryuhei-okuno wrote in the "Current Behavior" section, we need to update the API version to v12. The diff below is what we use temporarily.

diff --git a/airbyte-integrations/connectors/source-google-ads/setup.py b/airbyte-integrations/connectors/source-google-ads/setup.py
index fa702d2f09..93c5a7beac 100644
--- a/airbyte-integrations/connectors/source-google-ads/setup.py
+++ b/airbyte-integrations/connectors/source-google-ads/setup.py
@@ -7,7 +7,7 @@ from setuptools import find_packages, setup
 
 # pin protobuf==3.20.0 as other versions may cause problems on different architectures
 # (see https://github.com/airbytehq/airbyte/issues/13580)
-MAIN_REQUIREMENTS = ["airbyte-cdk>=0.2.2", "google-ads==17.0.0", "protobuf==3.20.0", "pendulum"]
+MAIN_REQUIREMENTS = ["airbyte-cdk>=0.2.2", "google-ads==19.0.0", "protobuf==4.21.5", "pendulum"]
 
 TEST_REQUIREMENTS = ["pytest~=6.1", "pytest-mock", "freezegun", "requests-mock"]
 
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py b/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
index 7441708906..88b861907a 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/google_ads.py
@@ -8,7 +8,7 @@ from typing import Any, Iterator, List, Mapping, MutableMapping
 
 import pendulum
 from google.ads.googleads.client import GoogleAdsClient
-from google.ads.googleads.v11.services.types.google_ads_service import GoogleAdsRow, SearchGoogleAdsResponse
+from google.ads.googleads.v12.services.types.google_ads_service import GoogleAdsRow, SearchGoogleAdsResponse
 from proto.marshal.collections import Repeated, RepeatedComposite
 
 REPORT_MAPPING = {
@@ -30,7 +30,7 @@ REPORT_MAPPING = {
     "geographic_report": "geographic_view",
     "keyword_report": "keyword_view",
 }
-API_VERSION = "v11"
+API_VERSION = "v12"
 
 
 class GoogleAds:
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
index 6ab5d2c66e..65b8218331 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ad_report.json
@@ -222,15 +222,6 @@
     "ad_group_ad.ad.legacy_responsive_display_ad.format_setting": {
       "type": ["null", "string"]
     },
-    "ad_group_ad.ad.gmail_ad.header_image": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.logo_image": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image": {
-      "type": ["null", "string"]
-    },
     "metrics.gmail_forwards": {
       "type": ["null", "integer"]
     },
@@ -240,15 +231,6 @@
     "metrics.gmail_secondary_clicks": {
       "type": ["null", "integer"]
     },
-    "ad_group_ad.ad.gmail_ad.teaser.business_name": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.description": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.headline": {
-      "type": ["null", "string"]
-    },
     "ad_group_ad.ad.text_ad.headline": {
       "type": ["null", "string"]
     },
@@ -294,18 +276,6 @@
     "ad_group_ad.ad.legacy_responsive_display_ad.main_color": {
       "type": ["null", "string"]
     },
-    "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text_color": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_headline": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_description": {
-      "type": ["null", "string"]
-    },
     "segments.month": {
       "type": ["null", "string"]
     },
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
index a06960e32e..58b73e318e 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/schemas/ad_group_ads.json
@@ -164,51 +164,6 @@
         "type": "string"
       }
     },
-    "ad_group_ad.ad.gmail_ad.header_image": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_description": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.text_color": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_display_call_to_action.url_collection_id": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.marketing_image_headline": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.product_images": {
-      "type": ["null", "array"],
-      "items": {
-        "type": "string"
-      }
-    },
-    "ad_group_ad.ad.gmail_ad.product_videos": {
-      "type": ["null", "array"],
-      "items": {
-        "type": "string"
-      }
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.business_name": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.description": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.headline": {
-      "type": ["null", "string"]
-    },
-    "ad_group_ad.ad.gmail_ad.teaser.logo_image": {
-      "type": ["null", "string"]
-    },
     "ad_group_ad.ad.hotel_ad": {
       "type": ["null", "string"]
     },
diff --git a/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py b/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
index 998519e9bd..72e92447e1 100644
--- a/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
+++ b/airbyte-integrations/connectors/source-google-ads/source_google_ads/streams.py
@@ -10,9 +10,9 @@ from airbyte_cdk.models import SyncMode
 from airbyte_cdk.sources.streams import IncrementalMixin, Stream
 from airbyte_cdk.sources.utils.transform import TransformConfig, TypeTransformer
 from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.authorization_error import AuthorizationErrorEnum
-from google.ads.googleads.v11.errors.types.request_error import RequestErrorEnum
-from google.ads.googleads.v11.services.services.google_ads_service.pagers import SearchPager
+from google.ads.googleads.v12.errors.types.authorization_error import AuthorizationErrorEnum
+from google.ads.googleads.v12.errors.types.request_error import RequestErrorEnum
+from google.ads.googleads.v12.services.services.google_ads_service.pagers import SearchPager
 
 from .google_ads import GoogleAds
 from .models import Customer
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
index 66824c90d6..f7174f2b52 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/common.py
@@ -5,7 +5,7 @@
 import json
 
 from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11 import GoogleAdsFailure
+from google.ads.googleads.v12 import GoogleAdsFailure
 
 
 class MockSearchRequest:
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
index 072e2a0626..161dca86c9 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_source.py
@@ -9,7 +9,7 @@ import pytest
 from airbyte_cdk import AirbyteLogger
 from freezegun import freeze_time
 from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.authorization_error import AuthorizationErrorEnum
+from google.ads.googleads.v12.errors.types.authorization_error import AuthorizationErrorEnum
 from pendulum import today
 from source_google_ads.custom_query_stream import CustomQuery
 from source_google_ads.google_ads import GoogleAds
diff --git a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
index d20ac79a62..17afb6c9d2 100644
--- a/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
+++ b/airbyte-integrations/connectors/source-google-ads/unit_tests/test_streams.py
@@ -7,8 +7,8 @@ from unittest.mock import Mock
 import pytest
 from airbyte_cdk.models import SyncMode
 from google.ads.googleads.errors import GoogleAdsException
-from google.ads.googleads.v11.errors.types.errors import ErrorCode, GoogleAdsError, GoogleAdsFailure
-from google.ads.googleads.v11.errors.types.request_error import RequestErrorEnum
+from google.ads.googleads.v12.errors.types.errors import ErrorCode, GoogleAdsError, GoogleAdsFailure
+from google.ads.googleads.v12.errors.types.request_error import RequestErrorEnum
 from grpc import RpcError
 from source_google_ads.google_ads import GoogleAds
 from source_google_ads.streams import ClickView

amrael avatar Jan 24 '23 00:01 amrael

@amrael Thanks for response. This is decoding problem has been fixed in new version of google-ads package. We don't need change API version to fix this issue.

lazebnyi avatar Jan 24 '23 12:01 lazebnyi

@ryuhei-okuno You reproduced the encoding issue with the v11 API requests directly as far as I can remember. Can you confirm whether the sdk upgrade is enough or not?

amrael avatar Jan 24 '23 12:01 amrael

Okay sure, let me check the current behavior.

ryuhei-okuno avatar Jan 25 '23 03:01 ryuhei-okuno

@amrael I've confirmed that using new version already deployed (version 0.2.9) solved this problem.

@lazebnyi @artem1205 Could you please let me ask one query?

In https://github.com/airbytehq/airbyte/pull/21705, you didn't change the version of google ads API directly being specified to use the old version v11 in the connector. For example this line is specifying the v11. I assumed that we also need to change this dependency as well but don't we need to do that? If not, could you please tell me why this problem was solved only by updating the version in setup.py? Anyways, really thank you for your fix.

ryuhei-okuno avatar Jan 28 '23 04:01 ryuhei-okuno

@ryuhei-okuno,

Issue was solved by updating the google-ads python package and protobuf as dependent package, as the problem was inside transfer layer (probably somewhere in protobuf decoding part of the code). We don' t need to update the google API version to v12, as it introduces many changes to existing schemas and may lead to some problems with backward compatibility.

artem1205 avatar Jan 29 '23 12:01 artem1205

Thank you for the explanation, understood. No concern from our side, so it's okay to close this issue. Thanks.

ryuhei-okuno avatar Feb 01 '23 12:02 ryuhei-okuno