airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

🚨 🚨 ✨ Source Tik Tok Marketing: Migration to Low-Code

Open darynaishchenko opened this issue 1 year ago β€’ 1 comments

What

resolved: https://github.com/airbytehq/airbyte-internal-issues/issues/7824

How

Migrated source to use low-code cdk instead of python cdk. Main changes:

  • State: Previously all incremental streams used incorrect state without partition. On low-code cdk all incremental streams use per partition state.
  • Lifetime reports: Previously implementation used lifetime=true as request param, which is deprecated on API v1.3. Now lifetime reports use query_lifetime=true, with this param start_date and end_date should not be provided. Exception: advertiser_lifetime_report: API v1.3 doesn't allow query_lifetime=true` with advertiser reports, so this stream was implemented exactly as in py version with start_date and end_date query params(range >=365d)
  • Advertiser Ids stream: schema was changed to use advertiser_id as type of stream to be up to date with API docs.
  • Discover for configs with granularity: In py implementation were missing streams(campaigns_audience_reports, ad_group_audience_reports_by_platform, ad_group_audience_reports_by_country, ads_audience_reports_by_country, advertisers_audience_reports_by_country, campaigns_audience_reports_by_platform, advertisers_audience_reports_by_platform, ads_audience_reports_by_platform, ads_audience_reports_by_province), which users with provided granularity actually can use but streams method didn't return them. For configs with granularity source removes granularity from stream name as it was previously named.

Review guide

User Impact

Breaking change users will need to follow migration guide for affected streams.

Can this PR be safely reverted and rolled back?

Breaking change due to changes in schema and state format.

  • [ ] YES πŸ’š
  • [x] NO ❌

darynaishchenko avatar May 17 '24 16:05 darynaishchenko

The latest updates on your projects. Learn more about Vercel for Git β†—οΈŽ

Name Status Preview Comments Updated (UTC)
airbyte-docs βœ… Ready (Inspect) Visit Preview πŸ’¬ Add feedback Jul 1, 2024 0:37am

vercel[bot] avatar May 17 '24 16:05 vercel[bot]

Regression test results: test_catalog_are_the_same [failed] – updated advertiser_id: integer - string. (breaking change described in the docs)

TestDataIntegrity.test_record_schema_match_without_state [failed] - Value of root['properties']['budget']['type'] changed from "integer" to "number". Value of root['properties']['roas_bid']['type'] changed from "integer" to "number". (same error for all fields with type number in schema but actual type is integer). Both versions have type number, but default type transformer was added in low code version so 0 value is changed to 0.0. For db with transformations(e.g. BigQuery) it’s not a breaking change as destination already converts this data values to a number. Streams are in a list of breaking changes affected by state changes, so users will do refresh&clear anyway. This change from 0 to 0.0 occured due to added Default schema normalization in low code to be compatible with stream schemas that was added for api v1.2.0 and in v1.3.0 some fields have new type. For example *_id was changed from integer to string and stream schemas for v.1.2.0 use integer as type.

TestDataIntegrity.test_all_records_are_the_same_without_state [failed] - Same differences with integer/number as above.

Read URLs: some requests in py version due to HttpAvailabilityStrategy

PS: Reviewer can ask me to send the full html report in slack dm. Regression tests were running locally as I needed to change start date in config and chose testing without state due to breaking changes.

darynaishchenko avatar Jun 11 '24 15:06 darynaishchenko

@maxi297 Please take a look at my regression tests results here https://github.com/airbytehq/airbyte/pull/38316#issuecomment-2161072746

The most recent version has record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of -. There might be other differences

This was fixed by adding DefaultTransformation in low code. But I double check. This is the reason that we have differences in records for integer values that described as number in stream schema.

Also double check page size failure. Thanks.

darynaishchenko avatar Jun 14 '24 17:06 darynaishchenko

@maxi297

  • Fixed record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of - for all report based streams by adding TransformEmptyMetrics custom component. Thanks for pointing it.
  • page_size query param is not passed anymore, I guess you just didn't see it in requests because it is now in the end of request(due to order of params in the definition of requester). I double checked it. All request params are identical with py version, except order of params, but it doesn't affect a response. Request looks like: https://business-api.tiktok.com/open_api/v1.3/report/integrated/get/?service_type=AUCTION&report_type=BASIC&data_level=AUCTION_ADGROUP&dimensions=[dimensions]&metrics=[metrics]&start_date=2022-09-30&end_date=2022-10-29&page_size=1000&advertiser_id=7001035076276387841.

darynaishchenko avatar Jun 17 '24 10:06 darynaishchenko