π¨ π¨ β¨ Source Tik Tok Marketing: Migration to Low-Code
What
resolved: https://github.com/airbytehq/airbyte-internal-issues/issues/7824
How
Migrated source to use low-code cdk instead of python cdk. Main changes:
- State: Previously all incremental streams used incorrect state without partition. On low-code cdk all incremental streams use per partition state.
- Lifetime reports: Previously implementation used
lifetime=trueas request param, which is deprecated on API v1.3. Now lifetime reports usequery_lifetime=true, with this param start_date and end_date should not be provided. Exception: advertiser_lifetime_report: API v1.3 doesn't allow query_lifetime=true` with advertiser reports, so this stream was implemented exactly as in py version with start_date and end_date query params(range >=365d) - Advertiser Ids stream: schema was changed to use advertiser_id as type of stream to be up to date with API docs.
- Discover for configs with granularity: In py implementation were missing streams(campaigns_audience_reports, ad_group_audience_reports_by_platform, ad_group_audience_reports_by_country, ads_audience_reports_by_country, advertisers_audience_reports_by_country, campaigns_audience_reports_by_platform, advertisers_audience_reports_by_platform, ads_audience_reports_by_platform, ads_audience_reports_by_province), which users with provided granularity actually can use but streams method didn't return them. For configs with granularity source removes granularity from stream name as it was previously named.
Review guide
User Impact
Breaking change users will need to follow migration guide for affected streams.
Can this PR be safely reverted and rolled back?
Breaking change due to changes in schema and state format.
- [ ] YES π
- [x] NO β
The latest updates on your projects. Learn more about Vercel for Git βοΈ
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| airbyte-docs | β Ready (Inspect) | Visit Preview | π¬ Add feedback | Jul 1, 2024 0:37am |
Regression test results: test_catalog_are_the_same [failed] β updated advertiser_id: integer - string. (breaking change described in the docs)
TestDataIntegrity.test_record_schema_match_without_state [failed] - Value of root['properties']['budget']['type'] changed from "integer" to "number". Value of root['properties']['roas_bid']['type'] changed from "integer" to "number". (same error for all fields with type number in schema but actual type is integer). Both versions have type number, but default type transformer was added in low code version so 0 value is changed to 0.0. For db with transformations(e.g. BigQuery) itβs not a breaking change as destination already converts this data values to a number. Streams are in a list of breaking changes affected by state changes, so users will do refresh&clear anyway. This change from 0 to 0.0 occured due to added Default schema normalization in low code to be compatible with stream schemas that was added for api v1.2.0 and in v1.3.0 some fields have new type. For example *_id was changed from integer to string and stream schemas for v.1.2.0 use integer as type.
TestDataIntegrity.test_all_records_are_the_same_without_state [failed] - Same differences with integer/number as above.
Read URLs: some requests in py version due to HttpAvailabilityStrategy
PS: Reviewer can ask me to send the full html report in slack dm. Regression tests were running locally as I needed to change start date in config and chose testing without state due to breaking changes.
@maxi297 Please take a look at my regression tests results here https://github.com/airbytehq/airbyte/pull/38316#issuecomment-2161072746
The most recent version has record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of -. There might be other differences
This was fixed by adding DefaultTransformation in low code. But I double check. This is the reason that we have differences in records for integer values that described as number in stream schema.
Also double check page size failure. Thanks.
@maxi297
- Fixed
record["metrics"]["cost_per_secondary_goal_result"] == None while the new one has a value of -for all report based streams by addingTransformEmptyMetricscustom component. Thanks for pointing it. -
page_size query param is not passed anymore, I guess you just didn't see it in requests because it is now in the end of request(due to order of params in the definition of requester). I double checked it. All request params are identical with py version, except order of params, but it doesn't affect a response. Request looks like:https://business-api.tiktok.com/open_api/v1.3/report/integrated/get/?service_type=AUCTION&report_type=BASIC&data_level=AUCTION_ADGROUP&dimensions=[dimensions]&metrics=[metrics]&start_date=2022-09-30&end_date=2022-10-29&page_size=1000&advertiser_id=7001035076276387841.