🎉 Source Shopify: migrate `products`, `product images` and `product variants` to `GraphQL BULK`
What
Resolving:
- https://github.com/airbytehq/airbyte-internal-issues/issues/7353
- https://github.com/airbytehq/airbyte-internal-issues/issues/7354
- https://github.com/airbytehq/airbyte-internal-issues/issues/7355
How
- migrated
Products,Product ImagesandProduct VariantstoShopify GQL BULK - removed the parent dependency (Product) for the
product imagesandproduct variantsstreams, they should be independent to have the proper migration process - extended the
query.py(BULK Queries module) with 3 more classes forProduct,ProductImageandProductVariantqueries - changed the references for the
stream.pyto make the migrated streams inherit from theIncrementalShopifyGraphQlBulkStream - added 3 test cases to cover the change
- fixed the tests that were affected by the change
User Impact
There is a BREAKING CHANGE for the product variants stream STATE:
Before:
"id": 112233
Now:
"updated_at": "2024-01-01T01:01:01+00:00"
Can this PR be safely reverted and rolled back?
- [X] YES 💚,
Because the old STATE for id based values is still there for the STATE object, we can safely roll back when something is wrong.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| airbyte-docs | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 14, 2024 8:38am |
@katmarkham Could you please take a look at the user-facing message about the breaking change? Thank you.
- There are records missing in
product_imagesduring CATs execution. Is it expected?
Yes, because previously, it was based on the Id cursor, and now product_images uses the updated_at cursor, which captures the records using the updated_at field. (When the parent dependency (Product) was present, we fetched all the products and filtered not by updated_at, but by the id)
Should we run a couple of live-testing tests on that to ensure there are no regressions for users as well? It seems like there are a couple of differences we might want to address or be explicit about it
These will be added in a separate PR. I don't have a clear path to implementing live tests for the Shopify source. For now, the unit_tests cover the bulk flow, and no regressions are faced. I've extended the unit_tests to cover a couple of additional cases we want to be retryable and removed the patch logic.
Update the migration-guide with the Products stream changes as well.
Some intermediate Live Tests results:
## Tested Connection IDs:
1) ID: `80be7c99-7a87-41e9-bae9-b0e61e7be4c4`
2) ID: `89ed6b0d-3cdf-46e1-bbee-35061c038471`
3) ID: `58c561e0-c3c8-48cc-94dd-0097fc8de403`
Results
ID: 80be7c99-7a87-41e9-bae9-b0e61e7be4c4
GH Action
Report
Artifact
ID: 89ed6b0d-3cdf-46e1-bbee-35061c038471
GH Action
Report
Artifact
ID: 58c561e0-c3c8-48cc-94dd-0097fc8de403
GH Action
Report
Artifact
cc @maxi297
UPDATED: non-relevant, will make another batch of tests using 2 connection_id / affected stream.
The pre-release version has been published successfully: 2.1.0-dev.805cd2979d
Action run: https://github.com/airbytehq/airbyte/actions/runs/9014252053
Live Tests results:
Tested Connection IDs:
1) `Products` stream:
- `1adee9d6-d8b1-4425-85a7-e13aebf7331e`
- `80be7c99-7a87-41e9-bae9-b0e61e7be4c4`
2) `Product Images` stream:
- `d82df6a0-1b94-4608-9e63-e9f0d57ec652`
- `2fa9f9ea-5a58-4628-b5f1-08314bfeacaf`
3) `Product Variants` stream:
- `81a5acbb-4374-497d-9af2-297ea8e7f6a7`
- `a7d4dcfe-39bb-4803-8e49-18ae07e838b2`
Results:
Products stream:
-
ID:
1adee9d6-d8b1-4425-85a7-e13aebf7331eGH Action Report Artifact -
ID:
80be7c99-7a87-41e9-bae9-b0e61e7be4c4GH Action Report Artifact
Product Images stream:
-
ID:
d82df6a0-1b94-4608-9e63-e9f0d57ec652GH Action Report Artifact -
ID:
2fa9f9ea-5a58-4628-b5f1-08314bfeacafGH Action Report Artifact
Product Variants stream:
-
ID:
81a5acbb-4374-497d-9af2-297ea8e7f6a7GH Action Report Artifact -
ID:
a7d4dcfe-39bb-4803-8e49-18ae07e838b2GH Action Report Artifact
cc @maxi297
I feel uneasy to accept those live testing results without explaining those gaps:
1adee9d6-d8b1-4425-85a7-e13aebf7331e and 80be7c99-7a87-41e9-bae9-b0e61e7be4c4
The HTTP requests are still all the same i.e. I still don't see any request to products.json in the control version which makes me wonder if the stream is tested at all... Can you explain why we don't see HTTP requests to /products.json
d82df6a0-1b94-4608-9e63-e9f0d57ec652
It says product_variants stream: records with primary key in target & control whose values differ but the diff does not highlight this change. I would also expect to see this discrepancy in 81a5acbb-4374-497d-9af2-297ea8e7f6a7 and a7d4dcfe-39bb-4803-8e49-18ae07e838b2. Do we know why this change is not picked up by the live testing?
Because of the inability to test using live-testing tool (permissions issues), the manual changes report was conducted.
CHANGES (REGRESSION) TEST PLAN
- Make common
CONFIG - Make common
CATALOGwith isolated target (affected) stream - Run-Test each affected stream against the
Test Criteria. Affected streams:Products,Product Images,Product Variants
COMMANDS:
FULL-REFRESH
## CONTROL
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > control.json
## TARGET
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > target.json
INCREMENTAL
## CONTROL
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/control_state.json --debug > control_inc.json
## TARGET
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/target_state.json --debug > target_inc.json
Test Criteria:
-
Outgoing API request example:
- 1.1)
CONTROL - 1.2)
TARGET - Explain the diff
- 1.1)
-
Sync mode:
Full-Refresh-
2.1)
Number of records:- 2.1.1)
CONTROL - 2.1.2)
TARGET - Explain the diff
- 2.1.1)
-
2.2)
STATEcomparison:- 2.2.1)
CONTROL - 2.2.2)
TARGET - Explain the diff
- 2.2.1)
-
2.3)
Missing records:- 2.3.1)
CONTROL - 2.3.2)
TARGET - Explain the diff
- 2.3.1)
-
-
Sync mode:
Incremental-
3.1)
Number of records:- 3.1.1)
CONTROL - 3.1.2)
TARGET - Explain the diff
- 3.1.1)
-
3.2)
STATEcomparison:- 3.2.1)
CONTROL - 3.2.2)
TARGET - Explain the diff
- 3.2.1)
-
3.3)
Missing records:- 3.3.1)
CONTROL - 3.3.2)
TARGET - Explain the diff
- 3.3.1)
-
-
Record example:- 4.1)
CONTROL - 4.2)
TARGET
- 4.1)
-
Schema changes:- 5.1) Added fields
- 5.2) Changes types
- 5.3) Removed fields
- Explain the diff
-
Is this a
breaking change:- [] Yes
- [] No
Test Results ( PII - free ):
cc @maxi297
@bazarnov @maxi297 quick question: since there are breaking changes, it should have been included in a 3.0 release no? I can't find how Airbyte Cloud handles connector upgrade, is this automated?
Hi @loris ! The breaking changes were introduced as part of 2.1.0. As this is a breaking change, the opt-in mechanism on cloud should kick in and you should see a prompt asking you to upgrade before 2024-06-10
@maxi297 Thanks for the clarification, I'm in discussion with cloud support because this is clearly not working as expected. In our production workspace, connections have been upgraded to 2.2 without any prompt (and thus broke our pipeline), while on our other workspaces, connections are still running Shopify 2.0 connector and we don't see any Upgrade prompt popping up.
@maxi297 Actually I might have found the issue:
2.1.0:
message:
"This upgrade changes the `Products`, `Product Images` and `Product Variants` streams to use `Shopify GraphQL BULK`.
More details here: https://github.com/airbytehq/airbyte/pull/37767."
upgradeDeadline: "2024-06-10"
scopedImpact:
- scopeType: stream
impactedScopes: ["product_variants"]
The breaking change message only states product_variants in the impactedScopes, while products was also clearly impacted. Our Airbyte connections only needed to sync products (as the variants where part of that stream too). But with the change, we lost the variant data and were not informed of the breaking change because the impactedScopes did not included products, does this seem right?
@loris This message included the breaking change that breaks the source sync operations only, the info about the data was re-arranged based on the API changes are not included in this message.
In the migration guide here: https://github.com/airbytehq/airbyte/blob/e9e6cf3778caf804bc5d433ccda1530de1373a6d/docs/integrations/sources/shopify-migrations.md which comes with this release, we included all the details about the actual data change we discovered. Have you had a chance to observe this in the UI?
Hope that answers your question and thank you for your feedback.
Very clear!
Have you had a chance to observe this in the UI?
This is the part which is confusing to me: No, I did not observe anything in the UI. I also have a workspace where Shopify connector are still running the 2.0 version and did not see any Upgrade banner. I have the email/webhook notifications set up, but nothing there too. We are waiting for a reply of Airbyte Cloud support and will let you know
This information will be helpful for us a well as maybe they is a gap we are not aware related to the opt-in mechanism. Please, keep us up to date on this one