airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

🎉 Source Shopify: migrate `products`, `product images` and `product variants` to `GraphQL BULK`

Open bazarnov opened this issue 1 year ago • 2 comments

What

Resolving:

  • https://github.com/airbytehq/airbyte-internal-issues/issues/7353
  • https://github.com/airbytehq/airbyte-internal-issues/issues/7354
  • https://github.com/airbytehq/airbyte-internal-issues/issues/7355

How

  • migrated Products, Product Images and Product Variants to Shopify GQL BULK
  • removed the parent dependency (Product) for the product images and product variants streams, they should be independent to have the proper migration process
  • extended the query.py (BULK Queries module) with 3 more classes for Product, ProductImage and ProductVariant queries
  • changed the references for the stream.py to make the migrated streams inherit from the IncrementalShopifyGraphQlBulkStream
  • added 3 test cases to cover the change
  • fixed the tests that were affected by the change

User Impact

There is a BREAKING CHANGE for the product variants stream STATE:

Before:

"id": 112233

Now:

"updated_at": "2024-01-01T01:01:01+00:00"

Can this PR be safely reverted and rolled back?

  • [X] YES 💚,

Because the old STATE for id based values is still there for the STATE object, we can safely roll back when something is wrong.

bazarnov avatar May 02 '24 11:05 bazarnov

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 14, 2024 8:38am

vercel[bot] avatar May 02 '24 11:05 vercel[bot]

@katmarkham Could you please take a look at the user-facing message about the breaking change? Thank you.

bazarnov avatar May 02 '24 11:05 bazarnov

  • There are records missing in product_images during CATs execution. Is it expected?

Yes, because previously, it was based on the Id cursor, and now product_images uses the updated_at cursor, which captures the records using the updated_at field. (When the parent dependency (Product) was present, we fetched all the products and filtered not by updated_at, but by the id)

Should we run a couple of live-testing tests on that to ensure there are no regressions for users as well? It seems like there are a couple of differences we might want to address or be explicit about it

These will be added in a separate PR. I don't have a clear path to implementing live tests for the Shopify source. For now, the unit_tests cover the bulk flow, and no regressions are faced. I've extended the unit_tests to cover a couple of additional cases we want to be retryable and removed the patch logic.

bazarnov avatar May 03 '24 12:05 bazarnov

Update the migration-guide with the Products stream changes as well.

bazarnov avatar May 03 '24 13:05 bazarnov

Some intermediate Live Tests results:

## Tested Connection IDs:
1)  ID: `80be7c99-7a87-41e9-bae9-b0e61e7be4c4`
2)  ID: `89ed6b0d-3cdf-46e1-bbee-35061c038471`
3)  ID: `58c561e0-c3c8-48cc-94dd-0097fc8de403`

Results ID: 80be7c99-7a87-41e9-bae9-b0e61e7be4c4 GH Action Report Artifact

ID: 89ed6b0d-3cdf-46e1-bbee-35061c038471 GH Action Report Artifact

ID: 58c561e0-c3c8-48cc-94dd-0097fc8de403 GH Action Report Artifact

cc @maxi297

UPDATED: non-relevant, will make another batch of tests using 2 connection_id / affected stream.

bazarnov avatar May 08 '24 15:05 bazarnov

The pre-release version has been published successfully: 2.1.0-dev.805cd2979d Action run: https://github.com/airbytehq/airbyte/actions/runs/9014252053

bazarnov avatar May 09 '24 08:05 bazarnov

Live Tests results:

Tested Connection IDs:

1) `Products` stream:
- `1adee9d6-d8b1-4425-85a7-e13aebf7331e`
- `80be7c99-7a87-41e9-bae9-b0e61e7be4c4`

2) `Product Images` stream:
- `d82df6a0-1b94-4608-9e63-e9f0d57ec652`
- `2fa9f9ea-5a58-4628-b5f1-08314bfeacaf`

3) `Product Variants` stream:
- `81a5acbb-4374-497d-9af2-297ea8e7f6a7`
- `a7d4dcfe-39bb-4803-8e49-18ae07e838b2`

Results:

Products stream:

  1. ID: 1adee9d6-d8b1-4425-85a7-e13aebf7331e GH Action Report Artifact

  2. ID: 80be7c99-7a87-41e9-bae9-b0e61e7be4c4 GH Action Report Artifact

Product Images stream:

  1. ID: d82df6a0-1b94-4608-9e63-e9f0d57ec652 GH Action Report Artifact

  2. ID: 2fa9f9ea-5a58-4628-b5f1-08314bfeacaf GH Action Report Artifact

Product Variants stream:

  1. ID: 81a5acbb-4374-497d-9af2-297ea8e7f6a7 GH Action Report Artifact

  2. ID: a7d4dcfe-39bb-4803-8e49-18ae07e838b2 GH Action Report Artifact

cc @maxi297

bazarnov avatar May 10 '24 10:05 bazarnov

I feel uneasy to accept those live testing results without explaining those gaps:

1adee9d6-d8b1-4425-85a7-e13aebf7331e and 80be7c99-7a87-41e9-bae9-b0e61e7be4c4

The HTTP requests are still all the same i.e. I still don't see any request to products.json in the control version which makes me wonder if the stream is tested at all... Can you explain why we don't see HTTP requests to /products.json

d82df6a0-1b94-4608-9e63-e9f0d57ec652

It says product_variants stream: records with primary key in target & control whose values differ but the diff does not highlight this change. I would also expect to see this discrepancy in 81a5acbb-4374-497d-9af2-297ea8e7f6a7 and a7d4dcfe-39bb-4803-8e49-18ae07e838b2. Do we know why this change is not picked up by the live testing?

maxi297 avatar May 10 '24 13:05 maxi297

Because of the inability to test using live-testing tool (permissions issues), the manual changes report was conducted.

CHANGES (REGRESSION) TEST PLAN

  1. Make common CONFIG
  2. Make common CATALOG with isolated target (affected) stream
  3. Run-Test each affected stream against the Test Criteria. Affected streams: Products, Product Images, Product Variants

COMMANDS:

FULL-REFRESH

## CONTROL

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > control.json
## TARGET

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > target.json

INCREMENTAL

## CONTROL

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/control_state.json --debug > control_inc.json
## TARGET

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/target_state.json --debug > target_inc.json

Test Criteria:

  1. Outgoing API request example:

    • 1.1) CONTROL
    • 1.2) TARGET
    • Explain the diff
  2. Sync mode: Full-Refresh

    • 2.1) Number of records:

      • 2.1.1) CONTROL
      • 2.1.2) TARGET
      • Explain the diff
    • 2.2) STATE comparison:

      • 2.2.1) CONTROL
      • 2.2.2) TARGET
      • Explain the diff
    • 2.3) Missing records:

      • 2.3.1) CONTROL
      • 2.3.2) TARGET
      • Explain the diff
  3. Sync mode: Incremental

    • 3.1) Number of records:

      • 3.1.1) CONTROL
      • 3.1.2) TARGET
      • Explain the diff
    • 3.2) STATE comparison:

      • 3.2.1) CONTROL
      • 3.2.2) TARGET
      • Explain the diff
    • 3.3) Missing records:

      • 3.3.1) CONTROL
      • 3.3.2) TARGET
      • Explain the diff
  4. Record example:

    • 4.1) CONTROL
    • 4.2) TARGET
  5. Schema changes:

    • 5.1) Added fields
    • 5.2) Changes types
    • 5.3) Removed fields
    • Explain the diff
  6. Is this a breaking change:

    • [] Yes
    • [] No

Test Results ( PII - free ):

cc @maxi297

bazarnov avatar May 13 '24 11:05 bazarnov

@bazarnov @maxi297 quick question: since there are breaking changes, it should have been included in a 3.0 release no? I can't find how Airbyte Cloud handles connector upgrade, is this automated?

loris avatar May 31 '24 13:05 loris

Hi @loris ! The breaking changes were introduced as part of 2.1.0. As this is a breaking change, the opt-in mechanism on cloud should kick in and you should see a prompt asking you to upgrade before 2024-06-10

maxi297 avatar May 31 '24 14:05 maxi297

@maxi297 Thanks for the clarification, I'm in discussion with cloud support because this is clearly not working as expected. In our production workspace, connections have been upgraded to 2.2 without any prompt (and thus broke our pipeline), while on our other workspaces, connections are still running Shopify 2.0 connector and we don't see any Upgrade prompt popping up.

loris avatar Jun 02 '24 16:06 loris

@maxi297 Actually I might have found the issue:


      2.1.0:
        message:
          "This upgrade changes the `Products`, `Product Images` and `Product Variants` streams to use `Shopify GraphQL BULK`.
          More details here: https://github.com/airbytehq/airbyte/pull/37767."
        upgradeDeadline: "2024-06-10"
        scopedImpact:
          - scopeType: stream
            impactedScopes: ["product_variants"]

The breaking change message only states product_variants in the impactedScopes, while products was also clearly impacted. Our Airbyte connections only needed to sync products (as the variants where part of that stream too). But with the change, we lost the variant data and were not informed of the breaking change because the impactedScopes did not included products, does this seem right?

loris avatar Jun 04 '24 08:06 loris

@loris This message included the breaking change that breaks the source sync operations only, the info about the data was re-arranged based on the API changes are not included in this message.

In the migration guide here: https://github.com/airbytehq/airbyte/blob/e9e6cf3778caf804bc5d433ccda1530de1373a6d/docs/integrations/sources/shopify-migrations.md which comes with this release, we included all the details about the actual data change we discovered. Have you had a chance to observe this in the UI?

Hope that answers your question and thank you for your feedback.

bazarnov avatar Jun 04 '24 09:06 bazarnov

Very clear!

Have you had a chance to observe this in the UI?

This is the part which is confusing to me: No, I did not observe anything in the UI. I also have a workspace where Shopify connector are still running the 2.0 version and did not see any Upgrade banner. I have the email/webhook notifications set up, but nothing there too. We are waiting for a reply of Airbyte Cloud support and will let you know

loris avatar Jun 04 '24 10:06 loris

This information will be helpful for us a well as maybe they is a gap we are not aware related to the opt-in mechanism. Please, keep us up to date on this one

maxi297 avatar Jun 04 '24 12:06 maxi297