airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

🐛 Source Shopify: Fixed the issue with `state` causing the `substreams` to skip the records, added `GraphQL BULK Operations`

Open bazarnov opened this issue 2 years ago • 9 comments

What

Resolving: https://github.com/airbytehq/oncall/issues/3523

How

  • freeze the STATE for Substreams to avoid STATE regression while reading the stream
  • updated the filter_records_newer_than_state method to use the cached_substream_state captures when the sync starts
  • updated config Exception to use the AibyteTracebackException for config errors
  • Added GraphQL BULK Operations to speed up the streams like:
    • metafield_collections
    • metafield_customers
    • metafield_draft_orders
    • metafield_locations
    • metafield_orders
    • metafield_product_images
    • metafield_product_variants
    • collections
    • discount_codes
    • fulfillment_orders
    • inventory_items
    • inventory_levels
    • added new transactions_graphql (duplicated transactions stream with reduced schema)
  • re-organized the code structure ( a.k.a to make it easy to find something in this codebase) :
    • moved all stream implementations from source.py > streams.streams.py
    • moved all base-class implementations from source.py > streams.base_streams.py
  • added new input configuration optional field GraphQL BULK Date Range in Days to manage BULK slices date range on demand.
  • refactored the next streams to fetch data from their direct parent and not fetch the data again using requests (another performance boost):
    • added IncrementalShopifyNestedSubstream and inherit from it:
      • Product Images
      • Product Variants
      • Order Refunds
      • Fulfillments
      • Customer Address
  • added concurrency checks to hold on to the job creating if another job is already running.
  • adjusted existing unit_tests to follow the current updates
  • covered new implementations with unit_tests

🔴 User Impact 🔴

This PR introduces the breaking change for the :

  • Fulfillments stream, by changing the cursor from id to updated_at - the stream requires reset
  • Order Refunds stream, by changing the schema.refund_line_items.line_item.properties to array of strings, instead of object with properties.
  • Collections stream, a new scope is required to be added for all connections that use API_PASSWORD auth option:
    • read_publications
    • For all the OAuth2.0 users, the https://github.com/airbytehq/airbyte-platform-internal/pull/10440 adds the support of read_publications and users should just re-auth to pick up the new scope.

Other changes are not considered as the breaking changes, but optimization and performance improvements.

P.S.: The community post about migrating OrderRisks > GQL: https://community.shopify.com/c/graphql-basics-and/get-the-orderrisks-full-data-using-graphql-not-rest-api/m-p/2353776/thread-id/12174

bazarnov avatar Nov 09 '23 12:11 bazarnov

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
airbyte-docs ⬜️ Ignored (Inspect) Visit Preview Feb 26, 2024 1:33pm

vercel[bot] avatar Nov 09 '23 12:11 vercel[bot]

Before Merging a Connector Pull Request

Wow! What a great pull request you have here! 🎉

To merge this PR, ensure the following has been done/considered for each connector added or updated:

  • [x] PR name follows PR naming conventions
  • [x] Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
  • [x] Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
  • [x] You've updated the connector's metadata.yaml file any other relevant changes, including a breakingChanges entry for major version bumps. See metadata.yaml docs
  • [x] Secrets in the connector's spec are annotated with airbyte_secret
  • [x] All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
  • [x] Changelog updated in docs/integrations/<source or destination>/<name>.md with an entry for the new version. See changelog example
  • [x] Migration guide updated in docs/integrations/<source or destination>/<name>-migrations.md with an entry for the new version, if the version is a breaking change. See migration guide example
  • [x] If set, you've ensured the icon is present in the platform-internal repo. (Docs)

If the checklist is complete, but the CI check is failing,

  1. Check for hidden checklists in your PR description

  2. Toggle the github label checklist-action-run on/off to re-run the checklist CI.

github-actions[bot] avatar Nov 09 '23 12:11 github-actions[bot]

The pre-release version of the source-shopify has been published successfully:

  • tag: 1.1.5-dev.255fd49478
  • workflow: https://github.com/airbytehq/airbyte/actions/runs/6815263956

cc @girarda

bazarnov avatar Nov 09 '23 17:11 bazarnov

The new pre-release version of: 1.1.5-dev.28dfd3e476 for the source-shopify was published. @girarda FYI.

bazarnov avatar Nov 28 '23 21:11 bazarnov

@girarda absolutely. Thanks for the valuable comments and your review.

bazarnov avatar Dec 02 '23 10:12 bazarnov

The new pre-release version: 1.2.0-dev.285e46aa02 for the source-shopify was published. It includes performance improvements for the next streams:

  • Product Images
  • Product Variants
  • Order Refunds
  • Fulfillments
  • Customer Address

Please consider the Fulfillments stream will require Schema Refresh and Reset ideally, but Schema Refresh would be sufficient to go with, no need to do Reset for this stream, as we only expect the cusor_field to be changed, thus, it will be automatically updated inside the STATE and will go with it, by adding the updated_at cursor, leaving id in place (this is handy for tests while switching back to the previous pre-release once needed).

I'll consider some backward compatibility for the old cursor (id) for the final release.

@girarda FYI.

bazarnov avatar Dec 05 '23 00:12 bazarnov

The new pre-release version: 1.2.0-dev.34b1470aa4 for the source-shopify was published. It includes fix for the next streams:

  • Order Refunds schema regression.

Refreshing and resetting the schema for this stream is required.

@girarda FYI.

bazarnov avatar Dec 05 '23 10:12 bazarnov

The new pre-release version: 1.2.0-dev.f73cf9c127 for the source-shopify was published. It includes migrated:

  • collections (action is required*)
  • discount_codes
  • fulfillment_orders
  • inventory_items
  • inventory_levels
  • new transactions_graphql (duplicated transactions stream with reduced schema) see this thread for more info.

🔴 IMPORTANT 🔴 *For the Collections stream, a new scope is required to be added for all connections that use API_PASSWORD auth option:

  • read_publications

Why? Because the Shopify GQL uses a more granular approach to fetch the data than the REST one, it's required to use combined scopes:

  • read_collections, read_publications. So far, this is the only stream that utilizes multiple scopes.

For all the OAuth2.0 users, this PR adds the support of read_publications and users should just re-auth to pick up the new scope.

@girarda FYI.

bazarnov avatar Dec 18 '23 15:12 bazarnov

[!WARNING] 🚨 Connector code freeze is in effect until 2024-01-02. This PR is changing connector code. Please contact the current OC engineers if you want to merge this change to master.

github-actions[bot] avatar Dec 21 '23 10:12 github-actions[bot]

Regards this: https://github.com/airbytehq/airbyte/pull/32345#discussion_r1446639490 The query.py module has 99% unit test cov. What other tests you would like to see?

Please run the coverage run -m pytest && coverage html --omit="unit_tests/*" to check the coverage.

bazarnov avatar Jan 10 '24 18:01 bazarnov

The Breaking Changes doc, lives here: https://docs.google.com/document/d/1riyv36EEkkP-T1LK2CHeKyfR_uKCjI5sm1sb4CqqYu0/edit

@girarda @katmarkham @lazebnyi FYI.

bazarnov avatar Jan 25 '24 20:01 bazarnov

it's not so much about increasing the coverage, but documenting how each pieces (classes) are meant to be used. methods in this file all have clear inputs and outputs that can be easily documented through unit tests.

All the methods of tools.py are directly tested here

bazarnov avatar Jan 29 '24 09:01 bazarnov

@girarda I've transferred your changes from https://github.com/airbytehq/airbyte/pull/34546 to this branch. You can close yours now.

bazarnov avatar Jan 29 '24 10:01 bazarnov

@girarda @katmarkham @oustynova @lazebnyi Additional QA test for 2.0.0 compare to the 1.1.7 (latest) is conducted here: https://docs.google.com/document/d/1aZbeyw_DCwnsoOlV-5KL8yLauA8I_FQMV7uI0jhP4o0/edit#heading=h.onifjcf8uiqw

bazarnov avatar Jan 31 '24 12:01 bazarnov

PR looks good to me ✅ .

This is a very large change so we'll stagger the release. I'll approve once we're comfortable releasing into the wild

girarda avatar Feb 01 '24 01:02 girarda

Hey @bazarnov - any chance you can get this deployed today? This would be a LIFE saver to me. Literally save me 2 hours a day of work

sockpapi avatar Feb 08 '24 16:02 sockpapi

Hey @bazarnov - any chance you can get this deployed today? This would be a LIFE saver to me. Literally save me 2 hours a day of work

I wish I could say so, unfortunately. We start shipping the update on Monday next week.

bazarnov avatar Feb 08 '24 21:02 bazarnov

The new pre-release build has been created:

2.0.0-dev.abdc590830

Pre-release Notes:

  • The small regression fix for the Customer Address stream, when the defaultAddress property is None (literal), instead of object as expected from the API docs.

bazarnov avatar Feb 21 '24 21:02 bazarnov

I'll merge this on Monday, February 26th, at 8:00 AM (PST)

bazarnov avatar Feb 23 '24 19:02 bazarnov