airbyte
airbyte copied to clipboard
🐛 Source Shopify: Fixed the issue with `state` causing the `substreams` to skip the records, added `GraphQL BULK Operations`
What
Resolving: https://github.com/airbytehq/oncall/issues/3523
How
- freeze the STATE for
Substreamsto avoid STATE regression while reading the stream - updated the
filter_records_newer_than_statemethod to use thecached_substream_statecaptures when the sync starts - updated
configException to use theAibyteTracebackExceptionfor config errors - Added
GraphQL BULK Operationsto speed up the streams like:metafield_collectionsmetafield_customersmetafield_draft_ordersmetafield_locationsmetafield_ordersmetafield_product_imagesmetafield_product_variantscollectionsdiscount_codesfulfillment_ordersinventory_itemsinventory_levels- added new
transactions_graphql(duplicatedtransactionsstream with reduced schema)
- re-organized the code structure ( a.k.a to make it easy to find something in this codebase) :
- moved all
streamimplementations fromsource.py>streams.streams.py - moved all
base-classimplementations fromsource.py>streams.base_streams.py
- moved all
- added new
input configurationoptional fieldGraphQL BULK Date Range in Daysto manage BULK slices date range on demand. - refactored the next streams to fetch data from their direct
parentand not fetch the data again usingrequests(another performance boost):- added
IncrementalShopifyNestedSubstreamand inherit from it:Product ImagesProduct VariantsOrder RefundsFulfillmentsCustomer Address
- added
- added
concurrency checksto hold on to thejob creatingifanother jobis already running. - adjusted existing
unit_teststo follow the current updates - covered new implementations with
unit_tests
🔴 User Impact 🔴
This PR introduces the breaking change for the :
Fulfillmentsstream, by changing thecursorfromidtoupdated_at- the stream requiresresetOrder Refundsstream, by changing theschema.refund_line_items.line_item.propertiesto array ofstrings, instead ofobjectwith properties.Collectionsstream, a new scope is required to be added for all connections that useAPI_PASSWORDauth option:read_publications- For all the OAuth2.0 users, the https://github.com/airbytehq/airbyte-platform-internal/pull/10440 adds the support of
read_publicationsand users should justre-authto pick up the new scope.
Other changes are not considered as the breaking changes, but optimization and performance improvements.
P.S.: The community post about migrating OrderRisks > GQL: https://community.shopify.com/c/graphql-basics-and/get-the-orderrisks-full-data-using-graphql-not-rest-api/m-p/2353776/thread-id/12174
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| airbyte-docs | ⬜️ Ignored (Inspect) | Visit Preview | Feb 26, 2024 1:33pm |
Before Merging a Connector Pull Request
Wow! What a great pull request you have here! 🎉
To merge this PR, ensure the following has been done/considered for each connector added or updated:
- [x] PR name follows PR naming conventions
- [x] Breaking changes are considered. If a Breaking Change is being introduced, ensure an Airbyte engineer has created a Breaking Change Plan.
- [x] Connector version has been incremented in the Dockerfile and metadata.yaml according to our Semantic Versioning for Connectors guidelines
- [x] You've updated the connector's
metadata.yamlfile any other relevant changes, including abreakingChangesentry for major version bumps. See metadata.yaml docs - [x] Secrets in the connector's spec are annotated with
airbyte_secret - [x] All documentation files are up to date. (README.md, bootstrap.md, docs.md, etc...)
- [x] Changelog updated in
docs/integrations/<source or destination>/<name>.mdwith an entry for the new version. See changelog example - [x] Migration guide updated in
docs/integrations/<source or destination>/<name>-migrations.mdwith an entry for the new version, if the version is a breaking change. See migration guide example - [x] If set, you've ensured the icon is present in the
platform-internalrepo. (Docs)
If the checklist is complete, but the CI check is failing,
-
Check for hidden checklists in your PR description
-
Toggle the github label
checklist-action-runon/off to re-run the checklist CI.
The pre-release version of the source-shopify has been published successfully:
- tag: 1.1.5-dev.255fd49478
- workflow: https://github.com/airbytehq/airbyte/actions/runs/6815263956
cc @girarda
The new pre-release version of: 1.1.5-dev.28dfd3e476 for the source-shopify was published.
@girarda FYI.
@girarda absolutely. Thanks for the valuable comments and your review.
The new pre-release version: 1.2.0-dev.285e46aa02 for the source-shopify was published.
It includes performance improvements for the next streams:
- Product Images
- Product Variants
- Order Refunds
- Fulfillments
- Customer Address
Please consider the Fulfillments stream will require Schema Refresh and Reset ideally, but Schema Refresh would be sufficient to go with, no need to do Reset for this stream, as we only expect the cusor_field to be changed, thus, it will be automatically updated inside the STATE and will go with it, by adding the updated_at cursor, leaving id in place (this is handy for tests while switching back to the previous pre-release once needed).
I'll consider some backward compatibility for the old cursor (id) for the final release.
@girarda FYI.
The new pre-release version: 1.2.0-dev.34b1470aa4 for the source-shopify was published.
It includes fix for the next streams:
- Order Refunds schema regression.
Refreshing and resetting the schema for this stream is required.
@girarda FYI.
The new pre-release version: 1.2.0-dev.f73cf9c127 for the source-shopify was published.
It includes migrated:
collections(action is required*)discount_codesfulfillment_ordersinventory_itemsinventory_levels- new
transactions_graphql(duplicatedtransactionsstream with reduced schema) see this thread for more info.
🔴 IMPORTANT 🔴
*For the Collections stream, a new scope is required to be added for all connections that use API_PASSWORD auth option:
read_publications
Why?
Because the Shopify GQL uses a more granular approach to fetch the data than the REST one, it's required to use combined scopes:
read_collections,read_publications. So far, this is the only stream that utilizes multiple scopes.
For all the OAuth2.0 users, this PR adds the support of read_publications and users should just re-auth to pick up the new scope.
@girarda FYI.
[!WARNING] 🚨 Connector code freeze is in effect until 2024-01-02. This PR is changing connector code. Please contact the current OC engineers if you want to merge this change to master.
Regards this: https://github.com/airbytehq/airbyte/pull/32345#discussion_r1446639490
The query.py module has 99% unit test cov. What other tests you would like to see?
Please run the coverage run -m pytest && coverage html --omit="unit_tests/*" to check the coverage.
The Breaking Changes doc, lives here: https://docs.google.com/document/d/1riyv36EEkkP-T1LK2CHeKyfR_uKCjI5sm1sb4CqqYu0/edit
@girarda @katmarkham @lazebnyi FYI.
it's not so much about increasing the coverage, but documenting how each pieces (classes) are meant to be used. methods in this file all have clear inputs and outputs that can be easily documented through unit tests.
All the methods of tools.py are directly tested here
@girarda I've transferred your changes from https://github.com/airbytehq/airbyte/pull/34546 to this branch. You can close yours now.
@girarda @katmarkham @oustynova @lazebnyi
Additional QA test for 2.0.0 compare to the 1.1.7 (latest) is conducted here: https://docs.google.com/document/d/1aZbeyw_DCwnsoOlV-5KL8yLauA8I_FQMV7uI0jhP4o0/edit#heading=h.onifjcf8uiqw
PR looks good to me ✅ .
This is a very large change so we'll stagger the release. I'll approve once we're comfortable releasing into the wild
Hey @bazarnov - any chance you can get this deployed today? This would be a LIFE saver to me. Literally save me 2 hours a day of work
Hey @bazarnov - any chance you can get this deployed today? This would be a LIFE saver to me. Literally save me 2 hours a day of work
I wish I could say so, unfortunately. We start shipping the update on Monday next week.
The new pre-release build has been created:
2.0.0-dev.abdc590830
Pre-release Notes:
- The small regression fix for the
Customer Addressstream, when thedefaultAddressproperty isNone(literal), instead ofobjectas expected from the API docs.
I'll merge this on Monday, February 26th, at 8:00 AM (PST)