airbyte
airbyte copied to clipboard
Support single use refresh tokens (connectors can update their configs during a sync)
Tell us about the problem you're trying to solve
Today we do not handle the case where a new refresh token is returned when a new access token is granted.
This is because there doesn't exist a Protocol message that updates configs. This will probably involve adding a message similar to STATE but updates the config.
Describe the solution you’d like
We should provide a way to save a returned refresh token.
To add to the context,
I'm currently working on a source connector that's implementing "short-lived access tokens and long-lived refresh tokens".
Based on the documentation of the source I'm working on, they return a new refresh_token after the renewal (of the access token) and they require the new refresh_token to be used in the succeeding token refresh calls.
From their docs:
Each time you perform a token refresh, you should save the new refresh token returned in the response
From this specification, I need to have the most updated refresh_token for the succeeding token refresh calls to be valid.
@sherifnada @davinchia
Currently, @santels is blocked on contributing their Xero connector because of this issue. How big of a change would this be to implement?
To me, the task that takes more time here is extending the protocol. We have to make sure we do it correctly. Making the CDK modification should be a matter of prioritisation after. Do you agree @sherifnada ?
@avaidyanatha is there are preferred deadline for this or is it asap?
I want to write two source connectors which both require the use of Authorization grant followed by rotating refresh tokens.
In the meantime I am thinking of using external storage (like azure blob or ftp) to store this azure token outside of the connector until this feature is complete. Or is there another way to more elegantly solve this?
@wissevrowl until we fix this issue that's probably your best bet
I have this problem with the Intuit Quickbooks source. The old refresh token is expired when a new one is granted (or used, possibly). The singer tap itself does handle it in a fashion – it overwrites the config.json file with the new token. However this doesn't work with the check command.
Is the protocol under discussion? It seems something simple like this would be sufficient?
{"type": "CONFIG", "config": {"refresh_token": "...", "access_token": "..."}}
Has anyone come up with a way around this for now? @akvadrako ?
I spend a day on this and gave up. The singer tap could handle it, but the airbyte adapter would delete the updated config file right away, throwing away the refresh token. I tried to patch the tap and the connector, but there were plenty of issues with incompatible versions since none of the dependencies are versioned.
@akvadrako so the solution was just to do this outside of Airbyte?
Yes, in the end we didn't use the airbyte connector.
@sherifnada can you provide an ETA for this to become available? This is a serious blocker for a while now.
Any update on this? How can we help?
I happen to know that Hubspot has rather short-lived oAuth refresh tokens. Is what we do within that connector generalizable?
@evantahler I would love to know more about this!
I think I initially missed the nuance even the refresh token was to expire eventually. That means that this is likely not like source-hubspot. I don't believe we need a new message in the Airbyte Protocol for this, and we can re-use our existing config messaging (ConfiguredAirbyteCatalog).
- The platform change would be that these messages, should they appear within a sync from a source, are persisted.
- The CDK change would be to re-emit the
ConfiguredAirbyteCatalogif come configuration changes.
Perhaps the workflow looks like this:
- When we do the oAuth request the first time, we store the refresh token and not the auth_token in the config (happens today)
- The connector knows we've stored the refresh token, so when we start the READ and every time we emit a state message (?), we us the refresh token to get a new auth token and a new refresh_token. The connector keeps this in memory and uses the new token for the rest of the sync. (happens today)
- We need to store the updated tokens: A. At the end of the sync, the connector could re-emits the configured catalog with any changes, and the platform stores that for next time B. Every time there's an update to the config, the connector could re-emits the configured catalog with any changes, and the platform stores that for next time
Some outstanding questions:
- Do we pick 3A (only update the config at the end of the sync) or 3B (allow regular updates of the config)? Is there value to ensuring the sync completed successfully before persisting the new data?
- Is it possible that we need to ensure that syncs run /at least/ every X hours so the tokens have a chance to refresh? Is there metadata that the connector needs to share this with the platform?
We need to store the updated tokens: A. At the end of the sync, the connector could re-emits the configured catalog with any changes, and the platform stores that for next time B. Every time there's an update to the config, the connector could re-emits the configured catalog with any changes, and the platform stores that for next time
I might be mis-remembering so take with a grain of salt.
I don't think we have a message type with 'update configuration' semantics. We would need one to fulfil step 3A. The Configured Airbyte Catalog is only used as input today, so we would need to at minimum, add it as a message type and change it's semantics within the platform i.e. if the platform sees this from the source, it should save it. Let me know if I'm missing something!
2c: keep in mind this would also need to work with secret obfuscation mechanics
Grooming Notes:
- If the same source is used in multiple connections, we need to be aware of race conditions - no. That's up to the source, and one revocation may stop all parallel syncs anyway. The message to users is to make multiple accounts/multiple source. The latest write will be right anyway.
- In the generic sense, what we are doing is coming up with a generic method to allow a source to update it's config mid-sync
- Can we use existing messages in the protocol, or is this a new protocol message? Is this only allowed at the end of a sync?
Goals:
- Write a spec / PRD for this change (there are new protocol messages? New parts of a sync? New data storage?)
@evantahler thank you for this writeup. What's the plan of attack going forward? Should the community or Airbyte own this implementation? Seems like a big change with a lot of understanding of the platform, so just want to understand how we go about this. It also seems like a lot of decent size of big name connectors are running into this issue.
@RickRen7575 this isn't yet prioritized... we will comment here when it is!
Just checking in here!
We are going to start looking into this in Q4! This is "Support single use refresh tokens" in our roadmap.
~~I talked to @davinchia and @pmossman to add context on customers effected by this story~~ TCS believes this story will still be done this quarter. turns out @evantahler 's team is on this one.. but we still are expecting this this quarter to the best of my knowledge
That is so great to hear!
On Wed, Nov 9, 2022 at 2:44 PM Topher Lubaway @.***> wrote:
I talked to @davinchia https://github.com/davinchia and @pmossman https://github.com/pmossman to add context on customers effected by this story TCS believes this story will still be done this quarter.
— Reply to this email directly, view it on GitHub https://github.com/airbytehq/airbyte/issues/3990#issuecomment-1309270981, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCP57ST2QFIM7DWLTGZHTLWHP5KVANCNFSM46LXUMKA . You are receiving this because you were mentioned.Message ID: @.***>
-- [image: photo] Ricky Renner Data Engineer, AMEND Consulting LLC
(216) 906-5178
www.linkedin.com/in/ricky-renner
Noting that this work will help troubleshoot:
- Bing Ads: https://github.com/airbytehq/oncall/issues/999
- Google Sheets
- Salesforce (maybe)
So not Quickbooks Online?
On Wed, Nov 9, 2022 at 5:52 PM Evan Tahler @.***> wrote:
Noting that this work will help troubleshoot:
- Bing Ads: airbytehq/oncall#999 https://github.com/airbytehq/oncall/issues/999
- Google Sheets
- Salesforce (maybe)
— Reply to this email directly, view it on GitHub https://github.com/airbytehq/airbyte/issues/3990#issuecomment-1309496376, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCP57QH3B5YIEMOM5PEZODWHQTKVANCNFSM46LXUMKA . You are receiving this because you were mentioned.Message ID: @.***>
-- [image: photo] Ricky Renner Data Engineer, AMEND Consulting LLC
(216) 906-5178
www.linkedin.com/in/ricky-renner
Noting that this work will help troubleshoot:
Those are additional issues we see that solving this will /also/ help!
#8019 Source Typeform: support oauth: ISSIE: https://github.com/airbytehq/airbyte/issues/7814 PR: https://github.com/airbytehq/airbyte/pull/8019 blocked since the connector uses single-use refresh token, which requires proper processing from platform side
@pedroslopez I think that it is time to close this epic! The platform and CDK now fully support AirbyteControlMessages to allow connectors to change their configuration. Future work will be done in individual connectors.
@evantahler is that true as well for the low-code CDK? What authentication do I need to use there to support changing refresh tokens?