discussion: token refresh mechanism for rest client
Background: #301
The token fetched from the token server may have a TTL, see TokenResponse::expires_in. In most implementations, the value is about several hours. Our catalog client is a long-lived object, which means that we should handle the token expiration event.
There are two ways:
- Spawn a background task, and setup a ticker with a specified time interval, e.g.
expires_in * 0.9seconds, and refetch the token when triggered.
- Pros: Easy to implement
- Cons: Must trust the local timer skew
- Cons: Must introduce a timer, which means depending on a specified async runtime
- Call every methods with a retry wrapper. When meeting an unauthorized error, refetch the token and retry the method.
- Pros: Consistent with iceberg-python
- Pros: Does not rely on local clock and specific runtime
- Cons: When expired, thousands of concurrent requests may fail, and then all of them will trigger a token refetch, which is not ideal.
- This can be workaround by some concurrency control, to force only one request will refetch the token and others must wait for the result, but it introduced complexity.
2. Call every methods with a retry wrapper. When meeting an unauthorized error, refetch the token and retry the method.
Our rest catalog client now has a authenticate function.
https://github.com/apache/iceberg-rust/blob/0937f19d4c4b4ea14d423e6c78731904b4c9012e/crates/catalog/rest/src/client.rs#L88
We can perform the expiration logic here.
I also prefer 2. The iceberg community may deprecate current auth approach in future, and introduce true oauth client in future. I prefer to have a simpler solution without introducing too much extra dependency.
- Store
(expired_at, token)pair in an asyncMutex, and check whether it's close to expiration when needed. If it's close to expiration, lock the mutex, update the pair, and unlock. Then only one request will update the token.
3. Store
(expired_at, token)pair in an asyncMutex, and check whether it's close to expiration when needed. If it's close to expiration, lock the mutex, update the pair, and unlock. Then only one request will update the token.
Great!
- Store
(expired_at, token)pair in an asyncMutex, and check whether it's close to expiration when needed. If it's close to expiration, lock the mutex, update the pair, and unlock. Then only one request will update the token.
Sounds reasonable to me.
Then we need https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html, still depends on tokio :(
This is rust
Then we need https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html, still depends on tokio :(
This is rust
How about futures Mutex?
The current implementation utilizes std::sync::Mutex with a precisely controlled locking scope.
https://github.com/apache/iceberg-rust/blob/85627083f0791629be3d73413af936165ebd38eb/crates/catalog/rest/src/client.rs#L88-L179
The worst thing that could happen here is that we might send multiple token refresh requests, which I think is fine for a catalog.
It's acceptable, but not as expected. When a client request a token, the server may expect it will be used for several hours.
as an alternative, can we simply expose a public refresh_token method that allows the caller to implement their own token invalidation (for my use case, via option 1, using tokio)
this avoids the downside of "thousands of concurrent requests may fail", and does not require the core library to take any additional dependencies
this change perhaps does not remove the need to implement option 2 in this library, but it does provide an option for downstream consumers to implement token expiry as they wish (where there currently is no such option)
@liurenjie1024 I've implemented this here: https://github.com/apache/iceberg-rust/pull/1465