Implement an OCC loop to update IAM policy bindings
When working with IAM policies, applications typically want to do something like "add this member to that role". Because the IAM only support "change all the policies", applications need make multiple RPCs, something like:
- Call
get_iam_policy(...)to get the current IAM policy - Update the local copy of the policy
- Apply the change with
set_iam_policy(...).
The IAM policy may change between steps 1 and 3, without some pre-condition that sequence of steps may overwrite the other changes. So the code becomes:
- Call
get_iam_policy(..)to get the current IAM policy - Update the local copy of the policy.
- Apply the change with
set_iam_policy(...)and with theetagobtained in (1) to avoid overwrites - If it succeeds: 🎉 we are done.
- If it fails with
ABORTEDthat indicates a separate change, go back to step (1) - If it fails with a different error, return this error.
I would like to work on this.
FYI: @nvnmandadhi
This will involve changes to the generator, which is in the librarian repo: https://github.com/googleapis/librarian/
You will find this guide useful: https://github.com/googleapis/google-cloud-rust/blob/main/doc/contributor/howto-guide-generated-code-maintenance.md#making-changes-to-sidekick
This is what the OCC loop looks like in C++: https://github.com/googleapis/google-cloud-cpp/blob/c06d686e32bd0fba5ab36f8fdedd729402428b94/google/cloud/spanner/admin/instance_admin_client.cc#L373-L406
I recommend manually implementing the OCC polling loop in one of the generated clients, with integration tests. When you have that, I think you should share the design here.
Then we can think about how to generalize the loop in the google-cloud-gax layer. And after that, we can think about how to generate the code.
Due to the code changes required to other codebases, not going to work on this.
OCC Loop Implementation
@dbolduc @coryan
I have completed the manual implementation of the OCC loop for IAM policy updates in the Storage client.
Complete OCC loop with etag-based conflict detection
Full Design Document
OCC Loop Implementation for IAM Policy Bindings
This implementation provides a manual OCC loop for safe IAM policy updates in the Storage client, following the approach outlined in the issue:
- Prevents race conditions when multiple processes update IAM policies concurrently
- Uses etag-based conflict detection with automatic retry
- Follows the C++ reference implementation pattern
- Ready for feedback before moving to google-cloud-gax layer
- (Complete): Manual implementation in Storage client
- (Next): Generalize in google-cloud-gax layer
- (Future): Auto-generate via librarian
The implementation follows the OCC pattern with etag-based concurrency control:
1. Get current policy with etag (get_iam_policy)
2. Apply user modifications via callback
3. Set updated policy with etag (set_iam_policy)
4. If ABORTED -> retry with exponential backoff
5. If other error -> immediate failure
6. If success -> done
Type Definitions
/// Callback function for updating an IAM policy.
///
/// Returns:
/// - `Ok(Some(Policy))` - updated policy to apply
/// - `Ok(None)` - cancel operation (not an error)
/// - `Err(Error)` - error during update
pub type IamUpdater<'a> = Box<dyn FnMut(Policy) -> Result<Option<Policy>> + Send + 'a>;
Design decisions:
-
FnMutfor stateful updates (captures can be mutated) -
Sendfor multi-threaded safety -
Option<Policy>return for cancellation support - Sync (not async)
OccConfig
#[derive(Clone, Debug)]
pub struct OccConfig {
/// Maximum number of retry attempts (default: 10)
pub max_attempts: u32,
/// Maximum total time for retry loop (default: 30s)
pub max_duration: Duration,
/// Backoff policy for delays between retries
/// (default: exponential with 100ms initial, 32s max)
pub backoff_policy: Arc<dyn BackoffPolicy>,
}
- Sensible defaults aligned with other GCP libraries
- Configurable for different use cases
- Uses existing
gax::backoff_policy::BackoffPolicy
Public Functions
- Core OCC Function
pub async fn update_iam_policy_with_occ(
client: &StorageControl,
resource: impl Into<String>,
updater: IamUpdater<'_>,
config: OccConfig,
) -> Result<Policy>
- Convenience Methods on StorageControl
impl StorageControl {
/// Simple API with default configuration
pub async fn update_iam_policy_with_retry<F>(
&self,
resource: impl Into<String>,
updater: F,
) -> Result<Policy>
where
F: FnMut(Policy) -> Result<Option<Policy>> + Send + 'static;
/// Advanced API with custom configuration
pub async fn update_iam_policy_with_retry_config<F>(
&self,
resource: impl Into<String>,
updater: F,
config: OccConfig,
) -> Result<Policy>
where
F: FnMut(Policy) -> Result<Option<Policy>> + Send + 'static;
}
Examples
Add IAM Member
use google_cloud_storage::client::StorageControl;
use google_cloud_iam_v1::model::Binding;
let client = StorageControl::builder().build().await?;
let policy = client
.update_iam_policy_with_retry(
"projects/_/buckets/my-bucket",
|mut policy| {
policy.bindings.push(
Binding::new()
.set_role("roles/storage.admin")
.set_members(["user:[email protected]"])
);
Ok(Some(policy))
},
)
.await?;
Conditional Update
let policy = client
.update_iam_policy_with_retry(
"projects/_/buckets/my-bucket",
|mut policy| {
let member = "user:[email protected]";
let role = "roles/storage.viewer";
let exists = policy.bindings.iter().any(|b| {
b.role == role && b.members.contains(&member.to_string())
});
if !exists {
policy.bindings.push(
Binding::new()
.set_role(role)
.set_members([member])
);
Ok(Some(policy))
} else {
Ok(None)
}
},
)
.await?;
Custom Configuration
use google_cloud_storage::iam_occ::OccConfig;
use std::time::Duration;
let config = OccConfig {
max_attempts: 5,
max_duration: Duration::from_secs(10),
..Default::default()
};
let policy = client
.update_iam_policy_with_retry_config(
"projects/_/buckets/my-bucket",
|mut policy| {
// Update logic
Ok(Some(policy))
},
config,
)
.await?;
Test Infrastructure
Implemented trait-based mocking for comprehensive testing:
/// Trait for IAM policy operations
#[cfg_attr(test, mockall::automock)]
#[async_trait::async_trait]
pub(crate) trait IamPolicyOperations {
async fn get_iam_policy(&self, resource: &str) -> Result<Policy>;
async fn set_iam_policy(&self, resource: &str, policy: Policy) -> Result<Policy>;
}
// Generic implementation for testability
async fn update_iam_policy_with_occ_impl<C: IamPolicyOperations>(
client: &C,
resource: impl Into<String>,
updater: IamUpdater<'_>,
config: OccConfig,
) -> Result<Policy>
// Public API wraps generic implementation (zero breaking changes)
pub async fn update_iam_policy_with_occ(
client: &StorageControl,
resource: impl Into<String>,
updater: IamUpdater<'_>,
config: OccConfig,
) -> Result<Policy> {
update_iam_policy_with_occ_impl(client, resource, updater, config).await
}
Design Decisions & Rationale
1. Sync Updater (not async)
Decision: FnMut(Policy) -> Result<Option<Policy>>
Rationale:
- Simpler for users (no async closure complexity)
- Policy modifications are typically CPU-bound (no I/O)
- Matches C++ reference implementation pattern
- Can always add async variant later if needed
2. Option<Policy> Return
Decision: Return Option<Policy> from updater
Rationale:
-
Some(policy)= apply changes -
None= cancel operation without error - Enables conditional updates ("only if doesn't exist")
3. update_iam_policy_with_retry
Decision: Use "retry" in public method names
Rationale:
- More intuitive for users (describes behavior, not implementation)
- Matches naming conventions in other GCP SDKs
- "OCC" term reserved for module/implementation details
- Aligns with user mental model
4. Trait-Based Mocking
Decision: Create IamPolicyOperations trait with generic implementation
Rationale:
- Enables comprehensive unit testing without GCP
- Preview of design (will move to gax)
- Zero breaking changes (public API unchanged)
- Follows project patterns (mockall)
5. Exponential Backoff
Decision: Use exponential backoff (100ms initial, 32s max)
Rationale:
- Standard pattern for distributed systems
- Reduces server load during contention
- Matches C++ implementation behavior
- Configurable via
OccConfig
Preview
The current implementation previews design:
// Internal trait (pub(crate))
trait IamPolicyOperations {
async fn get_iam_policy(&self, resource: &str) -> Result<Policy>;
async fn set_iam_policy(&self, resource: &str, policy: Policy) -> Result<Policy>;
}
// Storage-specific
impl IamPolicyOperations for StorageControl { ... }
Future
// Public trait in gax layer
pub trait IamOperations {
type Policy: Clone + Send;
async fn get_iam_policy(&self, resource: &str) -> Result<Self::Policy>;
async fn set_iam_policy(&self, resource: &str, policy: Self::Policy) -> Result<Self::Policy>;
}
// Extension trait for convenience
pub trait IamOperationsExt: IamOperations {
async fn update_iam_policy_with_retry<F>(...) -> Result<Self::Policy>;
}
// Auto-implemented for all IAM services
impl<T: IamOperations> IamOperationsExt for T {}
This approach enables:
- Reuse across all IAM-capable services
- Consistent API patterns
- Easy testing with mocks
Questions for Maintainers
1. API Design
- Is
update_iam_policy_with_retryan appropriate name? - Should we expose
OccConfigin public API or use builder pattern? - Is sync updater acceptable, or do we need async variant?
2. Testing
- Is trait-based mocking approach acceptable?
- Should we add real GCP integration tests now or defer?
- Is 100% scenario coverage sufficient?
3. Planning
- Should trait be
IamOperationsorIamPolicyOperations? - Should extension trait auto-implement or be explicit?
- Any concerns about generic implementation approach?
4. Documentation
- Is rustdoc sufficient or need additional guides?
- Should we add migration guide from manual get/set pattern?
- Any specific examples you'd like to see?
Requesting feedback on:
- API design and naming choices
- Testing approach (trait-based mocking)
- Move to gax layer
Thanks for the detailed design. Unfortunately we are a bit busy at the moment, and really cannot provide equally detailed feedback. I hope you did not waste too much time working on this feature.
I have a vision of what it should look like, if it fits with your vision, then i would like to write code )