:bug: Revoking credentials with large rev-reg sizes is inefficient and doesn't scale
When revoking many credentials (batching them and publishing once), we can see significant inefficiencies in the AnonCreds revocation manager.
Essentially, there is a "quadratic inefficiency" (scales by order N²) baked into the revocation process.
Problem summary:
- We have a revocable cred def
- 100 credentials have already been issued
- we now want to revoke 10 of them
- all 100 records are fetched, in order to find the first cred to revoke
- then, all 100 records are fetched again, to find the next cred to revoke. And repeat.
- result: 1000 storage records get scanned + deserialized to revoke 10 credentials.
Problem code (the debug logs are my own addition, with # <--- comments added to highlight stuff):
# revocation_anoncreds.manager.RevocationManager
async def set_cred_revoked_state(
self, rev_reg_id: str, cred_rev_ids: Sequence[int]
) -> None:
"""Update credentials state to credential_revoked.
Args:
rev_reg_id: revocation registry ID
cred_rev_ids: list of credential revocation IDs
Returns:
None
"""
self._logger.debug(
"Setting credential revoked state for rev_reg_id=%s with %d credential IDs",
rev_reg_id,
len(cred_rev_ids),
)
for cred_rev_id in cred_rev_ids: # <--- For each credential we want to revoke
self._logger.debug(
"Processing credential revocation ID %s for registry %s",
cred_rev_id,
rev_reg_id,
)
cred_ex_id = None
try:
async with self._profile.transaction() as txn:
rev_rec = await IssuerCredRevRecord.retrieve_by_ids( # <--- step into retrieve_by_ids
txn, rev_reg_id, str(cred_rev_id), for_update=True
)
...
class IssuerCredRevRecord(BaseRecord):
...
@classmethod
async def retrieve_by_ids( # <---
cls,
session: ProfileSession,
rev_reg_id: str,
cred_rev_id: str,
*,
for_update: bool = False,
) -> "IssuerCredRevRecord":
"""Retrieve an issuer cred rev record by rev reg id and cred rev id."""
return await cls.retrieve_by_tag_filter( # <--- step into retrieve_by_tag_filter
session,
{"rev_reg_id": rev_reg_id},
{"cred_rev_id": cred_rev_id}, # <--- post filter being applied
for_update=for_update,
)
...
class BaseRecord(BaseModel):
...
@classmethod
async def retrieve_by_tag_filter( # <--
cls: Type[RecordType],
session: ProfileSession,
tag_filter: dict,
post_filter: Optional[dict] = None,
*,
for_update=False,
) -> RecordType:
"""Retrieve a record by tag filter.
Args:
cls: The record class
session: The profile session to use
tag_filter: The filter dictionary to apply
post_filter: Additional value filters to apply matching positively,
with sequence values specifying alternatives to match (hit any)
for_update: Whether to lock the record for update
"""
LOGGER.debug("Retrieving %s record by tag filter %s", cls.RECORD_TYPE, tag_filter)
storage = session.inject(BaseStorage)
rows = await storage.find_all_records( # <--- find all existing IssuerCredRevRecords for this rev_reg_id
cls.RECORD_TYPE,
cls.prefix_tag_filter(tag_filter),
options={"forUpdate": for_update},
)
found = None
for record in rows: # <--- for every row that is fetched... json.loads gets called to deserialize
vals = json.loads(record.value)
if match_post_filter(vals, post_filter, alt=False): # <-- even when we match, we continue through all rows to ensure no duplicates
if found:
LOGGER.info(
"Multiple %s records located for %s%s",
cls.__name__,
tag_filter,
f", {post_filter}" if post_filter else "",
)
raise StorageDuplicateError(
"Multiple {} records located for {}{}".format(
cls.__name__,
tag_filter,
f", {post_filter}" if post_filter else "",
)
)
found = cls.from_storage(record.id, vals)
From the above you should be able to see that if there are N existing credentials, and M are going to be revoked, the operation scales by order N x M.
Therefore, working with large revocation registry sizes becomes untenable.
A simple solution would be to pass all the cred-rev-id's we want to the retrieve function, so we fetch all records once, and immediately match all the ones we need.
I think a best case solution would be if pre-filters could be used, to avoid the json.loads on each row in post-filter checking.