
:bug: Revoking credentials with large rev-reg sizes is inefficient and doesn't scale

Open ff137 opened this issue 6 months ago • 0 comments

When revoking many credentials (batching them and publishing once), we can see significant inefficiencies in the AnonCreds revocation manager.

Essentially, there is a "quadratic inefficiency" baked into the revocation process: with N issued credentials and M revocations, the work scales as O(N × M).

Problem summary:

  • We have a revocable cred def
  • 100 credentials have already been issued
  • We now want to revoke 10 of them
  • All 100 records are fetched in order to find the first credential to revoke
  • Then all 100 records are fetched again to find the next credential to revoke, and so on
  • Result: 1,000 storage records get scanned and deserialized to revoke 10 credentials
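The cost pattern in the steps above can be sketched with a toy model (names are illustrative, not the actual ACA-Py storage API):

```python
# Toy model of the current behaviour: each revocation triggers a full
# scan of every IssuerCredRevRecord in the registry.

def scans_per_revocation(total_records: int, to_revoke: int) -> int:
    """Count how many records get fetched and deserialized overall."""
    scanned = 0
    for _ in range(to_revoke):      # one retrieve_by_ids call per credential
        scanned += total_records    # each call scans the whole registry
    return scanned

# 100 issued credentials, 10 revocations -> 1,000 record scans
assert scans_per_revocation(100, 10) == 1000
```

With a registry of the common maximum size (32,768) and a batch of 100 revocations, that is over 3 million record deserializations for a single publish.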

Problem code (the debug logs are my own addition; the # <--- comments highlight the relevant lines):

# revocation_anoncreds.manager.RevocationManager

    async def set_cred_revoked_state(
        self, rev_reg_id: str, cred_rev_ids: Sequence[int]
    ) -> None:
        """Update credentials state to credential_revoked.

        Args:
            rev_reg_id: revocation registry ID
            cred_rev_ids: list of credential revocation IDs

        Returns:
            None

        """
        self._logger.debug(
            "Setting credential revoked state for rev_reg_id=%s with %d credential IDs",
            rev_reg_id,
            len(cred_rev_ids),
        )

        for cred_rev_id in cred_rev_ids:   # <--- For each credential we want to revoke
            self._logger.debug(
                "Processing credential revocation ID %s for registry %s",
                cred_rev_id,
                rev_reg_id,
            )
            cred_ex_id = None

            try:
                async with self._profile.transaction() as txn:
                    rev_rec = await IssuerCredRevRecord.retrieve_by_ids(   # <--- step into retrieve_by_ids
                        txn, rev_reg_id, str(cred_rev_id), for_update=True
                    )
...


class IssuerCredRevRecord(BaseRecord):
...
    @classmethod
    async def retrieve_by_ids(  # <---
        cls,
        session: ProfileSession,
        rev_reg_id: str,
        cred_rev_id: str,
        *,
        for_update: bool = False,
    ) -> "IssuerCredRevRecord":
        """Retrieve an issuer cred rev record by rev reg id and cred rev id."""
        return await cls.retrieve_by_tag_filter(   # <--- step into retrieve_by_tag_filter
            session,
            {"rev_reg_id": rev_reg_id},
            {"cred_rev_id": cred_rev_id},   # <--- post filter being applied
            for_update=for_update,
        )

...

class BaseRecord(BaseModel):
...
    @classmethod
    async def retrieve_by_tag_filter(  # <--
        cls: Type[RecordType],
        session: ProfileSession,
        tag_filter: dict,
        post_filter: Optional[dict] = None,
        *,
        for_update=False,
    ) -> RecordType:
        """Retrieve a record by tag filter.

        Args:
            cls: The record class
            session: The profile session to use
            tag_filter: The filter dictionary to apply
            post_filter: Additional value filters to apply matching positively,
                with sequence values specifying alternatives to match (hit any)
            for_update: Whether to lock the record for update
        """
        LOGGER.debug("Retrieving %s record by tag filter %s", cls.RECORD_TYPE, tag_filter)

        storage = session.inject(BaseStorage)
        rows = await storage.find_all_records(   # <--- find all existing IssuerCredRevRecords for this rev_reg_id
            cls.RECORD_TYPE,
            cls.prefix_tag_filter(tag_filter),
            options={"forUpdate": for_update},
        )
        found = None
        for record in rows:  # <--- for every row that is fetched... json.loads gets called to deserialize
            vals = json.loads(record.value)
            if match_post_filter(vals, post_filter, alt=False):  # <-- even when we match, we continue through all rows to ensure no duplicates
                if found:
                    LOGGER.info(
                        "Multiple %s records located for %s%s",
                        cls.__name__,
                        tag_filter,
                        f", {post_filter}" if post_filter else "",
                    )
                    raise StorageDuplicateError(
                        "Multiple {} records located for {}{}".format(
                            cls.__name__,
                            tag_filter,
                            f", {post_filter}" if post_filter else "",
                        )
                    )
                found = cls.from_storage(record.id, vals)

From the above you can see that if there are N existing credentials and M are to be revoked, the operation scales as O(N × M).

Therefore, working with large revocation registry sizes becomes untenable.


A simple solution would be to pass all the cred_rev_ids we want to the retrieve function, so that all records are fetched once and every required record is matched in a single pass.
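A sketch of that single-pass matching, using a hypothetical helper (the `rows` argument stands in for the result of one `storage.find_all_records(...)` call; the record shape and function name are assumptions, not the actual ACA-Py API):

```python
import json

def retrieve_by_cred_rev_ids(rows, cred_rev_ids):
    """Match every wanted cred_rev_id in one pass over the registry's rows.

    Hypothetical helper: each element of `rows` mimics a storage record
    with a JSON-encoded `value` field, as in retrieve_by_tag_filter.
    """
    wanted = set(cred_rev_ids)
    found = {}
    for record in rows:
        vals = json.loads(record["value"])  # deserialize each row once, total
        crid = vals.get("cred_rev_id")
        if crid in wanted:
            if crid in found:
                raise ValueError(f"Multiple records for cred_rev_id {crid}")
            found[crid] = vals
    return found
```

This drops the cost from N × M to a single scan of N rows per publish, regardless of how many credentials are being revoked.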

I think the best-case solution would use pre-filters (tag filters applied by the storage backend), avoiding the json.loads call on every row that post-filter checking requires.
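The difference between the two filtering strategies can be simulated as follows (the tag-based pre-filter is modelled with an in-memory `tags` dict; whether cred_rev_id can actually be queried as a tag by the backend is an assumption here):

```python
import json

def post_filter_scan(rows, cred_rev_id):
    """Current pattern: deserialize every row, then filter in Python."""
    deserialized = 0
    found = None
    for record in rows:
        vals = json.loads(record["value"])  # runs once per row in the registry
        deserialized += 1
        if vals["cred_rev_id"] == cred_rev_id:
            found = vals
    return found, deserialized

def pre_filter_scan(rows, cred_rev_id):
    """Pre-filter pattern: the storage layer matches on tags first,
    so only matching rows are ever deserialized (simulated with `tags`)."""
    matches = [r for r in rows if r["tags"].get("cred_rev_id") == cred_rev_id]
    deserialized = 0
    found = None
    for record in matches:
        found = json.loads(record["value"])  # runs once per *match*
        deserialized += 1
    return found, deserialized
```

For a registry of 100 records, the post-filter path deserializes all 100 rows to find one credential, while the pre-filter path deserializes exactly one.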

ff137 · Jun 24 '25 13:06