
:bug: Revoking credentials with large rev-reg sizes is inefficient and doesn't scale

Open ff137 opened this issue 6 months ago • 0 comments

When revoking many credentials (batching them and publishing once), we can see significant inefficiencies in the AnonCreds revocation manager.

Essentially, there is a "quadratic inefficiency" baked into the revocation process: with N issued credentials and M revocations, the work scales as O(N × M).

Problem summary:

  • We have a revocable cred def
  • 100 credentials have already been issued
  • We now want to revoke 10 of them
  • All 100 records are fetched in order to find the first credential to revoke
  • Then all 100 records are fetched again to find the next credential to revoke, and so on
  • Result: 1,000 storage records get scanned and deserialized to revoke 10 credentials
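The cost pattern in the steps above can be sketched with a toy model (names are illustrative, not the actual ACA-Py storage API):

```python
# Toy model of the current behaviour: each revocation triggers a full
# scan of every IssuerCredRevRecord in the registry.

def scans_per_revocation(total_records: int, to_revoke: int) -> int:
    """Count how many records get fetched and deserialized overall."""
    scanned = 0
    for _ in range(to_revoke):      # one retrieve_by_ids call per credential
        scanned += total_records    # each call scans the whole registry
    return scanned

# 100 issued credentials, 10 revocations -> 1,000 record scans
assert scans_per_revocation(100, 10) == 1000
```

With a registry of the common maximum size (32,768) and a batch of 100 revocations, that is over 3 million record deserializations for a single publish.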

Problem code (the debug logs are my own addition; the # <--- comments highlight the relevant lines):

# revocation_anoncreds.manager.RevocationManager

    async def set_cred_revoked_state(
        self, rev_reg_id: str, cred_rev_ids: Sequence[int]
    ) -> None:
        """Update credentials state to credential_revoked.

        Args:
            rev_reg_id: revocation registry ID
            cred_rev_ids: list of credential revocation IDs

        Returns:
            None

        """
        self._logger.debug(
            "Setting credential revoked state for rev_reg_id=%s with %d credential IDs",
            rev_reg_id,
            len(cred_rev_ids),
        )

        for cred_rev_id in cred_rev_ids:   # <--- For each credential we want to revoke
            self._logger.debug(
                "Processing credential revocation ID %s for registry %s",
                cred_rev_id,
                rev_reg_id,
            )
            cred_ex_id = None

            try:
                async with self._profile.transaction() as txn:
                    rev_rec = await IssuerCredRevRecord.retrieve_by_ids(   # <--- step into retrieve_by_ids
                        txn, rev_reg_id, str(cred_rev_id), for_update=True
                    )
...


class IssuerCredRevRecord(BaseRecord):
...
    @classmethod
    async def retrieve_by_ids(  # <---
        cls,
        session: ProfileSession,
        rev_reg_id: str,
        cred_rev_id: str,
        *,
        for_update: bool = False,
    ) -> "IssuerCredRevRecord":
        """Retrieve an issuer cred rev record by rev reg id and cred rev id."""
        return await cls.retrieve_by_tag_filter(   # <--- step into retrieve_by_tag_filter
            session,
            {"rev_reg_id": rev_reg_id},
            {"cred_rev_id": cred_rev_id},   # <--- post filter being applied
            for_update=for_update,
        )

...

class BaseRecord(BaseModel):
...
    @classmethod
    async def retrieve_by_tag_filter(  # <--
        cls: Type[RecordType],
        session: ProfileSession,
        tag_filter: dict,
        post_filter: Optional[dict] = None,
        *,
        for_update=False,
    ) -> RecordType:
        """Retrieve a record by tag filter.

        Args:
            cls: The record class
            session: The profile session to use
            tag_filter: The filter dictionary to apply
            post_filter: Additional value filters to apply matching positively,
                with sequence values specifying alternatives to match (hit any)
            for_update: Whether to lock the record for update
        """
        LOGGER.debug("Retrieving %s record by tag filter %s", cls.RECORD_TYPE, tag_filter)

        storage = session.inject(BaseStorage)
        rows = await storage.find_all_records(   # <--- find all existing IssuerCredRevRecords for this rev_reg_id
            cls.RECORD_TYPE,
            cls.prefix_tag_filter(tag_filter),
            options={"forUpdate": for_update},
        )
        found = None
        for record in rows:  # <--- for every row that is fetched... json.loads gets called to deserialize
            vals = json.loads(record.value)
            if match_post_filter(vals, post_filter, alt=False):  # <-- even when we match, we continue through all rows to ensure no duplicates
                if found:
                    LOGGER.info(
                        "Multiple %s records located for %s%s",
                        cls.__name__,
                        tag_filter,
                        f", {post_filter}" if post_filter else "",
                    )
                    raise StorageDuplicateError(
                        "Multiple {} records located for {}{}".format(
                            cls.__name__,
                            tag_filter,
                            f", {post_filter}" if post_filter else "",
                        )
                    )
                found = cls.from_storage(record.id, vals)

From the above you can see that if there are N existing credentials and M are to be revoked, the operation scales as O(N × M).

Therefore, working with large revocation registry sizes becomes untenable.


A simple solution would be to pass all the cred_rev_ids we want to the retrieve function, so that all records are fetched once and every required record is matched in a single pass.
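A sketch of that single-pass matching, using a hypothetical helper (the `rows` argument stands in for the result of one `storage.find_all_records(...)` call; the record shape and function name are assumptions, not the actual ACA-Py API):

```python
import json

def retrieve_by_cred_rev_ids(rows, cred_rev_ids):
    """Match every wanted cred_rev_id in one pass over the registry's rows.

    Hypothetical helper: each element of `rows` mimics a storage record
    with a JSON-encoded `value` field, as in retrieve_by_tag_filter.
    """
    wanted = set(cred_rev_ids)
    found = {}
    for record in rows:
        vals = json.loads(record["value"])  # deserialize each row once, total
        crid = vals.get("cred_rev_id")
        if crid in wanted:
            if crid in found:
                raise ValueError(f"Multiple records for cred_rev_id {crid}")
            found[crid] = vals
    return found
```

This drops the cost from N × M to a single scan of N rows per publish, regardless of how many credentials are being revoked.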

I think the best-case solution would use pre-filters (tag filters applied by the storage backend), avoiding the json.loads call on every row that post-filter checking requires.
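The difference between the two filtering strategies can be simulated as follows (the tag-based pre-filter is modelled with an in-memory `tags` dict; whether cred_rev_id can actually be queried as a tag by the backend is an assumption here):

```python
import json

def post_filter_scan(rows, cred_rev_id):
    """Current pattern: deserialize every row, then filter in Python."""
    deserialized = 0
    found = None
    for record in rows:
        vals = json.loads(record["value"])  # runs once per row in the registry
        deserialized += 1
        if vals["cred_rev_id"] == cred_rev_id:
            found = vals
    return found, deserialized

def pre_filter_scan(rows, cred_rev_id):
    """Pre-filter pattern: the storage layer matches on tags first,
    so only matching rows are ever deserialized (simulated with `tags`)."""
    matches = [r for r in rows if r["tags"].get("cred_rev_id") == cred_rev_id]
    deserialized = 0
    found = None
    for record in matches:
        found = json.loads(record["value"])  # runs once per *match*
        deserialized += 1
    return found, deserialized
```

For a registry of 100 records, the post-filter path deserializes all 100 rows to find one credential, while the pre-filter path deserializes exactly one.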

ff137 · Jun 24 '25 13:06