
Unintuitive behaviour when filtering by `collection_ids` as a non-superuser

Open afiestas opened this issue 6 months ago • 1 comments

Describe the Bug

When performing a search/rag/agent request with the following filter:

filters = {
    'collection_ids': {'$overlap': ["9fbe403b-..."]}
}

Only documents in the specified collection(s) are expected to be returned. However, for non-superusers, the results also include documents owned by the user, even if those documents are not in any of the specified collection_ids.

This is due to logic in select_search_filters (search.py):

collection_filters = {
    "$or": [
        {"owner_id": {"$eq": auth_user.id}},
        {"collection_ids": {"$overlap": list(allowed_collections)}},
    ]
}

The use of $or means any document either owned by the user or in the allowed collections will be returned. This bypasses the intent of specifying collection_ids, making the filter behave unintuitively.
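To illustrate the point, here is a minimal sketch of how that $or clause evaluates against the two documents from the reproduction steps. The `matches` helper is a toy evaluator written for this example only, not R2R's implementation, and the ids (`user-1`, `default`, `new_collection`) are placeholders.

```python
from typing import Any

Doc = dict[str, Any]


def matches(doc: Doc, flt: dict[str, Any]) -> bool:
    """Toy evaluator for $or, $and, $eq and $overlap (illustration only)."""
    for key, cond in flt.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        else:
            # `key` is a field name and `cond` maps an operator to a value.
            for op, value in cond.items():
                if op == "$eq":
                    if doc.get(key) != value:
                        return False
                elif op == "$overlap":
                    if not set(doc.get(key, [])) & set(value):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
    return True


user_id = "user-1"  # placeholder id for the non-superuser
docs = [
    {"id": "document1", "owner_id": user_id, "collection_ids": ["default"]},
    {"id": "document2", "owner_id": user_id, "collection_ids": ["new_collection"]},
]

# Filter the caller actually asked for.
user_filter = {"collection_ids": {"$overlap": ["new_collection"]}}

# Same shape as the clause built in select_search_filters.
access_filter = {
    "$or": [
        {"owner_id": {"$eq": user_id}},
        {"collection_ids": {"$overlap": ["new_collection"]}},
    ]
}

by_user_filter = [d["id"] for d in docs if matches(d, user_filter)]
by_access_filter = [d["id"] for d in docs if matches(d, access_filter)]
```

In this toy model `by_user_filter` contains only document2, while `by_access_filter` also contains document1, because the `owner_id` branch matches regardless of collection membership.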

Steps to Reproduce

  1. Create a non-superuser user.
  2. Create a new collection (new_collection).
  3. Add document1 to the Default collection.
  4. Add document2 to new_collection.
  5. Perform a search/rag/agent request with the filter:
filters = {
    'collection_ids': {'$overlap': [new_collection]}
}

Expected Behavior

Only documents from new_collection should be returned.
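One possible way to get this behaviour, sketched under assumptions: keep the user-supplied filter and the access-control clause as separate conditions and require both. `restrict_to_access` is a hypothetical helper name, not an existing R2R function, and the sketch assumes the filter language accepts `$and` alongside the `$or` already used above.

```python
from typing import Any, Optional

Filter = dict[str, Any]


def restrict_to_access(user_filters: Optional[Filter], access_filters: Filter) -> Filter:
    """Combine the caller's filter with the access-control filter.

    Hypothetical helper: both conditions must hold, so the access clause
    can only narrow the caller's filter, never widen it.
    """
    if not user_filters:
        # No explicit filter: fall back to everything the user may see.
        return access_filters
    return {"$and": [user_filters, access_filters]}


# Placeholder ids for illustration.
access = {
    "$or": [
        {"owner_id": {"$eq": "user-1"}},
        {"collection_ids": {"$overlap": ["default", "new_collection"]}},
    ]
}
requested = {"collection_ids": {"$overlap": ["new_collection"]}}

combined = restrict_to_access(requested, access)
```

With this shape a document has to satisfy the requested collection_ids filter and also be visible to the user, so document1 in Default would no longer match.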

Actual Behavior

Documents from other collections owned by the user (e.g., document1 in Default) are also included.

Additional Context

In the /agent/ case there is more logic in _parse_user_and_collection_filters that is also affected by this.

afiestas, Apr 04 '25 09:04