R2R
Unintuitive behaviour when filtering by `collection_ids` as a non-superuser
Describe the Bug
When performing a search/rag/agent request with the following filter:
filters = {
    'collection_ids': {'$overlap': ["9fbe403b-..."]}
}
Only documents in the specified collection(s) are expected to be returned. However, for non-superusers, the results also include documents owned by the user, even if those documents are not in any of the specified collection_ids.
This is due to logic in select_search_filters (search.py):
collection_filters = {
    "$or": [
        {"owner_id": {"$eq": auth_user.id}},
        {"collection_ids": {"$overlap": list(allowed_collections)}},
    ]
}
The use of $or means any document either owned by the user or in the allowed collections will be returned. This bypasses the intent of specifying collection_ids, making the filter behave unintuitively.
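To make the effect concrete, here is a minimal, self-contained sketch that does not use R2R at all. The `matches` helper is a hypothetical in-memory evaluator for a small subset of the filter syntax (`$eq`, `$overlap`, `$and`, `$or`), and the document and user IDs are made up. It assumes `allowed_collections` ends up containing only the requested collection. The `$and` variant at the end is one possible way to tighten the filter, not necessarily the right fix for the project.

```python
from typing import Any


def matches(doc: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Hypothetical evaluator for a tiny subset of the filter syntax."""
    for key, cond in flt.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif "$eq" in cond:
            if doc.get(key) != cond["$eq"]:
                return False
        elif "$overlap" in cond:
            # True when the two lists share at least one element.
            if not set(doc.get(key, [])) & set(cond["$overlap"]):
                return False
        else:
            raise ValueError(f"unsupported condition: {cond!r}")
    return True


# Made-up data mirroring the scenario in this issue.
user_id = "user-1"
docs = [
    {"id": "document1", "owner_id": user_id, "collection_ids": ["default"]},
    {"id": "document2", "owner_id": user_id, "collection_ids": ["new_collection"]},
]
requested = ["new_collection"]
allowed_collections = {"new_collection"}  # assumption: only the requested one

# Shape of the filter currently built for non-superusers.
current_filter = {
    "$or": [
        {"owner_id": {"$eq": user_id}},
        {"collection_ids": {"$overlap": list(allowed_collections)}},
    ]
}
current_hits = [d["id"] for d in docs if matches(d, current_filter)]

# One possible tightening: keep the access check, but also require the
# collections the caller asked for.
stricter_filter = {
    "$and": [
        current_filter,
        {"collection_ids": {"$overlap": requested}},
    ]
}
stricter_hits = [d["id"] for d in docs if matches(d, stricter_filter)]

print(current_hits)
print(stricter_hits)
```

With the `$or` filter alone, `document1` matches through the `owner_id` branch even though it is not in the requested collection. Adding the caller's `collection_ids` condition under `$and` keeps the ownership/access check while still honoring the requested collections.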
Steps to Reproduce
- Create a non-superuser user.
- Create a new collection.
- Add `document1` to the Default collection.
- Add `document2` to the `new_collection`.
- Perform a `search/rag/agent` request with the filter:
filters = {
    'collection_ids': {'$overlap': [new_collection]}
}
Expected Behavior
Only documents from new_collection should be returned.
Actual Behavior
Documents from other collections owned by the user (e.g., document1 in Default) are also included.
Additional Context
In the /agent/ case, additional logic in _parse_user_and_collection_filters is also affected by this behaviour.