[Bug]: meta_fields: How to view all the field names contained in meta_fields through Python code
Self Checks
- [x] I have searched for existing issues, including closed ones.
- [x] I confirm that I am using English to submit this report (Language Policy).
- [x] Non-English title submissions will be closed directly (Language Policy).
- [x] Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
567
RAGFlow image version
v0.22.1
Other environment information
How can I view all the field names contained in a document's meta_fields using Python code?
You can view all the field names contained in meta fields for a document in RAGFlow by accessing the meta_fields attribute, which is a dictionary. For a single document, use:
```python
field_names = list(doc.meta_fields.keys())
print(field_names)
```
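To get all meta field names across a dataset, use the get_meta_by_kbs method (check that your installed ragflow_sdk version actually exposes it on the RAGFlow client before relying on it):
```python
meta_info = rag.get_meta_by_kbs([dataset_id])
print(meta_info.keys())  # Shows all meta field names present in the dataset
```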
Meta fields must be set in each document's meta_fields to be visible and filterable. If you batch upload documents, update their metadata individually to ensure the fields are present. For more details, see the Document model definition and SDK usage examples.
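If the client does not expose get_meta_by_kbs, a rough equivalent can be built on the client side from `DataSet.list_documents()` and each document's `meta_fields`. This is a minimal sketch under assumptions: `build_meta_index`, `iter_all_documents` and `print_dataset_meta_fields` are hypothetical helpers, the `{field_name: {value: [document_ids]}}` result shape is an assumption chosen so that the field names are the top-level keys, and the paging loop assumes the server returns full pages of `page_size` documents until the last one.

```python
from collections import defaultdict
from typing import Any, Iterable, Iterator


def build_meta_index(documents: Iterable[Any]) -> dict:
    """Map each meta field name to {stringified value: [document ids]}.

    Works on any objects exposing `.id` and a `.meta_fields` dict,
    such as ragflow_sdk Document instances.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc in documents:
        for name, value in (getattr(doc, "meta_fields", None) or {}).items():
            index[name][str(value)].append(doc.id)
    # Convert the nested defaultdicts to plain dicts for clean printing.
    return {name: dict(values) for name, values in index.items()}


def iter_all_documents(dataset: Any, page_size: int = 100) -> Iterator[Any]:
    """Page through DataSet.list_documents() until a short page comes back."""
    page = 1
    while True:
        batch = dataset.list_documents(page=page, page_size=page_size)
        yield from batch
        if len(batch) < page_size:
            break
        page += 1


def print_dataset_meta_fields(api_key: str, base_url: str, dataset_name: str) -> None:
    """Print every meta field name found in one dataset (needs a live server)."""
    from ragflow_sdk import RAGFlow  # imported here so the helpers above work without it

    rag = RAGFlow(api_key=api_key, base_url=base_url)
    dataset = rag.list_datasets(name=dataset_name)[0]
    meta_index = build_meta_index(iter_all_documents(dataset))
    print(sorted(meta_index))
```

For example, `print_dataset_meta_fields("YOUR_API_KEY", "http://YOUR_BASE_URL:9380", "YOUR_DATASET_NAME")` prints the sorted field names; documents whose meta_fields are empty simply contribute nothing.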
To reply, just mention @dosu.
I have already obtained the corresponding metadata using the method above, for example {'doc_name': '2024发电机组计划检修.txt'} (the file name means "2024 generator unit planned maintenance.txt"). Could you please provide a complete Python code example that implements metadata filtering, specifically retrieving all documents whose doc_name value contains "Shanxi Province"? Please write only the full Python code. @dosu
```python
from ragflow_sdk import RAGFlow

# Initialize the RAGFlow client
rag = RAGFlow(api_key="YOUR_API_KEY", base_url="http://YOUR_BASE_URL:9380")

# Get your dataset (replace with your actual dataset name or ID)
datasets = rag.list_datasets(name="YOUR_DATASET_NAME")
dataset = datasets[0]

# Build the metadata_condition for substring filtering on the 'doc_name' field
metadata_condition = {
    "logic": "and",
    "conditions": [
        {
            "name": "doc_name",  # must match the key in meta_fields
            "comparison_operator": "contains",
            "value": "Shanxi Province",
        }
    ],
}

# Retrieve document chunks matching the filter
results = rag.retrieve(
    question="",  # or your search query
    dataset_ids=[dataset.id],
    metadata_condition=metadata_condition,
)

for chunk in results:
    print(chunk.document_name, chunk.content)

# Note: The 'doc_name' field must be present in each document's meta_fields for filtering to work.
```
This code targets RAGFlow v0.22.1 and later. If you batch-uploaded documents, make sure each document's meta_fields includes the 'doc_name' key; otherwise the filter cannot match that document.
Reference: metadata filtering example
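Before calling retrieve, it can help to check locally which documents a metadata_condition like the one above would select. The sketch below is a client-side approximation, not RAGFlow's server-side filter: `preview_matches` and `condition_holds` are hypothetical helpers, `contains` is assumed to be a case-sensitive substring test on the stringified value, a document lacking the field is treated as a non-match for every operator, and only `contains`, `not contains`, `=` and `and`/`or` logic are covered.

```python
from typing import Any, Callable

# Client-side approximations of a few comparison operators (assumed semantics).
OPERATORS: dict[str, Callable[[str, str], bool]] = {
    "contains": lambda actual, expected: expected in actual,
    "not contains": lambda actual, expected: expected not in actual,
    "=": lambda actual, expected: actual == expected,
}


def condition_holds(meta_fields: dict[str, Any], cond: dict[str, Any]) -> bool:
    """Evaluate one condition; a missing field never matches."""
    if cond["name"] not in meta_fields:
        return False
    compare = OPERATORS[cond["comparison_operator"]]
    return compare(str(meta_fields[cond["name"]]), str(cond["value"]))


def preview_matches(documents: list[Any], metadata_condition: dict[str, Any]) -> list[str]:
    """Return ids of documents whose meta_fields satisfy the whole condition block."""
    combine = all if metadata_condition.get("logic", "and") == "and" else any
    return [
        doc.id
        for doc in documents
        if combine(condition_holds(doc.meta_fields or {}, c)
                   for c in metadata_condition["conditions"])
    ]
```

Passing `dataset.list_documents()` and the same `metadata_condition` dictionary gives a quick preview; an empty list usually means the key or value in the filter does not match what is actually stored in meta_fields.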
Is my understanding of metadata filtering correct? For example, when I use document names as metadata for filtering, the retrieval process searches only the documents that meet the metadata conditions; documents that do not meet them take no part in the subsequent vector or keyword matching, and the final results contain only content from documents that match the metadata. @dosu
Yes, your understanding is correct for RAGFlow v0.22.1: when you use metadata filtering (for example, filtering by document name), only documents whose meta_fields match the filter participate in retrieval. Documents that do not meet the metadata conditions are excluded from all subsequent vector or keyword matching, and the final results only contain content from the filtered set. This pre-filtering is implemented at the start of the retrieval process, so it narrows the search space before any semantic or keyword search happens (source, source, source, source).
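The ordering described above can be illustrated with a toy model; this is not RAGFlow's code. A metadata pre-filter first yields a set of allowed document ids, and only chunks from those documents are ever scored. Shared-word counting stands in for RAGFlow's real vector and keyword scoring, and every name here (`Doc`, `Chunk`, `prefilter`, `toy_retrieve`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    id: str
    meta_fields: dict = field(default_factory=dict)


@dataclass
class Chunk:
    doc_id: str
    text: str


def prefilter(docs: list[Doc], key: str, substring: str) -> set[str]:
    """Stage 1: ids of documents whose meta field `key` contains `substring`."""
    return {d.id for d in docs
            if key in d.meta_fields and substring in str(d.meta_fields[key])}


def keyword_score(question: str, text: str) -> int:
    """Toy stand-in for similarity scoring: number of shared lowercase words."""
    return len(set(question.lower().split()) & set(text.lower().split()))


def toy_retrieve(question: str, docs: list[Doc], chunks: list[Chunk],
                 key: str, substring: str, top_k: int = 3) -> list[Chunk]:
    allowed = prefilter(docs, key, substring)
    # Stage 2: only chunks of pre-filtered documents are scored; others never enter ranking.
    candidates = [c for c in chunks if c.doc_id in allowed]
    candidates.sort(key=lambda c: keyword_score(question, c.text), reverse=True)
    return candidates[:top_k]
```

A chunk from a non-matching document is excluded even when its text would score as well as, or better than, the chunks that are returned.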
A few caveats: filtering only works if the metadata field is present in every document's meta_fields, and the field name in your filter must exactly match the key in meta_fields. Batch uploads do not set meta_fields by default, so you may need to update them individually after upload. Also, passing explicit document_ids disables metadata filtering.
If you need to debug, check your meta_fields and filter structure, and make sure you're not filtering on chunk-level properties that aren't included in meta_fields. If you follow these guidelines, metadata filtering will work as you described.
Roughly when will the next version be released, and how long will it take to fix the metadata filtering issue mentioned above? @dosu
Do you want an HTTP API to achieve this?
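If an HTTP call is preferred, a request to RAGFlow's retrieval endpoint could be built roughly as follows. This sketch rests on assumptions: the `POST /api/v1/retrieval` path, the `Authorization: Bearer` header, and the `metadata_condition` body field are assumed to match the RAGFlow HTTP API reference and should be checked against the documentation for your version. `build_retrieval_request` and `send` are hypothetical helpers that use only the standard library.

```python
import json
import urllib.request


def build_retrieval_request(base_url: str, api_key: str, question: str,
                            dataset_ids: list[str],
                            metadata_condition: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the assumed retrieval endpoint."""
    body = {
        "question": question,
        "dataset_ids": dataset_ids,
        "metadata_condition": metadata_condition,  # assumed body field name
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/retrieval",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before it reaches the server; the structure of the JSON response is not assumed here.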