[Feature Request]: Update document metadata via the Python API
Is there an existing issue for the same feature request?
- [x] I have checked the existing issues.
Is your feature request related to a problem?
Setting a document's metadata in the Web UI is a useful feature. But since I usually have thousands of documents in a database, entering the metadata manually is tedious.
Describe the feature you'd like
I would like to add or update the metadata of a given document through the Python API, so that we can manage document metadata efficiently.
Thanks!
Describe implementation you've considered
Something like below:
from ragflow_sdk import RAGFlow

doc_meta = {"author": "John", "pub_date": "2025-01-03", "type": "fiction"}

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id='id')
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# Proposed: pass the metadata in the same update call (the "metadata" key is the requested feature)
doc.update({"chunk_method": "manual", "metadata": doc_meta})
Documentation, adoption, use case
Additional information
No response
Edited: Review the comments below, as they describe the problem and the proposed solution more accurately. I ran a Python script that uses the update function to change "meta_fields", then asked the AI chat to provide the metadata for the retrieved chunks. It only worked with manually set metadata, not with the changes made by the script. By the way, list_documents doesn't return "meta_fields". The update function does not work as intended for metadata or meta_fields. @KevinHuSh
It's already fulfilled. FYI
When the name is not modified, the meta_fields cannot be changed:

    if "name" in req and req["name"] != doc.name:
        if (
            pathlib.Path(req["name"].lower()).suffix
            != pathlib.Path(doc.name.lower()).suffix
        ):
            return get_result(
                message="The extension of file can't be changed",
                code=settings.RetCode.ARGUMENT_ERROR,
            )
        for d in DocumentService.query(name=req["name"], kb_id=doc.kb_id):
            if d.name == req["name"]:
                return get_error_data_result(
                    message="Duplicated document name in the same dataset."
                )
        if not DocumentService.update_by_id(document_id, {"name": req["name"]}):
            return get_error_data_result(message="Database error (Document rename)!")
        if "meta_fields" in req:
            if not isinstance(req["meta_fields"], dict):
                return get_error_data_result(message="meta_fields must be a dictionary")
            DocumentService.update_meta_fields(document_id, req["meta_fields"])
        informs = File2DocumentService.get_by_document_id(document_id)
        if informs:
            e, file = FileService.get_by_id(informs[0].file_id)
            FileService.update_by_id(file.id, {"name": req["name"]})
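One possible shape of a fix, as a minimal sketch: handle meta_fields at the top level of the handler instead of inside the rename branch. This is a simplified stand-in, not the actual RAGFlow code; `update_document` and the plain-dict `doc` are hypothetical and only model the control flow.

```python
def update_document(doc: dict, req: dict) -> dict:
    """Simplified model of the update handler (hypothetical, not RAGFlow code)."""
    # Rename only when a different name is supplied.
    if "name" in req and req["name"] != doc["name"]:
        doc["name"] = req["name"]

    # Checked outside the rename branch, so it runs whether or not the name changed.
    if "meta_fields" in req:
        if not isinstance(req["meta_fields"], dict):
            raise TypeError("meta_fields must be a dictionary")
        doc["meta_fields"] = req["meta_fields"]
    return doc


doc = {"name": "report.pdf", "meta_fields": {}}
update_document(doc, {"meta_fields": {"author": "John"}})
print(doc["meta_fields"])
```

With the check moved out, a request that carries only meta_fields still updates the metadata and leaves the name untouched.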
I've come across the same issue with the endpoint: PUT /api/v1/datasets/{dataset_id}/documents/{document_id}. Unless the document's name is changed in the same request, the meta_fields are not updated, even though the response is 200.
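For anyone who wants to reproduce this, here is a rough sketch of a request that sends only meta_fields to that endpoint. It uses just the standard library. The helper name `build_meta_update`, the `Bearer` authorization header, and the example IDs are assumptions for illustration; adjust them to your deployment.

```python
import json
import urllib.request


def build_meta_update(base_url, api_key, dataset_id, document_id, meta_fields):
    """Build (but do not send) a PUT request that carries only meta_fields."""
    url = f"{base_url}/api/v1/datasets/{dataset_id}/documents/{document_id}"
    body = json.dumps({"meta_fields": meta_fields}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumption: the API key is passed as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_meta_update(
    "http://localhost:9380", "YOUR_API_KEY", "ds1", "doc1", {"author": "John"}
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```

Fetching the document afterwards and checking whether its meta_fields changed shows whether the update took effect.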
Thank you so much! I will give it a try.