
[Feature Request]: update metadata of documents via Python API

Open yangxg opened this issue 10 months ago • 5 comments

Is there an existing issue for the same feature request?

  • [x] I have checked the existing issues.

Is your feature request related to a problem?

Setting a document's metadata in the Web UI is a good feature. But since I usually have thousands of documents in a database, entering the metadata manually is hard and tedious.

Describe the feature you'd like

I would like to add or update the metadata of a given document through the Python API, so that we can manage document metadata efficiently.
Thanks!

Describe implementation you've considered

Something like below:

from ragflow_sdk import RAGFlow

doc_meta = {"author": "John", "pub_date": "2025-01-03", "type": "fiction"}

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# Proposed call: pass the metadata together with the other update fields in one dict
doc.update({"chunk_method": "manual", "metadata": doc_meta})

Documentation, adoption, use case


Additional information

No response

yangxg avatar Feb 08 '25 03:02 yangxg

Edited: Review the comments below, as they more accurately describe the problem and the proposed solution.

I ran a Python script that uses the update function to change "meta_fields", then asked the AI chat to provide the metadata for the retrieved chunks. It only worked for metadata changes that were set manually. By the way, list_documents doesn't return "meta_fields". The update function does not work as intended for metadata or meta_fields. @KevinHuSh

AhmedYaich28 avatar Mar 06 '25 03:03 AhmedYaich28

It's already fulfilled. FYI

KevinHuSh avatar Mar 06 '25 10:03 KevinHuSh

It's already fulfilled. FYI

When the name is not modified, the meta_fields cannot be changed.


if "name" in req and req["name"] != doc.name:
        if (
                pathlib.Path(req["name"].lower()).suffix
                != pathlib.Path(doc.name.lower()).suffix
        ):
            return get_result(
                message="The extension of file can't be changed",
                code=settings.RetCode.ARGUMENT_ERROR,
            )
        for d in DocumentService.query(name=req["name"], kb_id=doc.kb_id):
            if d.name == req["name"]:
                return get_error_data_result(
                    message="Duplicated document name in the same dataset."
                )
        if not DocumentService.update_by_id(document_id, {"name": req["name"]}):
            return get_error_data_result(message="Database error (Document rename)!")
        if "meta_fields" in req:
            if not isinstance(req["meta_fields"], dict):
                return get_error_data_result(message="meta_fields must be a dictionary")
            DocumentService.update_meta_fields(document_id, req["meta_fields"])

        informs = File2DocumentService.get_by_document_id(document_id)
        if informs:
            e, file = FileService.get_by_id(informs[0].file_id)
            FileService.update_by_id(file.id, {"name": req["name"]})

JoieLu avatar Mar 06 '25 12:03 JoieLu
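To make the control-flow problem in the handler excerpt above concrete, here is a minimal, self-contained sketch. It is not RAGFlow's code: `apply_document_update` is a hypothetical stand-in that models the document and the request as plain dicts, and it only illustrates one possible shape of a fix, namely checking "meta_fields" independently of the rename branch.

```python
def apply_document_update(doc: dict, req: dict) -> dict:
    """Toy model of an update handler (hypothetical, not RAGFlow's implementation).

    `doc` stands in for the stored document and `req` for the request body.
    Returns an updated copy of `doc`.
    """
    updated = dict(doc)

    # Rename branch: only runs when a different name is supplied.
    if "name" in req and req["name"] != doc["name"]:
        updated["name"] = req["name"]

    # Metadata branch: deliberately NOT nested under the rename branch,
    # so a request that only carries meta_fields is still applied.
    if "meta_fields" in req:
        if not isinstance(req["meta_fields"], dict):
            raise ValueError("meta_fields must be a dictionary")
        updated["meta_fields"] = req["meta_fields"]

    return updated


doc = {"name": "report.pdf", "meta_fields": {}}
print(apply_document_update(doc, {"meta_fields": {"author": "John"}}))
```

In the excerpt, the "meta_fields" check sits inside the `if "name" in req and req["name"] != doc.name:` block, which is why a request that leaves the name unchanged never reaches it.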

I've come across the same issue with the endpoint PUT /api/v1/datasets/{dataset_id}/documents/{document_id}. Unless the document's name is also changed, the meta_fields are not updated, even though the request returns a 200 response.

must-jpg avatar Mar 07 '25 00:03 must-jpg
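For anyone who wants to reproduce this against the REST endpoint mentioned above, here is a rough sketch that only builds the PUT request, using the standard library. The helper name `build_meta_update_request` is hypothetical, and the `Authorization: Bearer <API_KEY>` header and the JSON body shape `{"meta_fields": {...}}` are assumptions based on this thread rather than confirmed API details.

```python
import json
import urllib.request


def build_meta_update_request(base_url, api_key, dataset_id, document_id, meta_fields):
    """Build (but do not send) a PUT request that carries only meta_fields.

    Hypothetical helper: the header format and body shape are assumptions.
    """
    if not isinstance(meta_fields, dict):
        raise TypeError("meta_fields must be a dictionary")

    # Endpoint path as quoted in this thread.
    url = f"{base_url.rstrip('/')}/api/v1/datasets/{dataset_id}/documents/{document_id}"
    body = json.dumps({"meta_fields": meta_fields}).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


req = build_meta_update_request(
    "http://localhost:9380", "<YOUR_API_KEY>", "<DATASET_ID>", "<DOCUMENT_ID>",
    {"author": "John", "pub_date": "2025-01-03", "type": "fiction"},
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it. Since a 200 response was reported here even when nothing changed, it is worth checking the document's metadata in the Web UI afterwards instead of relying on the status code.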

Thank you so much! I will give it a try.

yangxg avatar Mar 12 '25 13:03 yangxg