
Full Doc mode document can't be retrieved after unarchive

Open efreykongcn opened this issue 9 months ago • 2 comments

Self Checks

  • [x] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [x] I have searched for existing issues, including closed ones.
  • [x] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [x] Please do not modify this template :) and fill in all the required fields.

Dify version

1.1.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

  1. Create a knowledge base, for example KB1.
  2. Upload a document file (attached example file: FullDoc_UnArchive_Retrieve_bug_demo.md), then click Next to go to the Chunk Settings page.
  3. Select Parent-child - Full Doc - High quality - Hybrid Search, then click Save & Process to go to the Documents page.
  4. Go to the Retrieval Testing page and test "Introduce dify". The uploaded md file is retrieved in the result section.
  5. Go back to the Documents page, select the document, click Archive, then click UnArchive.
  6. Repeat step 4. No file is retrieved.

✔️ Expected Behavior

After UnArchive, the Markdown file should be retrieved.

❌ Actual Behavior

After UnArchive, the Markdown file was not retrieved.

efreykongcn · Mar 22 '25 11:03

Hello, @efreykongcn! I'm here to help you with bug fixes, questions, and contributions. Let's work on resolving the issue you've encountered.

The issue you're experiencing with the "Full Doc" mode not retrieving documents after unarchiving might be related to how Dify handles document states. When a document is unarchived, it should become eligible for retrieval as long as it is not archived and is enabled [1]. However, there might be a specific issue with the "Full Doc" mode or the unarchiving process in version 1.1.0 that is preventing the document from being retrieved.

Unfortunately, I couldn't find specific information about known issues with the "Full Doc" mode in the current version. It might be helpful to check if there are any logs or error messages generated during the unarchiving process that could provide more insight into why the document is not being retrieved. Additionally, reviewing the retrieval settings and ensuring that the document is correctly processed after unarchiving might help resolve the issue.

dosubot[bot] · Mar 22 '25 11:03

According to my investigation, the cause is the child_chunks property in dify/api/models/dataset.py (line 710). Because of the restriction rules.parent_mode != ParentMode.FULL_DOC, Dify doesn't load child_chunks for Full Doc mode documents, so the child chunks can't be re-indexed when the document is unarchived. If I remove this restriction, the retrieval test works for the unarchived Full Doc file. What is this restriction designed for?

child_chunks definition in dataset.py:

@property
def child_chunks(self):
    process_rule = self.document.dataset_process_rule
    if process_rule.mode == "hierarchical":
        rules = Rule(**process_rule.rules_dict)
        # Full Doc mode fails this check and falls through to the else branch,
        # so no child chunks are loaded for it.
        if rules.parent_mode and rules.parent_mode != ParentMode.FULL_DOC:
            child_chunks = (
                db.session.query(ChildChunk)
                .filter(ChildChunk.segment_id == self.id)
                .order_by(ChildChunk.position.asc())
                .all()
            )
            return child_chunks or []
        else:
            return []
    else:
        return []
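
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It does not use Dify's real models: Segment, stored_children and reindex_on_unarchive are hypothetical stand-ins, and the ParentMode values are assumed. It only illustrates how a guard like the one above would leave a re-index step with nothing to index for a Full Doc segment, even though child chunks exist in storage.

```python
from dataclasses import dataclass, field
from enum import Enum


class ParentMode(str, Enum):
    # Simplified stand-in for Dify's ParentMode; the string values are assumed.
    FULL_DOC = "full-doc"
    PARAGRAPH = "paragraph"


@dataclass
class Segment:
    """Hypothetical in-memory stand-in for a document segment."""

    parent_mode: ParentMode
    stored_children: list = field(default_factory=list)

    @property
    def child_chunks(self):
        # Same shape as the guard quoted above:
        # Full Doc segments report no child chunks.
        if self.parent_mode != ParentMode.FULL_DOC:
            return list(self.stored_children)
        return []


def reindex_on_unarchive(segments):
    """Hypothetical re-index step: indexes whatever child_chunks exposes."""
    index = []
    for segment in segments:
        index.extend(segment.child_chunks)
    return index


full_doc = Segment(ParentMode.FULL_DOC, ["chunk-1", "chunk-2"])
paragraph = Segment(ParentMode.PARAGRAPH, ["chunk-1", "chunk-2"])

print(reindex_on_unarchive([full_doc]))   # []
print(reindex_on_unarchive([paragraph]))  # ['chunk-1', 'chunk-2']
```

In this sketch the Full Doc segment has the same stored chunks as the paragraph-mode one, but only the paragraph-mode chunks reach the index. That matches the observed behavior, assuming the real unarchive path reads child chunks through this property.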

efreykongcn · Mar 22 '25 11:03