
API Bug: the 'indexing_status' API cannot query the batch returned by the 'update_by_file' API

Status: Open · glacierck opened this issue 1 year ago · 9 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.9.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

1. Use the 'update_by_file' API to update an existing document with a new file. The call succeeds and returns a batch as expected.
2. Use the 'indexing_status' API to query the progress of that batch.
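The two reproduction steps can be sketched with Python's standard library. This only builds the requests and does not send them. The base URL, API key, IDs, and endpoint paths are assumptions modelled on the Dify 0.9.x Knowledge API documentation (`update_by_file`, `indexing-status`); verify them against your own instance's API reference.

```python
import urllib.request

# Assumed values for illustration only; replace with your own.
BASE_URL = "http://localhost/v1"
API_KEY = "dataset-xxxxxxxx"  # hypothetical dataset API key


def update_by_file_request(
    dataset_id: str, document_id: str, body: bytes, content_type: str
) -> urllib.request.Request:
    """Step 1: POST a new file to an existing document.

    `body` must be a multipart/form-data payload built by the caller;
    the path is assumed from the 0.9.x docs.
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/documents/{document_id}/update_by_file"
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {API_KEY}")
    req.add_header("Content-Type", content_type)
    return req


def indexing_status_request(dataset_id: str, batch: str) -> urllib.request.Request:
    """Step 2: GET the indexing progress for the batch returned by step 1."""
    url = f"{BASE_URL}/datasets/{dataset_id}/documents/{batch}/indexing-status"
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {API_KEY}")
    return req
```

Pass either request to `urllib.request.urlopen` to actually send it; in this issue, step 2 returns the 404 shown under Actual Behavior when `batch` comes from step 1.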

✔️ Expected Behavior

The indexing progress information for the batch is returned.

❌ Actual Behavior

{
    "code": "not_found",
    "message": "Documents not found.",
    "status": 404
}
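To detect this specific failure in a client script, one option is to check both the HTTP status and the `code` field of the body above. A minimal sketch, assuming the error body always has this JSON shape:

```python
import json


def is_documents_not_found(status_code: int, body: str) -> bool:
    """Return True when an indexing-status call failed with the 404 shown above."""
    if status_code != 404:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Guard against non-object JSON such as a bare list.
    return isinstance(payload, dict) and payload.get("code") == "not_found"
```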

P.S. Querying with the 'batch' returned by the create_by_file API does not have this issue.

glacierck (Oct 11 '24 06:10)

The "Documents not found" error from the indexing_status API occurs when the DocumentService.get_batch_documents method returns no documents for the given dataset ID and batch. This could be due to:

  1. The batch ID provided does not match any existing batch in the dataset.
  2. The documents associated with the batch have been deleted or were never successfully saved.
  3. The dataset might not exist, or the user does not have the necessary permissions to access it.

Ensure that the batch ID is correct and that the dataset and documents exist. Also, verify that you have the necessary permissions to access the dataset [1][2].
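The explanation above reduces to a filter on (dataset ID, batch): if no stored document carries that exact batch, the endpoint answers 404. The following is a simplified in-memory model, not Dify's actual code; `Document`, `get_batch_documents`, and `indexing_status` here are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Document:
    id: str
    dataset_id: str
    batch: str
    indexing_status: str


def get_batch_documents(docs: list[Document], dataset_id: str, batch: str) -> list[Document]:
    # Hypothetical stand-in: select documents whose stored batch matches exactly.
    return [d for d in docs if d.dataset_id == dataset_id and d.batch == batch]


def indexing_status(docs: list[Document], dataset_id: str, batch: str) -> tuple[int, dict]:
    matched = get_batch_documents(docs, dataset_id, batch)
    if not matched:
        # Mirrors the 404 body reported in this issue.
        return 404, {"code": "not_found", "message": "Documents not found.", "status": 404}
    return 200, {"data": [{"id": d.id, "indexing_status": d.indexing_status} for d in matched]}
```

In this model, a batch string that was returned to the client but never written to any document is indistinguishable from a typo.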

To continue talking to Dosu, mention @dosu.

dosubot[bot] (Oct 11 '24 06:10)

@dosu The updated file has been verified as successfully uploaded in the web UI, but the progress information for the update cannot be queried.

glacierck (Oct 11 '24 06:10)

The issue with querying the progress information, despite a successful update upload, might be due to:

  1. Progress Update Logic: The onProgress function might not be triggered correctly, affecting the progress display.

  2. Error Handling: If an error occurs during the upload, the progress is set to -2, which could prevent proper display.

  3. Finalization: The progress should be set to 100 upon completion. If this step is missed, it might not reflect correctly [1].
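The convention this reply describes (a percentage while uploading, -2 on error, 100 on completion) can be modelled as below. The names are hypothetical, and Dify's web client implements this in TypeScript; this Python model only illustrates the state values. It appears to concern client-side upload progress, which is separate from the server-side indexing-status endpoint queried in this issue.

```python
UPLOAD_ERROR = -2  # sentinel the reply describes for a failed upload
UPLOAD_DONE = 100  # value set once the upload has finished


def on_progress(loaded: int, total: int) -> int:
    """Integer percentage for an in-flight upload."""
    if total <= 0:
        return 0
    return loaded * 100 // total


def final_progress(succeeded: bool) -> int:
    """Value to store when the upload ends, successfully or not."""
    return UPLOAD_DONE if succeeded else UPLOAD_ERROR
```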

Ensure these areas are functioning as expected to resolve the issue.


dosubot[bot] (Oct 11 '24 06:10)

@dosu The update has been confirmed as successful. The issue is that the 'batch' returned by the update API cannot be used to retrieve progress information, while the 'batch' returned by the create (add) operation does not have this problem.

glacierck (Oct 11 '24 06:10)

The issue with the 'batch' returned by the API not retrieving progress information is likely because the batch identifier is missing or incorrect. The 'batch' from the new operation includes a batch identifier, which is necessary to track progress using the indexing-status endpoint. Ensure that the batch ID is correctly generated and passed to the indexing_status API to retrieve progress information [1].
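Once a valid batch ID is available, progress is typically tracked by polling the indexing-status endpoint. A minimal sketch: `fetch` is a hypothetical callable returning `(http_status, json_payload)` for a batch, and both the terminal status names and the `{"data": [{"indexing_status": ...}]}` payload shape are assumptions, not taken from this thread.

```python
import time
from typing import Callable

# Assumed terminal states; the original thread does not list them.
TERMINAL = {"completed", "error", "paused"}


def wait_for_indexing(
    fetch: Callable[[str], tuple[int, dict]],
    batch: str,
    interval: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Poll until every document in the batch reaches a terminal state."""
    for _ in range(max_polls):
        status, payload = fetch(batch)
        if status == 404:
            # The failure reported in this issue: the batch matches no documents.
            raise LookupError(f"batch {batch!r} not found")
        docs = payload.get("data", [])
        if docs and all(d.get("indexing_status") in TERMINAL for d in docs):
            return payload
        time.sleep(interval)
    raise TimeoutError(f"batch {batch!r} still indexing after {max_polls} polls")
```

Raising on 404 instead of retrying keeps an invalid batch, like the one returned by update_by_file here, from looking like a slow index.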


dosubot[bot] (Oct 11 '24 06:10)

@dosu After further testing, I found that the progress of the update can be queried using the 'batch' returned when the document was first added, which confirms again that the 'batch' returned by the update is invalid. The document's progress information can only be retrieved through the initial 'batch'. This looks like a serious bug!
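Until this is fixed, a client-side workaround consistent with the observation above is to save the batch returned by create_by_file alongside the document ID and fall back to it when the update batch returns 404. A sketch; `fetch` is a hypothetical wrapper around the indexing-status call returning `(http_status, json_payload)`.

```python
from typing import Callable, Optional

# fetch(batch) -> (http_status, json_payload); hypothetical wrapper.
Fetch = Callable[[str], tuple[int, dict]]


def query_progress_with_fallback(
    fetch: Fetch, update_batch: str, create_batch: Optional[str]
) -> tuple[str, dict]:
    """Return (batch_used, payload), preferring the batch from update_by_file."""
    status, payload = fetch(update_batch)
    if status != 404:
        return update_batch, payload
    # Workaround for this issue: the update's progress was only visible
    # under the batch originally returned by create_by_file.
    if create_batch is None:
        raise LookupError(f"batch {update_batch!r} not found and no create batch saved")
    status, payload = fetch(create_batch)
    if status == 404:
        raise LookupError(f"neither {update_batch!r} nor {create_batch!r} was found")
    return create_batch, payload
```

Trying the update batch first means the client keeps working unchanged once the update batch becomes queryable.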

glacierck (Oct 11 '24 07:10)

It seems like the issue is with the batch ID returned by the update_by_file API. If the batch ID is invalid, it won't match any existing records, leading to the "Documents not found" error when querying progress.

To address this, you should:

  1. Verify Batch ID Generation: Ensure that the batch ID generated during the update is correct and matches the format used in the create_by_file API.

  2. Check Database Records: Confirm that the batch ID is correctly stored and associated with the documents in the database.

  3. Debug Logging: Implement logging to track the batch ID generation and assignment process to identify any discrepancies.

These steps should help in diagnosing and fixing the issue with the batch ID during updates.
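Step 2 can be checked directly with a query on the documents table. The sketch below uses an in-memory SQLite table with a hypothetical `documents(id, dataset_id, batch)` schema and logs which documents, if any, own a given batch. Dify's real schema may differ, and its self-hosted Docker setup uses PostgreSQL by default, so adapt the query before running it against a real instance.

```python
import logging
import sqlite3

log = logging.getLogger("batch-debug")


def batch_owners(conn: sqlite3.Connection, dataset_id: str, batch: str) -> list[str]:
    """Return IDs of documents stored under this batch (hypothetical schema)."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE dataset_id = ? AND batch = ? ORDER BY id",
        (dataset_id, batch),
    ).fetchall()
    ids = [row[0] for row in rows]
    # An empty list here corresponds to the "Documents not found." 404.
    log.debug("dataset=%s batch=%s -> documents=%s", dataset_id, batch, ids)
    return ids
```

Call `logging.basicConfig(level=logging.DEBUG)` first to see the log lines, then compare the result for the create batch and the update batch.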


dosubot[bot] (Oct 11 '24 07:10)

@dosu The progress information of 'create_by_file' is incorrectly overwritten by the progress information of 'update_by_file', and the progress can only be queried through the 'batch' returned by 'create_by_file'.

glacierck (Oct 11 '24 07:10)

The issue arises because both update_by_file and create_by_file operations return a batch identifier, but only the batch from create_by_file is used to query progress. This is due to the system design, which tracks progress using the batch from create_by_file as specified in the API documentation [1][2].

To resolve this, ensure that the batch ID from update_by_file is correctly generated and stored, and verify that it can be used to query progress. You might need to check the implementation to ensure that both operations handle batch IDs consistently.
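One way to make both operations handle batch IDs consistently is for the update path to store the batch it returns on the document being re-indexed. The model below is a hypothetical sketch of that idea, not Dify's code; the batch format (timestamp plus random suffix) is chosen only for illustration.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Document:
    id: str
    batch: str
    indexing_status: str = "waiting"


def new_batch() -> str:
    # Illustrative format: 14-digit timestamp plus a 6-digit random suffix.
    return time.strftime("%Y%m%d%H%M%S") + str(random.randint(100000, 999999))


def create_document(store: dict[str, Document], doc_id: str) -> str:
    batch = new_batch()
    store[doc_id] = Document(doc_id, batch)
    return batch


def update_document(store: dict[str, Document], doc_id: str) -> str:
    """Re-stamp the document with the same batch that is returned to the caller."""
    batch = new_batch()
    doc = store[doc_id]
    doc.batch = batch  # without this, the returned batch would match no document
    doc.indexing_status = "waiting"
    return batch


def find_by_batch(store: dict[str, Document], batch: str) -> list[Document]:
    return [d for d in store.values() if d.batch == batch]
```

With the re-stamp in place, the batch from the update is the one that resolves, which matches what the reporter expected.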


dosubot[bot] (Oct 11 '24 07:10)

+1

imaben (Nov 01 '24 08:11)