API bug: the 'indexing_status' API cannot query the 'batch' returned by the 'update_by_file' API
Self Checks
- [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.9.1
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
1. Use the 'update_by_file' API to update an existing document; the call returns a 'batch' as normal.
2. Use the 'indexing_status' API to query progress information for that 'batch'.
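Step 2 can be sketched in Python as follows. This is a minimal sketch, not a verified client: the base URL, the API key placeholder, and the exact endpoint path are assumptions, and the helper names are hypothetical. The `batch` argument is the value returned by step 1.

```python
import json
import urllib.error
import urllib.request

# Assumptions (not stated in this report): the base URL, the API key
# placeholder, and the exact path of the indexing-status endpoint.
BASE_URL = "http://localhost/v1"
API_KEY = "dataset-xxxxxxxx"  # placeholder, replace with a real dataset API key


def indexing_status_url(dataset_id: str, batch: str) -> str:
    """Build the URL used to query indexing progress for a batch."""
    return f"{BASE_URL}/datasets/{dataset_id}/documents/{batch}/indexing-status"


def build_status_request(dataset_id: str, batch: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request for the status query."""
    return urllib.request.Request(
        indexing_status_url(dataset_id, batch),
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="GET",
    )


def fetch_status(dataset_id: str, batch: str):
    """Send the request and return (HTTP status, decoded JSON body).

    Assumes error responses carry a JSON body like the one shown under
    "Actual Behavior".
    """
    request = build_status_request(dataset_id, batch)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())
```

Calling `fetch_status(dataset_id, batch)` once with the 'batch' from 'create_by_file' and once with the 'batch' from 'update_by_file' makes the difference between the two cases easy to compare.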
✔️ Expected Behavior
Progress information for the batch is returned.
❌ Actual Behavior
{
"code": "not_found",
"message": "Documents not found.",
"status": 404
}
P.S. The 'batch' returned by the 'create_by_file' API does not have this issue.
The "Documents not found" error from the indexing_status API occurs when the DocumentService.get_batch_documents method returns no documents for the given dataset ID and batch. This could be due to:
- The batch ID provided does not match any existing batch in the dataset.
- The documents associated with the batch have been deleted or were never successfully saved.
- The dataset might not exist, or the user does not have the necessary permissions to access it.
Ensure that the batch ID is correct and that the dataset and documents exist. Also, verify that you have the necessary permissions to access the dataset [1][2].
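The lookup described above can be pictured with a toy in-memory model. This is only an illustration under stated assumptions, not Dify's actual code: the `Document` fields, the `NotFound` exception, and both function bodies are hypothetical stand-ins for a query filtered by dataset ID and batch.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical minimal record; real documents have many more fields.
    id: str
    dataset_id: str
    batch: str


class NotFound(Exception):
    """Stand-in for the 404 'not_found' response."""


def get_batch_documents(documents, dataset_id, batch):
    # Toy version of a lookup filtered by dataset ID and batch.
    return [
        doc for doc in documents
        if doc.dataset_id == dataset_id and doc.batch == batch
    ]


def indexing_status(documents, dataset_id, batch):
    found = get_batch_documents(documents, dataset_id, batch)
    if not found:
        # An empty result is what surfaces as "Documents not found."
        raise NotFound("Documents not found.")
    return [{"id": doc.id, "batch": doc.batch} for doc in found]
```

In this model, any batch value that is not stored on a document in the given dataset produces the same error, whatever the reason it is missing.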
To continue talking to Dosu, mention @dosu.
@dosu The updated upload has been confirmed as successful in the web UI, but the progress information for the update cannot be queried.
The issue with querying the progress information, despite a successful update upload, might be due to:
- Progress Update Logic: The onProgress function might not be triggered correctly, affecting the progress display.
- Error Handling: If an error occurs during the upload, the progress is set to -2, which could prevent proper display.
- Finalization: The progress should be set to 100 upon completion. If this step is missed, it might not reflect correctly [1].
Ensure these areas are functioning as expected to resolve the issue.
@dosu The update has been confirmed to be successful. The issue is that the 'batch' returned by the update API cannot be used to retrieve progress information. The 'batch' returned by the create operation does not have this issue.
The issue with the 'batch' returned by the API not retrieving progress information is likely because the batch identifier is missing or incorrect. The 'batch' from the new operation includes a batch identifier, which is necessary to track progress using the indexing-status endpoint. Ensure that the batch ID is correctly generated and passed to the indexing_status API to retrieve progress information [1].
@dosu After testing, I found that the progress of the update can be queried through the 'batch' returned when the document was first added, which confirms again that the 'batch' returned during the update is invalid. The progress information of the document can only be retrieved through the initial 'batch'. This should be a serious bug!
It seems like the issue is with the batch ID returned by the update_by_file API. If the batch ID is invalid, it won't match any existing records, leading to the "Documents not found" error when querying progress.
To address this, you should:
- Verify Batch ID Generation: Ensure that the batch ID generated during the update is correct and matches the format used in the create_by_file API.
- Check Database Records: Confirm that the batch ID is correctly stored and associated with the documents in the database.
- Debug Logging: Implement logging to track the batch ID generation and assignment process to identify any discrepancies.
These steps should help in diagnosing and fixing the issue with the batch ID during updates.
@dosu The progress information of 'create_by_file' is incorrectly overwritten by the progress information of 'update_by_file', and progress can only be queried through the 'batch' returned by 'create_by_file'.
The issue arises because both update_by_file and create_by_file operations return a batch identifier, but only the batch from create_by_file is used to query progress. This is due to the system design, which tracks progress using the batch from create_by_file as specified in the API documentation [1][2].
To resolve this, ensure that the batch ID from update_by_file is correctly generated and stored, and verify that it can be used to query progress. You might need to check the implementation to ensure that both operations handle batch IDs consistently.
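Until the behavior is fixed, a client-side workaround consistent with the reporter's observation is to remember the 'batch' returned when a document was created and use that value for later status queries. The sketch below is hypothetical: `BatchRegistry` and its methods are illustrative names, not part of any Dify SDK, and it assumes the creation batch remains valid after updates, as reported in this thread.

```python
class BatchRegistry:
    """Remember which batch to use when querying indexing status."""

    def __init__(self):
        self._creation_batch = {}   # document_id -> batch from create_by_file
        self._update_batches = {}   # document_id -> batches from update_by_file

    def record_create(self, document_id, batch):
        self._creation_batch[document_id] = batch

    def record_update(self, document_id, batch):
        # Kept only for diagnostics; per the report these batches
        # cannot be used to query progress.
        self._update_batches.setdefault(document_id, []).append(batch)

    def batch_for_status(self, document_id):
        # Returns None when no creation batch is known for the document.
        return self._creation_batch.get(document_id)

    def update_batches(self, document_id):
        return list(self._update_batches.get(document_id, []))
```

Keeping the update batches alongside the creation batch also gives a record of which values returned 404, which may help when confirming a fix.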
+1