
Index or re-index a single file content

Open bamthomas opened this issue 7 months ago • 0 comments

Is your feature request related to a problem? Please describe.

#1648 implemented the extraction of page indices and of page content. There are two main issues with it:

  1. it can be very slow because it relies on Tika content extraction from the original source file
  2. it can become de-synchronized from the indexed content because the Tika version may have changed between content extraction and page indices extraction.

Describe the solution you'd like

For the first point, see #1814. For the second point, we could add a backend endpoint to reindex a file from its id:

PUT /api/task/reindex/<doc_id>

It could be idempotent, doing nothing if the Tika version in the metadata is already the latest one. Otherwise, it would launch a background task that indexes the file. It could also be synchronous.
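
To make the idempotency idea concrete, here is a minimal sketch of the decision logic the endpoint could apply. It is not taken from the Datashare code base: the `tika_version` metadata key, the `Document` class, the `reindex` function, the `CURRENT_TIKA_VERSION` value, and the in-memory store and queue are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical value: the Tika version bundled with the running backend.
CURRENT_TIKA_VERSION = "3.0.0"


@dataclass
class Document:
    doc_id: str
    # Assumption: the Tika version used at indexing time is stored in the
    # document metadata under a "tika_version" key.
    metadata: Dict[str, str] = field(default_factory=dict)


def needs_reindex(doc: Document, current_version: str) -> bool:
    """True when the document was indexed with a different Tika version."""
    return doc.metadata.get("tika_version") != current_version


def reindex(
    doc_id: str,
    store: Dict[str, Document],
    enqueue: Callable[[str], None],
    current_version: str = CURRENT_TIKA_VERSION,
) -> str:
    """Idempotent handler sketch for PUT /api/task/reindex/<doc_id>."""
    doc = store.get(doc_id)
    if doc is None:
        return "not_found"
    if not needs_reindex(doc, current_version):
        # Nothing to do: the indexed content already matches the current Tika.
        return "up_to_date"
    # Asynchronous variant: hand the work to a background task queue.
    enqueue(doc_id)
    return "scheduled"
```

Calling `reindex` repeatedly on an up-to-date document schedules nothing, which is what makes the PUT safe to retry. A synchronous variant would run the indexing in place of the `enqueue` call and return once it completes.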

bamthomas, May 28 '25 13:05