
Feature Request: Delete Indexed Files

Open · rehypothecation opened this issue 2 years ago · 1 comment

Issue: There's currently no way to delete specific indexed files without removing the entire database. As it stands, if there's an error or if a file becomes irrelevant, the only solution is to clear the whole database, which isn't ideal.

Proposed Solution: Add the ability to delete individual indexed files. This would mean adding a delete option in the interface that removes the file from the system's database, the vector DB (Qdrant), and the search engine (Typesense).
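The proposal means one user action has to remove the same document from three separate stores. Below is a minimal sketch of that fan-out, assuming a hypothetical `DocumentStore` interface with a `delete(doc_id)` method; the in-memory class stands in for the real database, Qdrant, and Typesense clients, whose actual APIs are not shown here. The ordering (search indexes first, relational DB last) is an illustrative design choice, not something the issue specifies.

```python
from typing import Protocol, Sequence


class DocumentStore(Protocol):
    """Hypothetical interface: any backend that can drop a document by ID."""

    name: str

    def delete(self, doc_id: str) -> None: ...


class InMemoryStore:
    """Stand-in for a real backend (relational DB, Qdrant, Typesense)."""

    def __init__(self, name: str, doc_ids: set[str]) -> None:
        self.name = name
        self.doc_ids = set(doc_ids)

    def delete(self, doc_id: str) -> None:
        # Deleting an ID that is already gone is a no-op, so retries are safe.
        self.doc_ids.discard(doc_id)


def delete_document(doc_id: str, stores: Sequence[DocumentStore]) -> None:
    """Delete doc_id from each store in order.

    If a store raises, the exception propagates and later stores are left
    untouched. Listing the relational DB last therefore keeps its record
    around as a marker that the deletion still needs to be retried.
    """
    for store in stores:
        store.delete(doc_id)


if __name__ == "__main__":
    stores = [
        InMemoryStore("qdrant", {"doc-1", "doc-2"}),
        InMemoryStore("typesense", {"doc-1", "doc-2"}),
        InMemoryStore("postgres", {"doc-1", "doc-2"}),  # system of record last
    ]
    delete_document("doc-1", stores)
    print([sorted(s.doc_ids) for s in stores])
```

Because each `delete` is idempotent, a failed deletion can simply be re-run against the full list of stores.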

rehypothecation · Jul 14 '23 03:07

Thanks for creating a tracking issue. This has been on our backlog, fairly high in priority, for a while. We will probably tackle it after adding support for open-source models.

In the meantime, thanks for your patience! We've been getting a lot of requests for new features; we'll get to this ASAP.

yuhongsun96 · Jul 15 '23 05:07

In addition to removing files from the index, it may also be worth looking at moved/renamed resources. A user might rename a parent directory or move a large file: the indexed content itself doesn't change, only the metadata does, and reindexing those resources could be computationally expensive. I haven't yet looked at how the connectors and indexing handle this, so ignore this message if it isn't a problem :-)
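One possible way to avoid re-embedding on a rename or move is to key indexed content by a hash of its text, so that a document seen at a new path with an unchanged hash only needs a metadata update. This is a minimal sketch of that idea, not how Danswer's connectors actually work; `IndexedDoc`, `content_hash`, and `sync` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IndexedDoc:
    path: str          # mutable metadata (location)
    content_hash: str  # identity of the indexed content


def content_hash(text: str) -> str:
    """SHA-256 of the document text; unchanged by renames or moves."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync(index: dict[str, IndexedDoc], path: str, text: str) -> str:
    """Hypothetical sync step for one document seen by a connector.

    Returns "reindex" (new content, must be chunked and embedded),
    "moved" (same content at a new path, metadata-only update),
    or "unchanged".
    """
    h = content_hash(text)
    existing = index.get(h)
    if existing is None:
        # Expensive path: a real system would chunk and embed here.
        index[h] = IndexedDoc(path=path, content_hash=h)
        return "reindex"
    if existing.path != path:
        # Cheap path: only the stored location changes, no re-embedding.
        existing.path = path
        return "moved"
    return "unchanged"
```

Keying purely by content hash has trade-offs: two identical files at different paths collapse into one entry, and an entry whose file was edited in place is left behind under its old hash, so it would still need the kind of per-document deletion this issue asks for.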

megamorf · Jul 16 '23 08:07

We hit this issue too. It seems we have some stuck documents from integrations we tested, and even when the same document is found again through another integration, it is not replaced. Not sure whether that is because the semantic_id is the same or something else (the id is certainly different).

jabdoa2 · Jul 18 '23 16:07

Done! https://github.com/danswer-ai/danswer/pull/271

yuhongsun96 · Aug 14 '23 00:08

Will extend the feature in the near future with a "delete via search" approach, so that users can delete individual files that are surfaced by a search.

yuhongsun96 · Aug 14 '23 00:08