
Skip file ingestion if already ingested

Open spirosbond opened this issue 1 year ago • 2 comments

Enhanced ingest_service.ingest_file() and ingest_service.bulk_ingest(…) to skip ingestion when a file has already been ingested.
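
To illustrate the idea, here is a minimal sketch of skip-if-already-ingested behaviour. It is not the code from this change and does not use the real private-gpt API: `InMemoryIngestService`, its `ingested` dict, and the choice to key on file name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryIngestService:
    """Hypothetical stand-in for an ingest service; not the private-gpt API."""

    # Maps file name -> content of everything ingested so far (assumption:
    # a file is identified by its name alone).
    ingested: dict[str, str] = field(default_factory=dict)

    def ingest_file(self, file_name: str, content: str) -> bool:
        """Ingest one file; return False if it was skipped as already ingested."""
        if file_name in self.ingested:
            return False
        self.ingested[file_name] = content
        return True

    def bulk_ingest(self, files: list[tuple[str, str]]) -> list[str]:
        """Ingest many files; return the names that were actually ingested."""
        return [name for name, content in files if self.ingest_file(name, content)]
```

Keying on the file name alone means an updated file with the same name would also be skipped; a content hash would be one way around that, at the cost of reading every file first.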

spirosbond avatar Jan 08 '24 15:01 spirosbond

It would be great for this to be reviewed and merged.

darko-mijic avatar Jan 11 '24 12:01 darko-mijic

Great find that the same file can be ingested more than once, I just stumbled on this myself as well!

I agree with @imartinez that this should not be done in a core component. Users can request all ingested documents before ingesting new ones. However, I do think the "Upload File(s)" button in the GUI should be adjusted to avoid this unintuitive behaviour:

  1. Delete existing Documents with the same filename
  2. Ingest the file as normal
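
The two steps above could look roughly like this. This is only a sketch against a made-up in-memory store: `Document`, `InMemoryStore`, and `upload_file` are hypothetical names, not the actual private-gpt classes or the GUI handler.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Document:
    doc_id: int
    file_name: str
    content: str


class InMemoryStore:
    """Hypothetical document store; stands in for the real ingest service."""

    def __init__(self) -> None:
        self._docs: dict[int, Document] = {}
        self._next_id = count()

    def list_ingested(self) -> list[Document]:
        return list(self._docs.values())

    def delete(self, doc_id: int) -> None:
        del self._docs[doc_id]

    def ingest(self, file_name: str, content: str) -> Document:
        doc = Document(next(self._next_id), file_name, content)
        self._docs[doc.doc_id] = doc
        return doc


def upload_file(store: InMemoryStore, file_name: str, content: str) -> Document:
    """What the upload button would do: replace, never duplicate."""
    # Step 1: delete existing documents with the same file name.
    for doc in store.list_ingested():
        if doc.file_name == file_name:
            store.delete(doc.doc_id)
    # Step 2: ingest the file as normal.
    return store.ingest(file_name, content)
```

Because the delete happens only in the upload handler, the core ingest path stays unchanged and API users can still decide for themselves how to handle duplicates.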

negatives:

  • slower (not a big difference as long as the user doesn't mindlessly upload / ingest the same files)

positives:

  • most intuitive (based on what I saw in the GUI, I thought this was actually what happened)
  • handier for most real-world use cases: file got updated? Simply ingest it and it will overwrite the old one.
  • handier for non-technical people in the community focusing on the GUI

Robinsane avatar Feb 07 '24 11:02 Robinsane