private-gpt
Skip file ingestion if already ingested
Enhanced ingest_service.ingest_file() and ingest_service.bulk_ingest(…) to skip ingestion of any file that has already been ingested.
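To illustrate the intended behaviour, here is a minimal sketch and not the code from this PR. `SketchIngestService` and its in-memory `ingested` dict are hypothetical stand-ins, and matching on the file name alone is an assumption about how "already ingested" is decided.

```python
from dataclasses import dataclass, field


@dataclass
class SketchIngestService:
    """Hypothetical stand-in for an ingest service (not the private-gpt class)."""

    # Maps file name -> stored content; assumption: the file name identifies a file.
    ingested: dict[str, str] = field(default_factory=dict)

    def ingest_file(self, file_name: str, content: str) -> bool:
        """Ingest one file; return False and do nothing if it is already present."""
        if file_name in self.ingested:
            return False
        self.ingested[file_name] = content
        return True

    def bulk_ingest(self, files: list[tuple[str, str]]) -> list[str]:
        """Ingest several files; return the names that were actually ingested."""
        return [name for name, content in files if self.ingest_file(name, content)]


service = SketchIngestService()
service.ingest_file("notes.txt", "first version")
# The second call is skipped, so the stored content stays unchanged.
service.ingest_file("notes.txt", "second version")
```

Note that with a name-only check an updated file is also skipped, which is the trade-off discussed in the comments.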
It would be great for this to be reviewed and merged.
Good catch that the same file can be ingested more than once. I just stumbled on this myself as well!
I agree with @imartinez that this should not be done in a core component. Users can request the list of all ingested documents before ingesting new ones. However, I do think the "Upload File(s)" button in the GUI should be adjusted to avoid this unintuitive behaviour:
- Delete existing Documents with the same filename
- Ingest the file as normal
negatives:
- slower (not a big difference as long as the user doesn't mindlessly upload / ingest the same files)
positives:
- most intuitive (based on what I saw in the GUI, I thought this was actually what happened)
- handier for most real-world use cases: file got updated? Simply ingest it again and it will overwrite the old one.
- handier for non-technical people in the community focusing on the GUI
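
The delete-then-ingest flow proposed above could look roughly like this. It is only a sketch under stated assumptions: `SketchStore`, `StoredDoc` and `upload_file` are hypothetical names for illustration, not the private-gpt API, and a real GUI handler would call the project's own list / delete / ingest functions instead.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    """Hypothetical record for one ingested document."""
    doc_id: int
    file_name: str
    text: str


class SketchStore:
    """Hypothetical in-memory document store, used only for this illustration."""

    def __init__(self) -> None:
        self.docs: list[StoredDoc] = []
        self._next_id = 0

    def list_ingested(self) -> list[StoredDoc]:
        return list(self.docs)

    def delete(self, doc_id: int) -> None:
        self.docs = [d for d in self.docs if d.doc_id != doc_id]

    def ingest(self, file_name: str, text: str) -> StoredDoc:
        doc = StoredDoc(self._next_id, file_name, text)
        self._next_id += 1
        self.docs.append(doc)
        return doc


def upload_file(store: SketchStore, file_name: str, text: str) -> StoredDoc:
    """What the upload button would do: replace, never duplicate."""
    # Step 1: delete existing documents with the same file name.
    for doc in store.list_ingested():
        if doc.file_name == file_name:
            store.delete(doc.doc_id)
    # Step 2: ingest the file as normal.
    return store.ingest(file_name, text)


store = SketchStore()
upload_file(store, "report.txt", "old contents")
upload_file(store, "report.txt", "updated contents")
# Only the updated version of report.txt remains in the store.
```

Because the old documents are removed first, re-uploading an edited file overwrites it, at the cost of re-ingesting unchanged files.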