`quickwit tool local-ingest` broken
As reported by "opendata"...
We need to test `quickwit tool local-ingest` or deprecate it.
❯ Ingesting documents locally...
---------------------------------------------------
Connectivity checklist
✔ metastore storage
✔ metastore
✔ index storage
✔ _ingest-cli-source
2024-10-16T22:51:55.538Z ERROR quickwit_actors::spawn_builder: actor-failure cause=early eof exit_status=Failure(early eof)
2024-10-16T22:51:55.538Z ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=SourceActor-autumn-tuMQ exit_status=Failure(early eof)
2024-10-16T22:51:56.537Z ERROR quickwit_actors::actor_handle: actor-exit-without-success actor="SourceActor-autumn-tuMQ"
I've been seeing a lot of problems with local-ingest since 0.8.x.
Every second or third import I also get this:
Indexed 13,572,489 documents in 1m 4s.
*** ERROR tantivy::directory::directory: Failed to remove the lock file. FileDoesNotExist(".tantivy-writer.lock")
*** ERROR quickwit_indexing::actors::merge_scheduler_service: merge scheduler service is dead
*** ERROR quickwit_actors::spawn_builder: actor-failure cause=An IO error occurred: 'No such file or directory (os error 2)' exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
*** ERROR quickwit_actors::actor_context: exit activating-kill-switch actor=MergeExecutor-dawn-dsdr exit_status=Failure(An IO error occurred: 'No such file or directory (os error 2)')
I vote for deprecating it in 0.9 and removing it two releases later.
The command is nice for profiling indexing performance, though.
It seems like this happens when a merge is still running while the ingestion finishes. This sounds a lot like an issue we had with Lambda deployments, which raises the question of what we should do:
- If we want to deprecate the command, maybe do nothing? This isn't a problematic error in the sense that all the data is there; it's just not as merged as it could be.
- We could wait for any pending merges to finish. That seems like a good idea, but it also means local ingest will occasionally be slow for a small file if it happens to arrive at a time we want to merge. If we keep the command, I think this is the way to go (see the sketch after this list).
- We could skip merging entirely. That sounds like a bad idea: local ingest can generate several splits for a single ingested file, which could noticeably hurt search performance at query time.
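To make the second option concrete, here is a minimal sketch of "drain in-flight merges before tearing the pipeline down". It deliberately does not use Quickwit's actual actor or merge-scheduler APIs; the `MergeTracker` type, its methods, and the 30-second timeout are all hypothetical and only illustrate the shutdown ordering being proposed.

```rust
// Hypothetical sketch, not Quickwit's real API: demonstrates waiting for
// in-flight merges to finish before shutting the indexing pipeline down.
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Tracks how many merge tasks are still in flight (hypothetical type).
struct MergeTracker {
    in_flight: Mutex<usize>,
    all_done: Condvar,
}

impl MergeTracker {
    fn new() -> Arc<Self> {
        Arc::new(Self { in_flight: Mutex::new(0), all_done: Condvar::new() })
    }

    /// Called when the merge scheduler spawns a merge.
    fn merge_started(&self) {
        *self.in_flight.lock().unwrap() += 1;
    }

    /// Called when a merge executor finishes (successfully or not).
    fn merge_finished(&self) {
        let mut count = self.in_flight.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            self.all_done.notify_all();
        }
    }

    /// Blocks until no merges are in flight, or until the timeout elapses.
    /// Returns true if all merges drained in time.
    fn wait_for_merges(&self, timeout: Duration) -> bool {
        let guard = self.in_flight.lock().unwrap();
        let (_guard, result) = self
            .all_done
            .wait_timeout_while(guard, timeout, |count| *count > 0)
            .unwrap();
        !result.timed_out()
    }
}

fn main() {
    let tracker = MergeTracker::new();

    // Pretend the merge scheduler kicked off one merge during ingestion.
    tracker.merge_started();
    let t = Arc::clone(&tracker);
    let merge = thread::spawn(move || {
        thread::sleep(Duration::from_millis(200)); // simulated merge work
        t.merge_finished();
    });

    // Ingestion is done. Instead of tearing the pipeline down immediately
    // (which is what seems to trigger the "early eof" / missing-file errors),
    // wait for in-flight merges to drain first.
    if tracker.wait_for_merges(Duration::from_secs(30)) {
        println!("all merges finished, safe to shut down");
    } else {
        println!("timed out waiting for merges");
    }
    merge.join().unwrap();
}
```

In the real command this would presumably mean having the CLI ask the merge pipeline to quiesce and awaiting that before killing the actors, rather than bolting on a separate tracker, but the ordering is the point: finish or bound the merges first, then exit.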