
Allow disabling retries in indexing loop or stopping indexing via Lambda handler

Open alexkreidler opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

I mistakenly passed an invalid value to the quickwit-lambda INDEX_CONFIG_URI environment variable because I thought it accepted HTTP URIs, not just filesystem URIs. The Lambda function then retried in an exponential backoff loop for 15 minutes, until it timed out, even though I deleted the function a few minutes in.

Logs

INIT_START Runtime Version: provided:al2.v37	Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:bc2882fd0e085da713a4e150009e80c93e37aef25d53897e472ddda5ffbd589d
START RequestId: 27d083f2-95f0-4bca-ae05-db5639e8c6d9 Version: $LATEST
2024-06-28T04:59:25.108Z  INFO Lambda runtime invoke:indexer_handler: quickwit_telemetry::sender: telemetry to https://telemetry.quickwit.io/ is enabled requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.124Z  INFO Lambda runtime invoke:indexer_handler: quickwit_config::node_config::serialize: using listen address `127.0.0.1` as advertise address advertise_address=127.0.0.1 requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.124Z  WARN Lambda runtime invoke:indexer_handler: quickwit_config::node_config::serialize: peer seeds are empty requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.124Z  INFO Lambda runtime invoke:indexer_handler: quickwit_lambda::utils: loaded node config config=NodeConfig { cluster_id: "lambda-ephemeral", node_id: "lambda-indexer", enabled_services: {Metastore, Janitor, Searcher, ControlPlane, Indexer}, gossip_listen_addr: 127.0.0.1:7280, grpc_listen_addr: 127.0.0.1:7281, gossip_advertise_addr: 127.0.0.1:7280, grpc_advertise_addr: 127.0.0.1:7281, gossip_interval: 1s, peer_seeds: [], data_dir_path: "/tmp", metastore_uri: Uri { uri: "s3://my-quickwit-index/index" }, default_index_root_uri: Uri { uri: "s3://my-quickwit-index/index" }, rest_config: RestConfig { listen_addr: 127.0.0.1:7280, cors_allow_origins: [], extra_headers: {} }, grpc_config: GrpcConfig { max_message_size: 21.0 MB }, storage_configs: StorageConfigs([]), metastore_configs: MetastoreConfigs([]), indexer_config: IndexerConfig { split_store_max_num_bytes: 107.4 GB, split_store_max_num_splits: 1000, max_concurrent_split_uploads: 12, max_merge_write_throughput: None, merge_concurrency: 1, enable_otlp_endpoint: true, enable_cooperative_indexing: false, cpu_capacity: CpuCapacity(2000) }, searcher_config: SearcherConfig { aggregation_memory_limit: 500.0 MB, aggregation_bucket_limit: 65000, fast_field_cache_capacity: 1000.0 MB, split_footer_cache_capacity: 500.0 MB, partial_request_cache_capacity: 64.0 MB, max_num_concurrent_split_searches: 100, max_num_concurrent_split_streams: 100, split_cache: None }, ingest_api_config: IngestApiConfig { max_queue_memory_usage: 2.1 GB, max_queue_disk_usage: 4.3 GB, replication_factor: 1, content_length_limit: 10.5 MB }, jaeger_config: JaegerConfig { enable_endpoint: true, lookback_period_hours: 72, max_trace_duration_secs: 3600, max_fetch_spans: 10000 } } requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.179Z  INFO Lambda runtime invoke:indexer_handler:lazy_load_credentials: aws_credential_types::cache::lazy_caching: credentials cache miss occurred; added new AWS credentials (took 20.557µs) requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.249Z  INFO Lambda runtime invoke:indexer_handler: quickwit_lambda::indexer::ingest::helpers: Index not found, creating it index_id="test-index" index_config_uri="s3://my-quickwit-index-config/index-config.yaml" requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.253Z  INFO Lambda runtime invoke:indexer_handler:lazy_load_credentials: aws_credential_types::cache::lazy_caching: credentials cache miss occurred; added new AWS credentials (took 11.862µs) requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.283Z  INFO Lambda runtime invoke:indexer_handler: quickwit_config::index_config::serialize: index config does not specify `index_uri`, falling back to default value index_id=test-index index_uri=s3://my-quickwit-index/index/test-index requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.361Z  INFO Lambda runtime invoke:indexer_handler: quickwit_lambda::indexer::ingest::helpers: index created requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.361Z  INFO Lambda runtime invoke:indexer_handler: quickwit_cluster::cluster: joining cluster cluster_id=lambda-ephemeral node_id=lambda-indexer generation_id=1719550765361866605 enabled_services={Indexer, Janitor} gossip_listen_addr=127.0.0.1:7280 gossip_advertise_addr=127.0.0.1:7280 grpc_advertise_addr=127.0.0.1:7281 peer_seed_addrs= requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.361Z  INFO Lambda runtime invoke:indexer_handler: chitchat::server: initial_seed_addrs={} requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.363Z  INFO Lambda runtime invoke:indexer_handler: quickwit_janitor: starting janitor service requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.363Z  WARN Lambda runtime invoke:indexer_handler: quickwit_janitor: delete task service is disabled: delete queries will not be processed requestId="27d083f2-95f0-4bca-ae05-db5639e8c6d9" xrayTraceId="Root=1-667e432c-144d1373649ff6be11253a6c;Parent=04db4f312ff1b20e;Sampled=0;Lineage=8e5c6e72:0" request_id="27d083f2-95f0-4bca-ae05-db5639e8c6d9"
2024-06-28T04:59:25.370Z  INFO quickwit_cluster::change: node `lambda-indexer` has joined the cluster node_id=lambda-indexer generation_id=1719550765361866605
2024-06-28T04:59:25.389Z  INFO quickwit_janitor::actors::garbage_collector: loaded 1 indexes from the metastore
2024-06-28T04:59:25.396Z  INFO spawn_pipeline: quickwit_indexing::actors::indexing_pipeline: spawning indexing pipeline index_id="test-index" source_id="_ingest-lambda-source" pipeline_uid=00000000000000000000000000 root_dir=/tmp/indexing/test-index%01J1EKCT7AZP9QBSQCMJ78Q50R%_ingest-lambda-source%00000000000000000000000000%QqVof6 index=test-index gen=0
2024-06-28T04:59:25.397Z ERROR quickwit_indexing::actors::indexing_pipeline: error while spawning indexing pipeline, retrying after some time error=failed to create source `_ingest-lambda-source` of type `file`. Cause: unknown URI protocol `https`
Caused by:
unknown URI protocol `https` retry_count=0 retry_delay=2s
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::doc_processor::DocProcessor-fragrant-M0AB"
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::doc_processor::DocProcessor-fragrant-M0AB exit_status=success
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Indexer-morning-FaT1 exit_status=success
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::index_serializer::IndexSerializer-fragrant-wrd5"
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::index_serializer::IndexSerializer-fragrant-wrd5 exit_status=success
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: no more messages actor="Packager-summer-vkFg"
2024-06-28T04:59:25.397Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Packager-summer-vkFg exit_status=success
2024-06-28T04:59:25.397Z  INFO spawn_merge_pipeline: quickwit_indexing::actors::merge_pipeline: spawning merge pipeline index_id=test-index source_id=_ingest-lambda-source pipeline_uid=00000000000000000000000000 root_dir=/tmp/indexing/test-index%01J1EKCT7AZP9QBSQCMJ78Q50R%_ingest-lambda-source%00000000000000000000000000%QqVof6 merge_policy=StableLogMergePolicy { config: StableLogMergePolicyConfig { min_level_num_docs: 100000, merge_factor: 10, max_merge_factor: 12, maturation_period: 172800s }, split_num_docs_target: 10000000 } index="test-index" gen=0
2024-06-28T04:59:25.397Z  INFO spawn_merge_pipeline: quickwit_indexing::actors::merge_pipeline: loaded list of published splits num_splits=0 index="test-index" gen=0
2024-06-28T04:59:25.398Z  INFO quickwit_janitor::actors::retention_policy_executor: loaded 1 indexes from the metastore
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: no more messages actor="IndexUploader-floral-JC4b"
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=IndexUploader-floral-JC4b exit_status=success
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::sequencer::Sequencer<quickwit_indexing::actors::publisher::Publisher>-white-kOmg"
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::sequencer::Sequencer<quickwit_indexing::actors::publisher::Publisher>-white-kOmg exit_status=success
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: no more messages actor="Publisher-icy-5uDS"
2024-06-28T04:59:25.400Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Publisher-icy-5uDS exit_status=success
2024-06-28T04:59:27.399Z  INFO spawn_pipeline: quickwit_indexing::actors::indexing_pipeline: spawning indexing pipeline index_id="test-index" source_id="_ingest-lambda-source" pipeline_uid=00000000000000000000000000 root_dir=/tmp/indexing/test-index%01J1EKCT7AZP9QBSQCMJ78Q50R%_ingest-lambda-source%00000000000000000000000000%QqVof6 index=test-index gen=0
2024-06-28T04:59:27.399Z ERROR quickwit_indexing::actors::indexing_pipeline: error while spawning indexing pipeline, retrying after some time error=failed to create source `_ingest-lambda-source` of type `file`. Cause: unknown URI protocol `https`
Caused by:
unknown URI protocol `https` retry_count=1 retry_delay=4s
2024-06-28T04:59:27.399Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::doc_processor::DocProcessor-hidden-TDQv"
2024-06-28T04:59:27.399Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::doc_processor::DocProcessor-hidden-TDQv exit_status=success
<truncated>
2024-06-28T05:07:55.412Z  INFO spawn_pipeline: quickwit_indexing::actors::indexing_pipeline: spawning indexing pipeline index_id="test-index" source_id="_ingest-lambda-source" pipeline_uid=00000000000000000000000000 root_dir=/tmp/indexing/test-index%01J1EKCT7AZP9QBSQCMJ78Q50R%_ingest-lambda-source%00000000000000000000000000%QqVof6 index=test-index gen=0
2024-06-28T05:07:55.413Z ERROR quickwit_indexing::actors::indexing_pipeline: error while spawning indexing pipeline, retrying after some time error=failed to create source `_ingest-lambda-source` of type `file`. Cause: unknown URI protocol `https`
Caused by:
unknown URI protocol `https` retry_count=8 retry_delay=512s
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::doc_processor::DocProcessor-young-hC2f"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::doc_processor::DocProcessor-young-hC2f exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Indexer-damp-crmm exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::index_serializer::IndexSerializer-cold-8pzg"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::index_serializer::IndexSerializer-cold-8pzg exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="Packager-sparkling-RjAR"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Packager-sparkling-RjAR exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="IndexUploader-lingering-ASZJ"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=IndexUploader-lingering-ASZJ exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="quickwit_indexing::actors::sequencer::Sequencer<quickwit_indexing::actors::publisher::Publisher>-restless-FbUq"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=quickwit_indexing::actors::sequencer::Sequencer<quickwit_indexing::actors::publisher::Publisher>-restless-FbUq exit_status=success
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: no more messages actor="Publisher-hidden-GbyG"
2024-06-28T05:07:55.413Z  INFO quickwit_actors::spawn_builder: actor-exit actor_id=Publisher-hidden-GbyG exit_status=success
2024-06-28T05:09:25.406Z  INFO quickwit_janitor::actors::garbage_collector: loaded 1 indexes from the metastore
2024-06-28T05:14:25.147Z 27d083f2-95f0-4bca-ae05-db5639e8c6d9 Task timed out after 900.06 seconds

END RequestId: 27d083f2-95f0-4bca-ae05-db5639e8c6d9
REPORT RequestId: 27d083f2-95f0-4bca-ae05-db5639e8c6d9	Duration: 900055.97 ms	Billed Duration: 900089 ms	Memory Size: 3008 MB	Max Memory Used: 51 MB	Init Duration: 88.76 ms	
INIT_START Runtime Version: provided:al2.v37	Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:bc2882fd0e085da713a4e150009e80c93e37aef25d53897e472ddda5ffbd589d

Describe the solution you'd like

I'd like to be able to disable the retry behavior in the indexing pipeline with a configuration option and/or an environment variable.

If a user passes in an invalid configuration that causes an error while spawning the indexing pipeline (like the error in load_source in my case), the pipeline will retry, potentially forever. That isn't a problem for CLI users, who can kill the process, but it is problematic for Lambda and other environments.

A TODO message indicates this might have already been considered.
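
To make the request concrete, here is a minimal sketch of what a bounded retry policy could look like. It is not Quickwit's actual code: RetryPolicy, max_retries, base_delay, max_delay, and next_delay are hypothetical names, and the delay cap is an assumption. The 2s starting delay and the doubling per attempt are inferred from the logs above (retry_count=0 gives 2s, retry_count=1 gives 4s, retry_count=8 gives 512s).

```rust
use std::time::Duration;

/// Hypothetical retry policy for spawning the indexing pipeline.
/// None of these names come from Quickwit; they only illustrate the request.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    /// `None` retries forever; `Some(0)` disables retries entirely.
    pub max_retries: Option<u32>,
    /// Delay before the first retry (2s in the logs above).
    pub base_delay: Duration,
    /// Upper bound on a single delay (an assumption, not taken from the logs).
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Returns the delay to wait before the next attempt,
    /// or `None` when the caller should give up and surface the error.
    pub fn next_delay(&self, retry_count: u32) -> Option<Duration> {
        if let Some(max) = self.max_retries {
            if retry_count >= max {
                return None;
            }
        }
        // Exponential backoff: base_delay * 2^retry_count, saturating on overflow.
        let factor = 2u32.saturating_pow(retry_count);
        let delay = self.base_delay.saturating_mul(factor);
        Some(delay.min(self.max_delay))
    }
}
```

With max_retries set to Some(0), the first failure would be returned to the Lambda handler immediately instead of keeping the invocation alive until the 900-second timeout.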

Describe alternatives you've considered

Another solution, specific to the Lambda environment, would be to modify the request handler to accept a separate kill endpoint/event type that calls indexing_service_handle.kill(), so users could manually stop indexing. This might also be useful in cases other than an error loop, such as an unexpectedly large input dataset.

alexkreidler • Jun 28 '24 05:06