Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found
Could someone please point me in the right direction on the following error I'm seeing when launching InfluxDB from the latest influxdb:3-core (v3.4.2) image:
2025-09-15T15:40:34.168639Z INFO influxdb3_lib::commands::serve: InfluxDB 3 Core server starting node_id=node0 git_hash=571299afed3644c69811df9a71816446af64dec0 version=3.4.2 uuid=ce371bba-bd1d-49d0-98da-cee71b1a7c29 num_cpus=2
2025-09-15T15:40:34.171535Z INFO influxdb3_clap_blocks::object_store: Object Store db_dir="/var/lib/influxdb3" object_store_type="Directory"
2025-09-15T15:40:34.171770Z INFO influxdb3_lib::commands::serve: Creating shared query executor num_threads=2
2025-09-15T15:40:34.181430Z INFO influxdb3_catalog::object_store::versions::v2: catalog not found, creating a new one catalog_uuid=fc5be581-f5b7-4b8e-b718-fc8765bf18a7
2025-09-15T15:40:34.184156Z INFO influxdb3_catalog::object_store::versions::v2: persisted catalog checkpoint file sequence=0
2025-09-15T15:40:34.184174Z INFO influxdb3_catalog::catalog::versions::v2::update: create database name="_internal"
2025-09-15T15:40:34.184181Z INFO influxdb3_catalog::catalog::versions::v2: creating new database database_name="_internal"
2025-09-15T15:40:34.184396Z INFO influxdb3_catalog::object_store::versions::v2: persisted next catalog sequence put_result=PutResult { e_tag: Some("3ce2e5-63ed8d3010e91-ed"), version: None } object_path=CatalogFilePath(Path { raw: "node0/catalog/v2/logs/00000000000000000001.catalog" })
2025-09-15T15:40:34.184475Z INFO influxdb3_catalog::catalog::versions::v2: created internal database
2025-09-15T15:40:34.184615Z INFO influxdb3_lib::commands::serve: catalog initialized catalog_uuid=fc5be581-f5b7-4b8e-b718-fc8765bf18a7
2025-09-15T15:40:34.184691Z INFO influxdb3_lib::commands::serve: Initializing table index cache node_id="node0" max_entries=Some(100) concurrency_limit=20
2025-09-15T15:40:34.184705Z INFO influxdb3_write::table_index_cache: creating table indices from split snapshots
2025-09-15T15:40:34.184760Z WARN influxdb3_write::table_index_cache: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=50 path=node0/table-index-conversion-completed
2025-09-15T15:40:34.236289Z WARN influxdb3_write::table_index_cache: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=100 path=node0/table-index-conversion-completed
InfluxDB then hangs on that last message: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=100 path=node0/table-index-conversion-completed. It never gets any further.
I'm thinking perhaps I need to add a mechanism that waits for the container to be fully up and running before anything else connects to it.
Any advice anyone can give on this would be brilliant. What was working is no longer working, and I'm at a loss as to what I have done (or not done) that could have caused this.
Many thanks!
@michealroberts - can you confirm that the file does not exist?
Additional context
The WARN is emitted here: https://github.com/influxdata/influxdb/blob/a6f8aab12c0b662d811b618d6d0df3780ecfce40/influxdb3_write/src/table_index_cache.rs#L392-L397
From what I can tell, the request to get that file is made with a retry config that retries twice, which matches the two WARN lines (retry_after_ms=50, then 100) in the logs shared above.
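For illustration, assuming the retry is built with the backon crate (1.x) using exponential backoff starting at 50 ms with two retries (which would line up with the 50 ms / 100 ms delays in the logs), a wrapper around the marker load could look roughly like this. This is a minimal sketch; the helper name and exact configuration are assumptions, not the actual code in table_index_cache.rs:

```rust
use std::time::Duration;

use backon::{ExponentialBuilder, Retryable};
use object_store::{path::Path, ObjectStore};

// Hypothetical helper: fetch the conversion marker, retrying twice with
// exponential backoff (50 ms, then 100 ms), as the logs above suggest.
async fn load_conversion_marker(
    store: &dyn ObjectStore,
    path: &Path,
) -> object_store::Result<bytes::Bytes> {
    let backoff = ExponentialBuilder::default()
        .with_min_delay(Duration::from_millis(50))
        .with_factor(2.0)
        .with_max_times(2);

    (|| async move { store.get(path).await?.bytes().await })
        .retry(backoff)
        .notify(|err, dur| {
            // Roughly where a WARN like the one in the logs would be emitted.
            eprintln!(
                "Retrying conversion marker load after error {err} retry_after_ms={}",
                dur.as_millis()
            );
        })
        .await
}
```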
However, the fact that it hangs after the second WARN implies that either:
1. the retry mechanism is hanging, or
2. the runtime is hanging on something else.
There are quite a few logs emitted from that function and NOT_FOUND is handled gracefully. This makes me think that (1) is a possibility, but we'll need to reproduce to be sure.
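To be clear about what "handled gracefully" means here: the usual pattern is to match on object_store's NotFound error and treat a missing marker as "conversion not done yet" rather than a failure. A rough sketch of that pattern (the function name is made up, and this is not the actual code in table_index_cache.rs):

```rust
use object_store::{path::Path, ObjectStore};

// Sketch: a missing conversion marker means "conversion not done yet",
// not an error that should abort startup.
async fn conversion_marker_exists(
    store: &dyn ObjectStore,
    path: &Path,
) -> object_store::Result<bool> {
    match store.get(path).await {
        Ok(_) => Ok(true),
        Err(object_store::Error::NotFound { .. }) => Ok(false),
        Err(e) => Err(e),
    }
}
```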
I've attempted to reproduce the hanging behavior in the backon crate with a minimal Rust project. It seemed to happen once, but with a looping script I was not able to repeat it, so for now I'm assuming that first occurrence was some kind of fluke.
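For reference, a minimal reproduction along these lines could look roughly like the following (assuming backon 1.x; the always-failing operation and the delays are placeholders). It retries a failing async call in a loop and prints each iteration, so a hang inside the retry future would show up as the counter stopping:

```rust
use std::time::Duration;

use backon::{ExponentialBuilder, Retryable};

#[tokio::main]
async fn main() {
    // Run a retried, always-failing operation over and over; if the retry
    // future ever hangs, the iteration counter stops advancing.
    for i in 0u64.. {
        let result = (|| async {
            Err::<(), std::io::Error>(std::io::Error::new(
                std::io::ErrorKind::NotFound,
                "marker not found",
            ))
        })
        .retry(
            ExponentialBuilder::default()
                .with_min_delay(Duration::from_millis(50))
                .with_max_times(2),
        )
        .await;

        println!("iteration {i} finished: {result:?}");
    }
}
```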
@michealroberts can you reproduce the issue while passing the -vvv flag to your influxdb3 command and share the resulting logs?