Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found

Open michealroberts opened this issue 4 months ago • 2 comments

Could someone point me in the right direction please on the following error I am seeing when launching InfluxDB from the latest influxdb:3-core (v3.4.2) image:

2025-09-15T15:40:34.168639Z  INFO influxdb3_lib::commands::serve: InfluxDB 3 Core server starting node_id=node0 git_hash=571299afed3644c69811df9a71816446af64dec0 version=3.4.2 uuid=ce371bba-bd1d-49d0-98da-cee71b1a7c29 num_cpus=2
2025-09-15T15:40:34.171535Z  INFO influxdb3_clap_blocks::object_store: Object Store db_dir="/var/lib/influxdb3" object_store_type="Directory"
2025-09-15T15:40:34.171770Z  INFO influxdb3_lib::commands::serve: Creating shared query executor num_threads=2
2025-09-15T15:40:34.181430Z  INFO influxdb3_catalog::object_store::versions::v2: catalog not found, creating a new one catalog_uuid=fc5be581-f5b7-4b8e-b718-fc8765bf18a7
2025-09-15T15:40:34.184156Z  INFO influxdb3_catalog::object_store::versions::v2: persisted catalog checkpoint file sequence=0
2025-09-15T15:40:34.184174Z  INFO influxdb3_catalog::catalog::versions::v2::update: create database name="_internal"
2025-09-15T15:40:34.184181Z  INFO influxdb3_catalog::catalog::versions::v2: creating new database database_name="_internal"
2025-09-15T15:40:34.184396Z  INFO influxdb3_catalog::object_store::versions::v2: persisted next catalog sequence put_result=PutResult { e_tag: Some("3ce2e5-63ed8d3010e91-ed"), version: None } object_path=CatalogFilePath(Path { raw: "node0/catalog/v2/logs/00000000000000000001.catalog" })
2025-09-15T15:40:34.184475Z  INFO influxdb3_catalog::catalog::versions::v2: created internal database
2025-09-15T15:40:34.184615Z  INFO influxdb3_lib::commands::serve: catalog initialized catalog_uuid=fc5be581-f5b7-4b8e-b718-fc8765bf18a7
2025-09-15T15:40:34.184691Z  INFO influxdb3_lib::commands::serve: Initializing table index cache node_id="node0" max_entries=Some(100) concurrency_limit=20
2025-09-15T15:40:34.184705Z  INFO influxdb3_write::table_index_cache: creating table indices from split snapshots
2025-09-15T15:40:34.184760Z  WARN influxdb3_write::table_index_cache: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=50 path=node0/table-index-conversion-completed
2025-09-15T15:40:34.236289Z  WARN influxdb3_write::table_index_cache: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=100 path=node0/table-index-conversion-completed

InfluxDB hangs after logging: Retrying conversion marker load after error error=Object at location /var/lib/influxdb3/node0/table-index-conversion-completed not found: No such file or directory (os error 2) retry_after_ms=100 path=node0/table-index-conversion-completed

I'm thinking perhaps I need to add a mechanism that ensures that I wait for the container to be running etc ...

Any advice anyone can give on this would be brilliant. What was working is no longer working, and I am at a loss as to what I have done or not done that could have caused this.

Many thanks!

michealroberts avatar Sep 15 '25 16:09 michealroberts

@michealroberts - can you confirm that the file does not exist?


Additional context

The WARN is emitted here: https://github.com/influxdata/influxdb/blob/a6f8aab12c0b662d811b618d6d0df3780ecfce40/influxdb3_write/src/table_index_cache.rs#L392-L397

From what I can tell, the request to get that file is made with a retry config that tries twice, which looks to be the case based on the logs shared above.

However, the fact that it hangs after the second WARN implies that either:

  1. the retry mechanism is hanging
  2. the runtime is hanging on something else

There are quite a few logs emitted from that function and NOT_FOUND is handled gracefully. This makes me think that (1.) is a possibility, but we'll need to reproduce to be sure.
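
For readers following along, the behaviour described above (a bounded number of retries with a growing delay, then treating a missing file as a normal outcome) can be sketched roughly as follows. This is a std-only illustration and not the code in `table_index_cache.rs`: `load_marker_with_retry`, `Marker`, and the doubling delay are assumptions chosen to match the `retry_after_ms=50` and `retry_after_ms=100` values in the logs.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Result of looking for the marker file (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Marker {
    Present(Vec<u8>),
    Absent,
}

/// Calls `load` once, then retries up to `max_retries` more times, doubling
/// the delay between attempts. Once retries are exhausted, a `NotFound`
/// error is reported as `Marker::Absent` instead of being propagated.
pub fn load_marker_with_retry<F>(
    mut load: F,
    max_retries: u32,
    initial_delay: Duration,
) -> io::Result<Marker>
where
    F: FnMut() -> io::Result<Vec<u8>>,
{
    let mut delay = initial_delay;
    let mut retries = 0;
    loop {
        match load() {
            Ok(bytes) => return Ok(Marker::Present(bytes)),
            // Retries remain: log, sleep, and try again with a longer delay.
            Err(err) if retries < max_retries => {
                eprintln!(
                    "retrying marker load after error: {} (retry_after_ms={})",
                    err,
                    delay.as_millis()
                );
                thread::sleep(delay);
                delay *= 2;
                retries += 1;
            }
            // Retries exhausted and the file is simply missing: not an error.
            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(Marker::Absent),
            // Retries exhausted with any other error: give up and report it.
            Err(err) => return Err(err),
        }
    }
}
```

In this sketch, with `max_retries = 2` and an initial delay of 50 ms, a loader that always fails with `NotFound` logs two retries (50 ms, then 100 ms) and then returns `Marker::Absent` rather than blocking.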

hiltontj avatar Sep 16 '25 14:09 hiltontj

> There are quite a few logs emitted from that function and NOT_FOUND is handled gracefully. This makes me think that (1.) is a possibility, but we'll need to reproduce to be sure.

I've attempted to reproduce the hanging behavior in the backon crate with a minimal Rust project. It seemed to happen once, but I was not able to repeat it with a looping script, so for now I'm assuming that first occurrence was some kind of fluke.
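
A repeat-until-it-stalls harness of the kind described here could look roughly like the sketch below. It is std-only and does not use the backon crate; `count_stalled_runs` and its parameters are hypothetical names, and the operation under test is passed in as a closure.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a fresh thread `iterations` times and returns how many of
/// those runs did not finish within `deadline`. A run that panics is not
/// counted as stalled. Stalled threads are left detached, not killed.
pub fn count_stalled_runs<F>(op: F, iterations: usize, deadline: Duration) -> usize
where
    F: Fn() + Send + Clone + 'static,
{
    let mut stalled = 0;
    for _ in 0..iterations {
        let (done_tx, done_rx) = mpsc::channel();
        let op = op.clone();
        thread::spawn(move || {
            op();
            // The receiver may already have given up waiting; ignore that.
            let _ = done_tx.send(());
        });
        // Only a timeout counts as a stall; a dropped sender means a panic.
        if matches!(
            done_rx.recv_timeout(deadline),
            Err(mpsc::RecvTimeoutError::Timeout)
        ) {
            stalled += 1;
        }
    }
    stalled
}
```

Calling this with the retry operation as `op`, a large `iterations` value, and a generous `deadline` gives a count of how often the operation failed to return, which helps tell an intermittent hang from a one-off.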

@michealroberts can you reproduce the issue while passing the -vvv flag to your influxdb3 command and share the resulting logs?

waynr avatar Sep 22 '25 19:09 waynr