
Swarm queries fail with `Cannot find manifest file for data file` error

Open hodgesrm opened this issue 6 months ago • 3 comments

Describe the bug

Swarm queries fail due to a problem accessing manifest data. The same query works on the initiator. This appears to be due to some sort of corruption of the Iceberg files on S3.

When the query fails, you get an error like the following on the initiator:

2025.07.08 00:59:36.621970 [ 84 ] {} <Error> TCPHandler: Code: 36. DB::Exception: Cannot find manifest file for data file: btc/transactions/data/1751086330717-0730f2009f6fc6e8111b14b0e7c0b5a83007226d1b1564f2231d822a4973f777.parquet: While executing IcebergS3(_table_function.icebergS3Cluster)Source. (BAD_ARGUMENTS)

The full log message shows the following underlying error:

2025.07.08 00:59:36.621679 [ 84 ] {0ab0bd55-38ce-4388-8e2c-fdc6fd3fff6b} <Error> executeQuery: Code: 36. DB::Exception: Cannot find manifest file for data file: btc/transactions/data/1751086330717-0730f2009f6fc6e8111b14b0e7c0b5a83007226d1b1564f2231d822a4973f777.parquet: While executing IcebergS3(_table_function.icebergS3Cluster)Source. (BAD_ARGUMENTS) (version 25.3.3.20139.altinityantalya.20139 (official build)) (from [::1]:39160) (query 1, line 1) (in query: SELECT __table1.date AS date, sum(__table1.output_count) AS `sum(output_count)` FROM icebergS3Cluster('swarm', 's3://rhodges-ice-rest-catalog-demo/btc/transactions', 'ASIAWKGGVET5C6U2HRQS', '[HIDDEN]', 'IQoJb3JpZ2luX2VjEHkaCXVzLXdlc3QtMiJGMEQCIHSYUkGfhq1ARjBiIfhvX4Y/DZedgctaJ8BdJrqrCe8ZAiB5u6r+lyLB3EO4h6QK/nlneKNM2lESbcY4wrtqGbhpACqKBQiC//////////8BEAIaDDQzNDIwODMxODcxNCIM68xWjXSta7SIcRwUKt4EqG3rsdM7x3vILvYeHhbu6j6HAM1kFVG2CXRSMuaZ08Qi+F0nUMMU1lmUnRhLUH3A6oGYb2AwvwFNuESfi1TwA7UIA+IemH9b6NVcX6awlVaB5B9o2yYqkQ9C6NNye2IZfvlMcckjfYZP//Qm7/oImkyajr2tTcpruKNwKnw0VGS61Iz4O2R7ka90u+bepMxaHBjGPc0n4gJ71ZcWduT/KZE7VV7+BlyKuANkCwy+CeuyvnM6yJrrBlZ+XP1KdX7Rx4NupQ3ddMUjjgOr1v3zjGlHB5ESpNz+r/48tUM+xBjRHFwUpoC4buN9MKudWrHqGMt3w8abq6hcRbpT2TFEZj/aUh2XMipR/32AoAlRlHDwVQxTmCziMYk5mICd0rKr3rgBMtbL8/Xj80UAIVewONbOHT+n/vx4//Tge1//ATJgjl6hip3vSOwXZcSq91PMXdw1KuLe8ndQs0alzj5lpdupjTf44oElGi1yy6BOvHFTL4R1AxZnhYVqsZFpiBcxOyoUWYGWuMNZysSptGq5+lb46iHe9a1zt8S4xUc45FtgQojSxtxYtyEYBLRhSAxEmc5erz6RZlr40W8DKq90NYd9K/yB4qOPntVncPHrBNdoH1tg6f9yigQSANzLSf76AjnJMXxSaD59P04TyQt4rbpuWs543HonzvMDINLHa3yJ0YlX+3d2kx3IzHEi3sibfyIeGVUt7vebruYLyFQ/hgSF1+JmKJqPN//4cxurn3X4QOJb+oVZDrb3ySXCo3jc3xRY0ITyYvLzz4Gu2XevoIsF2jMAcBWxs1etSol+MI7SscMGOpsB7nb7z5uSGF1pb5Ye1NZeZkWt3P6iPxBOci1W5MD1BcO+CaA5868xKOqAmINJx2qtke+ScGjjcysH9g5BIm+gDlnWC4zmUyf4oQV5B6Mvt0YCgumy/q+ncgqOtrkYbcTFctzpFTlWw/obCFS/1Y8sAPP2x1j/8X3OCYVk5G965xyeUS9ZxMKBflEeK54nsvOuxtSijk0dUUHOums=', 'Parquet', '`hash` Nullable(String), `version` Nullable(Int64), `size` Nullable(Int64), `block_hash` Nullable(String), `block_number` Nullable(Int64), `index` Nullable(Int64), `virtual_size` Nullable(Int64), `lock_time` Nullable(Int64), `input_count` Nullable(Int64), `output_count` Nullable(Int64), `is_coinbase` Nullable(Bool), `output_value` Nullable(Float64), `outputs` Array(Tuple(address Nullable(String), index Nullable(Int64), required_signatures Nullable(Int64), script_asm Nullable(String), script_hex Nullable(String), type Nullable(String), value Nullable(Float64))), `block_timestamp` Nullable(DateTime64(6, \'UTC\')), `date` Nullable(String), `last_modified` Nullable(DateTime64(6, \'UTC\')), `fee` Nullable(Float64), `input_value` Nullable(Float64), `inputs` Array(Tuple(address Nullable(String), index Nullable(Int64), required_signatures Nullable(Int64), script_asm Nullable(String), script_hex Nullable(String), sequence Nullable(Int64), spent_output_index Nullable(Int64), spent_transaction_hash Nullable(String), txinwitness Array(Nullable(String)), type Nullable(String), value Nullable(Float64)))') AS __table1 WHERE __table1.date = '2025-02-01' GROUP BY __table1.date ORDER BY __table1.date ASC SETTINGS use_hive_partitioning = 1, input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1, use_iceberg_metadata_files_cache = 1), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000f4d923b
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009f9df4c
2. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000009fb3feb
3. DB::IcebergMetadata::getSchemaVersionByFileIfOutdated(String) const @ 0x0000000011f273c5
4. DB::IcebergMetadata::getInitialSchemaByPath(String const&) const @ 0x0000000011f2bb1f
5. DB::DataLakeConfiguration<DB::StorageS3Configuration, DB::IcebergMetadata>::getInitialSchemaByPath(String const&) const @ 0x000000001134fad3
6. DB::StorageObjectStorageSource::createReader(unsigned long, std::shared_ptr<DB::IObjectIterator> const&, std::shared_ptr<DB::StorageObjectStorage::Configuration> const&, std::shared_ptr<DB::IObjectStorage> const&, DB::ReadFromFormatInfo&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::KeyCondition const> const&, std::shared_ptr<DB::Context const> const&, DB::SchemaCache*, std::shared_ptr<Poco::Logger> const&, unsigned long, unsigned long, bool) @ 0x0000000011e8b5b0
7. DB::StorageObjectStorageSource::generate() @ 0x0000000011e89793
8. DB::ISource::tryGenerate() @ 0x0000000014d9731e
9. DB::ISource::work() @ 0x0000000014d96f27
10. DB::ExecutionThreadContext::executeTask() @ 0x0000000014db2b16
11. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x0000000014da6604
12. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl()::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x0000000014da8d3f
13. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000f60f3db
14. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000f615f02
15. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000f60c70f
16. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000f6139da
17. ? @ 0x00007f7fc3f48ac3
18. ? @ 0x00007f7fc3fda850

To Reproduce

Steps to reproduce the behavior on AWS EKS:

  1. Set up initiator and swarm servers using manifest files at https://github.com/Altinity/antalya-examples/tree/main/kubernetes/manifests
  2. Add ice catalog using scripts at https://github.com/Altinity/ice/tree/master/examples/eks
  3. Load tables using ice commands. Examples:
ice insert nyc.taxis -p https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet
ice insert btc.transactions -p --s3-no-sign-request --s3-region=us-east-2 s3://aws-public-blockchain/v1.0/btc/transactions/date=2025-0*/*/*.parquet
  4. Do the above multiple times and ensure that a few loads fail, so that files are left on S3 without being fully loaded.
  5. Run a query that uses the swarm (a scripted version of this step follows the SQL below):
SELECT date, sum(output_count)
FROM ice.`btc.transactions`
WHERE date = '2025-02-01'
GROUP BY date ORDER BY date ASC
SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm', input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1, use_iceberg_metadata_files_cache = 1
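
For anyone scripting step 5, here is a rough sketch using the clickhouse-connect Python client. The client library and host name are assumptions, not part of the original setup; the SQL and settings mirror the query above.

import clickhouse_connect

# Placeholder host; point this at the initiator node.
client = clickhouse_connect.get_client(host="initiator.example.com", port=8123)

sql = """
SELECT date, sum(output_count)
FROM ice.`btc.transactions`
WHERE date = '2025-02-01'
GROUP BY date ORDER BY date ASC
"""

try:
    result = client.query(
        sql,
        settings={
            "use_hive_partitioning": 1,
            "object_storage_cluster": "swarm",  # drop this to run on the initiator only
            "input_format_parquet_use_metadata_cache": 1,
            "enable_filesystem_cache": 1,
            "use_iceberg_metadata_files_cache": 1,
        },
    )
    print(result.result_rows)
except Exception as e:
    # With the swarm setting, this surfaces:
    # Code: 36 ... Cannot find manifest file for data file ...
    print(e)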

Expected behavior

This query should return the date and the summed value, as it does on the initiator.

Screenshots

On the swarm you get the error described above.

Key information

Relevant runtime details:

  • Antalya build: 25.3.3.20139.altinityantalya.20139 (container)
  • Iceberg catalog: Ice 0.4.0
  • Environment: AWS EKS 1.3.2 and native S3 storage

Additional context

You can "fix" the problem by deleting the tables, fully cleaning off the S3 storage, and then reloading. E.g.:

ice delete-table foo.bar
ice delete-table foo.baz
aws s3 rm --recursive s3://rhodges-ice-rest-catalog-demo/

Another clue: once things are broken, they seem to break across multiple tables.

The current hypothesis is that there is an edge case where Iceberg files get corrupted in a way that makes queries going through S3 storage fail. However, I tried a few obvious repros and none of them triggered the failure:

  1. Manually kill large uploads.
  2. Delete and reload tables in Ice catalog using ice delete-table without cleaning up extra files (e.g., from previous step).
  3. Reload the same parquet files after a failure.

— hodgesrm, Jul 08 '25 14:07

I got the same exception on the initiator node when I have two tables under one namespace and select from the second table.

Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Received from clickhouse3:9000. DB::Exception: Cannot find manifest file for data file: data/data/symbol_partition=Eve/00000-1-0f785ef5-0a35-4bac-9003-2a29c41016c4.parquet: While executing IcebergS3(_table_function.icebergS3Cluster)Source. (BAD_ARGUMENTS) 
(query: SELECT * FROM datalake_760ea642_5cb8_11f0_8ec9_e12f95dcac7e.`iceberg_760ea641_5cb8_11f0_8ec9_e12f95dcac7e.table_2_760ea644_5cb8_11f0_8ec9_e12f95dcac7e` ORDER BY tuple() FORMAT Values)

I can create a simple Python script with a repro.
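
For reference, a rough outline of what such a repro could look like, assuming a pyiceberg REST catalog and pyarrow; the catalog URI, namespace, and table names are placeholders, not taken from the failing environment:

import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Placeholder REST catalog config; adjust to the actual environment.
catalog = load_catalog("demo", type="rest", uri="http://ice-rest-catalog:5000")

rows = pa.table({"symbol_partition": ["Eve", "Eve"], "price": [1.0, 2.0]})

catalog.create_namespace("ns")
# Two tables under one namespace, as in the failing scenario.
for name in ("table_1", "table_2"):
    tbl = catalog.create_table(f"ns.{name}", schema=rows.schema)
    tbl.append(rows)

# Then, from ClickHouse, attach the catalog as a DataLakeCatalog database and
# SELECT from the second table; per the comment above, that raises
# "Cannot find manifest file for data file".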

— alsugiliazova, Jul 09 '25 11:07

The same exception message appears in a scenario with table recreation.

  1. Create Iceberg table and DataLakeCatalog database
  2. Add data
  3. Drop Iceberg table
  4. Recreate with same name
  5. Add new data
  6. Try to read from ClickHouse

At step 6 I get:

Code: 36. DB::Exception: Received from localhost:9000. DB::Exception: Received from clickhouse2:9000. DB::Exception: Cannot find manifest file for data file: data/data/symbol_partition=David/00000-0-258b884f-6763-4970-ba18-96928e8d9341.parquet: While executing IcebergS3(_table_function.icebergS3Cluster)Source. (BAD_ARGUMENTS)
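
For illustration, steps 1-6 could be scripted roughly as follows, with the same pyiceberg assumptions and placeholder names as in the previous sketch:

import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("demo", type="rest", uri="http://ice-rest-catalog:5000")  # placeholder

ident = "ns.events"  # placeholder table name
rows = pa.table({"symbol_partition": ["David"], "value": [1]})

tbl = catalog.create_table(ident, schema=rows.schema)  # 1. create the Iceberg table
tbl.append(rows)                                       # 2. add data
catalog.drop_table(ident)                              # 3. drop it (data files may remain on S3)
tbl = catalog.create_table(ident, schema=rows.schema)  # 4. recreate with the same name
tbl.append(rows)                                       # 5. add new data
# 6. Reading the table from ClickHouse now fails with
#    "Cannot find manifest file for data file".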

— alsugiliazova, Jul 09 '25 11:07

Also getting the same exception on version 25.6.5.20363.altinityantalya, but in a slightly different scenario: there is no exception when querying an Iceberg table with a plain select * from iceberg(), but when trying to join another Iceberg table to it, an exception is thrown:

2025.10.05 23:05:00.041609 [ 85 ] {} <Error> TCPHandler: Code: 36. DB::Exception: Cannot find manifest file for data file: tables/<fact_table>/data/as_of_date=2023-10-15/00015-198976-e06ac989-671b-4ff5-a42a-c21791e0a5bc-0-00001.parquet: While executing IcebergS3(_table_function.iceberg)ReadStep. (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fc10edb
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009db0eec
2. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000009dc6acb
3. DB::IcebergMetadata::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000012853724
4. DB::DataLakeConfiguration<DB::StorageS3Configuration, DB::IcebergMetadata>::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb9745
5. DB::StorageIcebergConfiguration::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb2dfe
6. DB::StorageObjectStorageSource::createReader(unsigned long, std::shared_ptr<DB::IObjectIterator> const&, std::shared_ptr<DB::StorageObjectStorage::Configuration> const&, std::shared_ptr<DB::IObjectStorage> const&, DB::ReadFromFormatInfo&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::KeyCondition const> const&, std::shared_ptr<DB::Context const> const&, DB::SchemaCache*, std::shared_ptr<Poco::Logger> const&, unsigned long, unsigned long, bool) @ 0x000000001274a2b4
7. DB::StorageObjectStorageSource::generate() @ 0x0000000012748213
8. DB::ISource::tryGenerate() @ 0x00000000156b7cde
9. DB::ISource::work() @ 0x00000000156b78e7
10. DB::ExecutionThreadContext::executeTask() @ 0x00000000156d3ee2
11. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000156c79e5
12. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000156ca1df
13. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000fd588cb
14. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000fd5f4dd
15. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000fd55af2
16. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000fd5cfba
17. ? @ 0x0000000000094ac3
18. ? @ 0x0000000000126850

We have no issue querying it with Spark or Athena, so it is unlikely that anything in S3 is corrupted.

The tables are written using Spark, and ClickHouse has read-only permissions on S3. The following query fails (note that iceberg_metadata_file_path is set explicitly to bypass the Glue catalog):

select * from iceberg('<path_to_fact_table>', SETTINGS iceberg_metadata_file_path = 'metadata/24762-eaccbd6f-278c-498a-955b-a51b5d7cb4c2.metadata.json') sm
join iceberg('<path_to_join_table>', SETTINGS iceberg_metadata_file_path = 'metadata/05240-05e496d6-618a-4a4f-9846-a387d3f30d85.metadata.json') e on sm.id = e.id

We have 8 swarm nodes, and the exception happens on a random one each time.

The settings are as follows; only "object_storage_cluster": "swarm" seems to be responsible for the exception (see the sketch after the settings):

        "object_storage_cluster": "swarm",
        "input_format_parquet_use_metadata_cache": "1",
        "enable_filesystem_cache": "1",
        "use_iceberg_metadata_files_cache": "1",
        "use_hive_partitioning": "1",
        "date_time_overflow_behavior": "saturate",
        "use_object_storage_list_objects_cache": "1",
        "use_iceberg_partition_pruning": "1",
        "max_memory_usage": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_group_by": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_sort": str(10 * 1024 * 1024 * 1024),  # 10 GB

— timoha, Oct 05 '25 23:10