
Stack trace when using Antalya build against Glue server behind REST proxy.

Open hodgesrm opened this issue 8 months ago • 2 comments

Describe the bug

(Opened from description provided by @Daesgar )

I have set up a REST proxy for the Glue catalog so that I can use the Iceberg database engine against the same table. Instead of using the icebergS3Cluster function, I ideally wanted to query the table directly through the Iceberg database to see whether it makes any difference. I set up the database like this:

ENGINE = Iceberg('http://iceberg-catalog-glue.iceberg.svc:8181/')
SETTINGS catalog_type = 'rest', storage_endpoint = 's3://posthog-iceberg-854902948032-us-east-1/', warehouse = 'posthog_iceberg'
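
For reference, the complete statement behind that snippet would look roughly like the sketch below. The database name datalake is taken from the query further down, and depending on the build the experimental-database setting may also need to be enabled first.

SET allow_experimental_database_iceberg = 1;  -- may be required, depending on the build

CREATE DATABASE datalake
ENGINE = Iceberg('http://iceberg-catalog-glue.iceberg.svc:8181/')
SETTINGS catalog_type = 'rest', storage_endpoint = 's3://posthog-iceberg-854902948032-us-east-1/', warehouse = 'posthog_iceberg'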

It seems to work fine: I can list the tables and run SHOW CREATE TABLE, and both respond correctly. However, when I try to run a query, I get the following (maybe you have faced something similar):

SELECT count()
FROM datalake.`posthog_iceberg.events_no_write_objectstorage`
WHERE team_id = 2
SETTINGS use_hive_partitioning = 1, object_storage_cluster = 'swarm', input_format_parquet_use_metadata_cache = 1, enable_filesystem_cache = 1, use_iceberg_metadata_files_cache = 1

Query id: bfcb884f-841f-4178-98ed-d6963c10c53e

Elapsed: 1.795 sec.

Received exception from server (version 25.2.2):
Code: 49. DB::Exception: Received from localhost:9000. DB::Exception: Context has expired: while optimizing query plan:
Expression ((Projection + Before ORDER BY))
Header: count() UInt64
Actions: INPUT :: 0 -> count() UInt64 : 0
Positions: 0
  MergingAggregated
  Header: count() UInt64
  Keys:
  Aggregates:
      count()
        Function: count() → UInt64
        Arguments: none
    ObjectFilter (WHERE)
    Header: count() AggregateFunction(count)
      ReadFromCluster
      Header: count() AggregateFunction(count)
. (LOGICAL_ERROR)
The stack trace from the server is:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000ba5b2e8
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000077b3d9c
2. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x00000000077bf294
3. DB::WithContextImpl<std::shared_ptr<DB::Context const>>::getContext() const @ 0x00000000077befa4
4. DB::IcebergMetadata::iterate(DB::ActionsDAG const*, std::function<void (DB::FileProgress)>, unsigned long) const @ 0x000000000e088574
5. DB::DataLakeConfiguration<DB::StorageS3Configuration, DB::IcebergMetadata>::iterate(DB::ActionsDAG const*, std::function<void (DB::FileProgress)>, unsigned long) @ 0x000000000d62b5a4
6. DB::StorageObjectStorageSource::createFileIterator(std::shared_ptr<DB::StorageObjectStorage::Configuration>, DB::StorageObjectStorage::QuerySettings const&, std::shared_ptr<DB::IObjectStorage>, bool, std::shared_ptr<DB::Context const> const&, DB::ActionsDAG::Node const*, std::optional<DB::ActionsDAG> const&, DB::NamesAndTypesList const&, std::vector<std::shared_ptr<DB::RelativePathWithMetadata>, std::allocator<std::shared_ptr<DB::RelativePathWithMetadata>>>*, std::function<void (DB::FileProgress)>) @ 0x000000000e000d64
7. DB::StorageObjectStorageCluster::getTaskIteratorExtension(DB::ActionsDAG::Node const*, std::shared_ptr<DB::Context const> const&, std::optional<std::vector<String, std::allocator<String>>>) const @ 0x000000000dfa80a8
8. DB::ReadFromCluster::createExtension(DB::ActionsDAG::Node const*) @ 0x000000000fc53d94
9. DB::ReadFromCluster::applyFilters(DB::ActionDAGNodes) @ 0x000000000fc53c10
10. DB::QueryPlanOptimizations::optimizePrimaryKeyConditionAndLimit(std::vector<DB::QueryPlanOptimizations::Frame, std::allocator<DB::QueryPlanOptimizations::Frame>> const&) @ 0x0000000010d668d8
11. DB::QueryPlanOptimizations::optimizeTreeSecondPass(DB::QueryPlanOptimizationSettings const&, DB::QueryPlan::Node&, std::list<DB::QueryPlan::Node, std::allocator<DB::QueryPlan::Node>>&) @ 0x0000000010d655d0
12. DB::QueryPlan::optimize(DB::QueryPlanOptimizationSettings const&) @ 0x0000000010cd2fc0
13. DB::QueryPlan::buildQueryPipeline(DB::QueryPlanOptimizationSettings const&, DB::BuildQueryPipelineSettings const&) @ 0x0000000010cd22f4
14. DB::InterpreterSelectWithUnionQuery::execute() @ 0x000000000f49b818
15. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum, DB::ReadBuffer*, std::shared_ptr<DB::IAST>&) @ 0x000000000f75cfdc
16. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, DB::QueryFlags, DB::QueryProcessingStage::Enum) @ 0x000000000f7595d4
17. DB::TCPHandler::runImpl() @ 0x0000000010858ac4
18. DB::TCPHandler::run() @ 0x0000000010873f28
19. Poco::Net::TCPServerConnection::start() @ 0x00000000138415b8
20. Poco::Net::TCPServerDispatcher::run() @ 0x0000000013841ad4
21. Poco::PooledThread::run() @ 0x000000001380d03c
22. Poco::ThreadImpl::runnableEntry(void*) @ 0x000000001380b410
23. ? @ 0x000000000007d5b8
24. ? @ 0x00000000000e5ed
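
For comparison, the icebergS3Cluster path mentioned earlier reads the Iceberg metadata from S3 directly instead of going through the catalog. A rough sketch of such a query is below; <path-to-table> is a placeholder for the table's location inside the bucket, and credentials are assumed to come from the environment.

SELECT count()
FROM icebergS3Cluster('swarm', 's3://posthog-iceberg-854902948032-us-east-1/<path-to-table>/')  -- <path-to-table> is a placeholder
WHERE team_id = 2
SETTINGS use_hive_partitioning = 1, input_format_parquet_use_metadata_cache = 1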

The pods are set up with permissions to read from Glue, so this does not appear to be a permissions issue. Those are my findings for now; sorry for the wall of text, but I wanted to share as many details as possible.

To Reproduce See above.

Expected behavior The query should complete instead of failing with a LOGICAL_ERROR ("Context has expired").

Key information

  • Project Antalya Build Version: 25.2.2.27772.altinityantalya
  • Cloud provider: AWS
  • Kubernetes provider: Not relevant
  • Iceberg catalog: AWS Glue with REST Proxy


hodgesrm avatar Apr 26 '25 19:04 hodgesrm

@Daesgar some questions:

  1. Can you describe the REST proxy and Glue configuration?
  2. Is this running against S3 directly or using an S3 table bucket?

It's possible this will be solved by improved Glue support in 25.3, but I'm noting the issue here to ensure it is marked for QA certification. The additional information will help us with certification.

hodgesrm avatar Apr 26 '25 19:04 hodgesrm

  1. Can you describe the REST proxy and Glue configuration?

     We are using this image for the REST proxy: https://github.com/databricks/iceberg-rest-image

     The pod we are running the proxy on has privileges to read from the Glue catalog:

         ...
         "glue:GetTables",
         "glue:GetTable",
         "glue:GetDatabases",
         "glue:GetDatabase"
         ...

     The only configuration the REST proxy carries is the role with the above privileges (plus a few more) and an environment variable telling it to use Glue as the backend:

         CATALOG_CATALOG__IMPL=org.apache.iceberg.aws.glue.GlueCatalog

  2. Is this running against S3 directly or using an S3 table bucket?

     We are using S3 directly.
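
With the proxy running like this, the catalog connection can be sanity-checked from ClickHouse before running heavier queries; these are the checks reported to work earlier, using the database and table names from the report.

SHOW TABLES FROM datalake;
SHOW CREATE TABLE datalake.`posthog_iceberg.events_no_write_objectstorage`;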

Daesgar avatar Apr 28 '25 08:04 Daesgar