duckdb_azure
HTTP stats over-counting total_bytes_received?
I am trying to optimize a query and noticed that the HTTP stats in EXPLAIN ANALYZE statements seem to be off. I query one Parquet file of 10.79 GiB, and the HTTP stats report reading 32.7 GiB. I am wondering whether http_state_policy.cpp could be over-counting total_bytes_received, in particular by including values from the content-length HTTP header of HEAD requests.
SET azure_transport_option_type = curl;
SET azure_http_stats = True;
SET threads = 1;
SET azure_read_transfer_concurrency = 1;
SET azure_read_transfer_chunk_size = 1024 * 1024;
SET azure_read_buffer_size = 1024 * 1024;
EXPLAIN ANALYZE SELECT col1 FROM 'az://<snip>.blob.core.windows.net/<snip>.parquet' LIMIT 1;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││            HTTP Stats:            ││
││                                   ││
││           in: 32.7 GiB            ││
││           out: 0 bytes            ││
││             #HEAD: 3              ││
││             #GET: 354             ││
││              #PUT: 0              ││
││             #POST: 0              ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
In the Azure SDK logs I see 3 HEAD requests with content-length: 11583653237 and 349 GET requests with content-length: 1048576. So the total input data should be around 0.34 GiB instead of 32.7 GiB.
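To make the arithmetic explicit, here is a small sanity check using only the request counts and content-length values from the Azure SDK logs quoted above. The variable names are illustrative and do not come from the extension's code. It compares the sum of the GET response bodies with what the total would be if the content-length header of each HEAD response were also added, which is the suspected behavior.

```python
GIB = 1024 ** 3

# Values taken from the Azure SDK logs quoted in this report.
head_requests = 3
head_content_length = 11_583_653_237  # bytes, size of the whole Parquet file
get_requests = 349
get_content_length = 1_048_576        # bytes, one 1 MiB chunk per GET

# Bytes actually transferred: HEAD responses carry no body.
body_bytes = get_requests * get_content_length

# Bytes counted if the HEAD content-length headers are added as well.
header_bytes = head_requests * head_content_length

expected_gib = body_bytes / GIB
overcounted_gib = (body_bytes + header_bytes) / GIB

print(f"GET bodies only:                  {expected_gib:.2f} GiB")    # 0.34 GiB
print(f"GET bodies + HEAD content-length: {overcounted_gib:.1f} GiB") # 32.7 GiB
```

The second figure matches the reported "in: 32.7 GiB", which is consistent with the HEAD content-length values being included in total_bytes_received, although it does not prove that this is the cause.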
If this analysis is correct, I can send a small PR to fix it.