duckdb_azure

HTTP stats over-counting total_bytes_received?

Open mmaitre314 opened this issue 7 months ago • 2 comments

I am trying to optimize a query and noticed that the HTTP stats in EXPLAIN ANALYZE output seem to be off. I query a single 10.79 GiB Parquet file, yet the HTTP stats report 32.7 GiB read. I am wondering whether http_state_policy.cpp could be over-counting total_bytes_received, in particular by including the Content-Length header values of responses to HEAD requests, which carry no body.

SET azure_transport_option_type = curl;
SET azure_http_stats = True;
SET threads = 1;
SET azure_read_transfer_concurrency = 1;
SET azure_read_transfer_chunk_size = 1024 * 1024;
SET azure_read_buffer_size = 1024 * 1024;

EXPLAIN ANALYZE SELECT col1 FROM 'az://<snip>.blob.core.windows.net/<snip>.parquet' LIMIT 1;
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││            HTTP Stats:            ││
││                                   ││
││            in: 32.7 GiB           ││
││            out: 0 bytes           ││
││              #HEAD: 3             ││
││             #GET: 354             ││
││              #PUT: 0              ││
││              #POST: 0             ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘

In the Azure SDK logs I see 3 HEAD requests with content-length: 11583653237 and 349 GET requests with content-length: 1048576. So the data actually received should total around 0.34 GiB (349 × 1 MiB), not 32.7 GiB.
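As a sanity check on these numbers, here is a minimal Python sketch of the arithmetic. It only sums the Content-Length values quoted from the logs. The `body_bytes` helper and its `count_head` flag are hypothetical names for illustration and do not reflect the actual code in http_state_policy.cpp. If the sum with HEAD responses included reproduces the reported figure, that would be consistent with (though not proof of) the suspected over-counting.

```python
GIB = 1024 ** 3

# Values copied from the Azure SDK logs quoted in this issue.
HEAD_REQUESTS = 3
HEAD_CONTENT_LENGTH = 11_583_653_237  # bytes, the full blob size
GET_REQUESTS = 349
GET_CONTENT_LENGTH = 1_048_576        # bytes, one 1 MiB chunk


def body_bytes(count_head: bool) -> int:
    """Sum Content-Length over the logged requests.

    With count_head=True, HEAD responses are treated as if they carried a
    body. This models the suspected (unconfirmed) behavior, not the real code.
    """
    total = GET_REQUESTS * GET_CONTENT_LENGTH
    if count_head:
        total += HEAD_REQUESTS * HEAD_CONTENT_LENGTH
    return total


suspected = body_bytes(count_head=True)   # HEAD Content-Length included
expected = body_bytes(count_head=False)   # only GET bodies counted

print(f"blob size:    {HEAD_CONTENT_LENGTH / GIB:.2f} GiB")
print(f"HEAD counted: {suspected / GIB:.1f} GiB")
print(f"HEAD ignored: {expected / GIB:.2f} GiB")
```

With HEAD responses included, the sum rounds to the 32.7 GiB shown in the HTTP stats. Without them, it comes to about 0.34 GiB.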

If this analysis is correct, I can send a small PR to fix.

mmaitre314, Jul 15 '24 12:07