ByConity
Is there a problem of duplicate downloads in S3 segmented downloads?
Question
While checking the default-worker logs, I found that the S3 deployment downloads the same file in segments several times, and the segments overlap. As a result, 2-3 copies' worth of the same file may be pulled. This looks unreasonable to me; could you please check it?
Logs of vw-default-1 node:
2024.05.16 13:28:08.198688 [ 525 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/dbcd24f5-94ea-4d3c-0efb-fd1c61af8d96/data
2024.05.16 13:28:08.232677 [ 525 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 13:28:08.232709 [ 525 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 05:28:08 GMT; Content-Type: application/xml; Content-Length: 18482198404; Connection: keep-
alive; Accept-Ranges: bytes; Content-Range: bytes 3254060092-21736258495/21736258496; ETag: "743898dd8e61224c8842cd916bf60150-1220"; Last-Modified: Sat, 04 May 2024 16:30:35 GMT; Strict-Transport-Security: max-a
ge=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 54285e0ecff7a52155e28b256ef91a6942aba64f5275133a9915e5a03a2b0fe3; X-Amz-Request-Id: 17CFE0EC38B207BE; X-Content-Type-Options: nosn
iff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449534533240356868;
2024.05.16 13:28:08.360277 [ 457 ] {} <Debug> AWSClient: AWS S3 slow read(100ms): http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/dbcd24f5-94ea-4d3c-0efb-fd1c61af8d96/data, ti
me = 20051ms, header = Server: nginx/1.21.5; Date: Thu, 16 May 2024 05:28:08 GMT; Content-Type: application/xml; Content-Length: 62896; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 29185007
1-291912966/21736258496; ETag: "743898dd8e61224c8842cd916bf60150-1220"; Last-Modified: Sat, 04 May 2024 16:30:35 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-En
coding; X-Amz-Id-2: 941e76dd7d6fa756cdff7ccf88d4481bffbe769d1028d5b43704b4c55c73ddfa; X-Amz-Request-Id: 17CFE0E797717C07; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449
534533240356868;
2024.05.16 13:28:08.360913 [ 457 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/dbcd24f5-94ea-4d3c-0efb-fd1c61af8d96/data
2024.05.16 13:28:08.391827 [ 457 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 13:28:08.391853 [ 457 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 05:28:08 GMT; Content-Type: application/xml; Content-Length: 21724365923; Connection: keep-
alive; Accept-Ranges: bytes; Content-Range: bytes 11892573-21736258495/21736258496; ETag: "743898dd8e61224c8842cd916bf60150-1220"; Last-Modified: Sat, 04 May 2024 16:30:35 GMT; Strict-Transport-Security: max-age
=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: d4ff7959db658b9e0dffd743dd392e33bc300cd3bbd7a5bd3d29810f53c3c9c8; X-Amz-Request-Id: 17CFE0EC4277F777; X-Content-Type-Options: nosnif
f; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449534533240356868;
2024.05.16 13:28:09.732969 [ 502 ] {} <Debug> AWSClient: AWS S3 slow read(100ms): http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/dbcd24f5-94ea-4d3c-0efb-fd1c61af8d96/data, ti
me = 20032ms, header = Server: nginx/1.21.5; Date: Thu, 16 May 2024 05:28:09 GMT; Content-Type: application/xml; Content-Length: 62896; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 11623284
-11686179/21736258496; ETag: "743898dd8e61224c8842cd916bf60150-1220"; Last-Modified: Sat, 04 May 2024 16:30:35 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Enco
ding; X-Amz-Id-2: f26f03c24c5ec8801cb87012b4a127bedc053def9caa5a112cf1f83805ebe38e; X-Amz-Request-Id: 17CFE0E7EA1D555F; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#44953
4533240356868;
2024.05.16 13:28:09.733863 [ 502 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/dbcd24f5-94ea-4d3c-0efb-fd1c61af8d96/data
2024.05.16 13:28:09.814513 [ 502 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 13:28:09.814544 [ 502 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 05:28:09 GMT; Content-Type: application/xml; Content-Length: 21727900351; Connection: keep-
alive; Accept-Ranges: bytes; Content-Range: bytes 8358145-21736258495/21736258496; ETag: "743898dd8e61224c8842cd916bf60150-1220"; Last-Modified: Sat, 04 May 2024 16:30:35 GMT; Strict-Transport-Security: max-age=
31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 1b73111f7edd0bc74bca9ee528d43c48f91ab0a501a53129569d2af3a2fa2865; X-Amz-Request-Id: 17CFE0EC9475C151; X-Content-Type-Options: nosniff
; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449534533240356868;
Logs of vw-default-2 node:
2024.05.16 16:17:47.217211 [ 493 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/d530f076-cda9-d83c-1900-af39694d000c/data
2024.05.16 16:17:47.239594 [ 493 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 16:17:47.239661 [ 493 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 44640; Connection: keep-alive;
Accept-Ranges: bytes; Content-Range: bytes 209293991-209338630/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=3153600
0; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 6b55756232eaebad29d4d8b397bf9c659f243dca8d7ab18592690d0cc4c34798; X-Amz-Request-Id: 17CFEA2E327FEC1A; X-Content-Type-Options: nosniff; X-Xss
-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
2024.05.16 16:17:47.240946 [ 493 ] {} <Debug> DiskLocal: Reserving 192.41 MiB on disk `server_local_0`, having unreserved 1.34 TiB.
2024.05.16 16:17:47.241188 [ 493 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/d530f076-cda9-d83c-1900-af39694d000c/data
2024.05.16 16:17:47.255143 [ 493 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 16:17:47.255166 [ 493 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 15452667054; Connection: keep-
alive; Accept-Ranges: bytes; Content-Range: bytes 7541875-15460208928/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=3
1536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 941e76dd7d6fa756cdff7ccf88d4481bffbe769d1028d5b43704b4c55c73ddfa; X-Amz-Request-Id: 17CFEA2E339618C0; X-Content-Type-Options: nosniff;
X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
2024.05.16 16:17:47.335909 [ 470 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 44640; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 7497235-7541874/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 1b73111f7edd0bc74bca9ee528d43c48f91ab0a501a53129569d2af3a2fa2865; X-Amz-Request-Id: 17CFEA2E384E4EDD; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
2024.05.16 16:17:47.336322 [ 470 ] {} <Debug> DiskLocal: Reserving 99.45 KiB on disk `server_local_0`, having unreserved 1.34 TiB.
2024.05.16 16:17:47.336536 [ 470 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/d530f076-cda9-d83c-1900-af39694d000c/data
2024.05.16 16:17:47.354031 [ 470 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 16:17:47.354055 [ 470 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 15452813534; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 7395395-15460208928/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 080f919c38a146b0c9e87ef5ccef4e7f57727f513de461329f04b2a946d47397; X-Amz-Request-Id: 17CFEA2E3951B7FB; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
2024.05.16 16:17:47.360179 [ 470 ] {} <Debug> DiskLocal: Reserving 43.59 KiB on disk `server_local_0`, having unreserved 1.34 TiB.
2024.05.16 16:17:47.372596 [ 490 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/d530f076-cda9-d83c-1900-af39694d000c/data
2024.05.16 16:17:47.392572 [ 490 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 16:17:47.392605 [ 490 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 44640; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 7350755-7395394/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: 080f919c38a146b0c9e87ef5ccef4e7f57727f513de461329f04b2a946d47397; X-Amz-Request-Id: 17CFEA2E3B908F6F; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
2024.05.16 16:17:47.393722 [ 490 ] {} <Debug> DiskLocal: Reserving 2.22 MiB on disk `server_local_0`, having unreserved 1.34 TiB.
2024.05.16 16:17:47.393941 [ 490 ] {} <Debug> AWSClient: Make request to: http://minio-nginx-svc.minio-nginx.svc.cluster.local/bigdata-olap-data/pandora_data/d530f076-cda9-d83c-1900-af39694d000c/data
2024.05.16 16:17:47.419049 [ 490 ] {} <Debug> AWSClient: Response status: 206, Partial Content
2024.05.16 16:17:47.419073 [ 490 ] {} <Debug> AWSClient: Received headers: Server: nginx/1.21.5; Date: Thu, 16 May 2024 08:17:47 GMT; Content-Type: application/xml; Content-Length: 15455181897; Connection: keep-alive; Accept-Ranges: bytes; Content-Range: bytes 5027032-15460208928/15460208929; ETag: "30e2a9132c1d83500dcc2f8535b825ce-868"; Last-Modified: Sun, 05 May 2024 14:16:20 GMT; Strict-Transport-Security: max-age=31536000; includeSubDomains; Vary: Origin; Vary: Accept-Encoding; X-Amz-Id-2: a6df522adb4071567f8fb40dc8952caed1cdaa48348601461d84f5ba69b473c1; X-Amz-Request-Id: 17CFEA2E3CDA29CB; X-Content-Type-Options: nosniff; X-Xss-Protection: 1; mode=block; x-amz-meta-pg-id: T#449554659452387339;
Extracting the Content-Length and Content-Range headers from the logs above:
vw-default-1 node:
Content-Length: 18482198404; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 3254060092-21736258495/21736258496;
Content-Length: 62896; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 291850071-291912966/21736258496;
Content-Length: 21724365923; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 11892573-21736258495/21736258496;
Content-Length: 62896; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 11623284-11686179/21736258496;
Content-Length: 21727900351; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 8358145-21736258495/21736258496;
vw-default-2 node:
Content-Length: 44640; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 209293991-209338630/15460208929;
Content-Length: 15452667054; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 7541875-15460208928/15460208929;
Content-Length: 44640; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 7497235-7541874/15460208929;
Content-Length: 15452813534; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 7395395-15460208928/15460208929;
Content-Length: 44640; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 7350755-7395394/15460208929;
Content-Length: 15455181897; Connection: keep-alive; Accept-Ranges: bytes;
Content-Range: bytes 5027032-15460208928/15460208929;
The ranges returned for the same file clearly overlap, and this has a significant impact on S3 bandwidth. Is it reasonable for segmented reads to fetch segments exceeding 10G?
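To make the overlap concrete, here is a minimal sketch that parses the Content-Range values from the vw-default-2 logs, lists which ranges overlap, and compares the total bytes the server advertised against the object size. The function names are hypothetical helpers written for this post, not part of ByConity. The totals are an upper bound: they describe what each 206 response offered, and these logs do not show whether the client read each response to the end.

```python
import re
from itertools import combinations

# Content-Range values copied from the vw-default-2 log lines in this post.
CONTENT_RANGES = [
    "bytes 209293991-209338630/15460208929",
    "bytes 7541875-15460208928/15460208929",
    "bytes 7497235-7541874/15460208929",
    "bytes 7395395-15460208928/15460208929",
    "bytes 7350755-7395394/15460208929",
    "bytes 5027032-15460208928/15460208929",
]

_PATTERN = re.compile(r"bytes (\d+)-(\d+)/(\d+)")


def parse_content_range(value):
    """Return (first_byte, last_byte, total_size); byte positions are inclusive."""
    match = _PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = map(int, match.groups())
    return first, last, total


def overlap_bytes(a, b):
    """Number of bytes shared by two inclusive (first, last) ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def find_overlaps(values):
    """Return (range_a, range_b, shared_bytes) for every overlapping pair."""
    ranges = [parse_content_range(v)[:2] for v in values]
    return [
        (a, b, overlap_bytes(a, b))
        for a, b in combinations(ranges, 2)
        if overlap_bytes(a, b) > 0
    ]


def advertised_ratio(values):
    """Sum of advertised range lengths divided by the object size (upper bound)."""
    parsed = [parse_content_range(v) for v in values]
    object_size = parsed[0][2]
    advertised = sum(last - first + 1 for first, last, _ in parsed)
    return advertised / object_size


for a, b, shared in find_overlaps(CONTENT_RANGES):
    print(f"{a[0]}-{a[1]} overlaps {b[0]}-{b[1]} by {shared} bytes")

print(f"advertised bytes / object size = {advertised_ratio(CONTENT_RANGES):.2f}")
```

The small 44640-byte ranges are contiguous with the start of the following open-ended range rather than overlapping it (for example 7497235-7541874 followed by 7541875-...), so the overlap comes from the successive open-ended ranges, each of which runs to the end of the file.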