Verification of compression Day

Open cfgamboa opened this issue 3 years ago • 5 comments

Hello,

I would like to verify whether compression is enabled by default for WebDAV traffic. If so, which compression algorithms are supported? From the test below, it does not appear to be enabled on the DAV door by default.

All the best, Carlos

curl -vvv -H "Accept-Encoding: gzip" -vvv --location --capath /etc/grid-security/certificates --cacert $X509_USER_PROXY --cert $X509_USER_PROXY --key $X509_USER_PROXY  https://dcdoor02.usatlas.bnl.gov/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc15_14TeV/51/9d/AOD.19622723._022059.pool.root.1 --output /tmp/test.curl 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 10.42.38.58...
* TCP_NODELAY set
* Connected to dcdoor02.usatlas.bnl.gov (10.42.38.58) port 443 (#0)
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
*   CAfile: /tmp/x509up_u9102
  CApath: /etc/grid-security/certificates
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [85 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [1765 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
{ [8214 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
} [6858 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS handshake, CERT verify (15):
} [264 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Client hello (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
*  subject: DC=org; DC=incommon; C=US; ST=New York; L=Upton; O=Brookhaven National Laboratory; OU=SDCC; CN=dcdoor02.usatlas.bnl.gov
*  start date: Jun 21 00:00:00 2021 GMT
*  expire date: Jul 21 23:59:59 2022 GMT
*  subjectAltName: host "dcdoor02.usatlas.bnl.gov" matched cert's "dcdoor02.usatlas.bnl.gov"
*  issuer: C=US; O=Internet2; OU=InCommon; CN=InCommon IGTF Server CA
*  SSL certificate verify ok.
} [5 bytes data]
> GET /pnfs/usatlas.bnl.gov/BNLT0D1/rucio/mc15_14TeV/51/9d/AOD.19622723._022059.pool.root.1 HTTP/1.1
> Host: dcdoor02.usatlas.bnl.gov
> User-Agent: curl/7.61.0
> Accept: */*
> Accept-Encoding: gzip
> 
{ [5 bytes data]
< HTTP/1.1 200 OK
< Date: Fri, 25 Mar 2022 20:39:30 GMT
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< Server: dCache/7.2.7
< Content-Security-Policy: default-src 'none' ; img-src 'self' data: ; style-src 'self' 'unsafe-inline' ; script-src 'self'; font-src 'self'
< Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
< Referrer-Policy: strict-origin-when-cross-origin
< Accept-Ranges: bytes
< ETag: "0000CD8379BE4588428BBCE90E388BF70936_1466704753"
< Last-Modified: Sun, 10 Nov 2019 22:25:34 GMT
< Cache-Control: no-cache
< Content-Disposition: attachment
< Content-Length: 512002778
< 
{ [15738 bytes data]
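
A gzip-encoded HTTP response carries a `Content-Encoding: gzip` header. The response above has no such header and reports the full `Content-Length`, which is why it looks uncompressed. As a rough sketch (not part of the original report), the same check can be done on saved `curl -v` output; the function name `declared_content_coding` and the sample text are illustrative assumptions.

```python
def declared_content_coding(curl_verbose_text):
    """Return the content-coding declared in `curl -v` output, or 'identity' if none."""
    headers = {}
    for line in curl_verbose_text.splitlines():
        # curl prefixes received response headers with "< ".
        if not line.startswith("< ") or ":" not in line:
            continue
        name, _, value = line[2:].partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return headers.get("content-encoding", "identity").lower()


# A few header lines copied from the response above.
sample = """\
< HTTP/1.1 200 OK
< Server: dCache/7.2.7
< Accept-Ranges: bytes
< Content-Disposition: attachment
< Content-Length: 512002778
"""

print(declared_content_coding(sample))  # identity
```

A result of `identity` means the server ignored the `Accept-Encoding: gzip` request header and sent the bytes as stored.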

cfgamboa avatar Mar 28 '22 15:03 cfgamboa

Currently the pool does not support sending compressed output.

I've heard that scientific data formats (such as ROOT) do not compress well, which means that enabling compression does not help much.

That said, I believe that the reason compression isn't supported is mostly because nobody has asked for it.

paulmillar avatar Mar 28 '22 20:03 paulmillar

Thank you very much, Paul, for your prompt feedback.

cfgamboa avatar Mar 31 '22 12:03 cfgamboa

@cfgamboa ,

What kind of data do you imagine users storing on dCache?

Perhaps you could do a test to see whether gzip is able to compress these files (by any significant amount).
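
One possible way to run such a test is sketched below in Python. It streams a file through gzip-framed DEFLATE and reports compressed size divided by original size, so a value near 1.0 means gzip gains almost nothing. The function name `gzip_ratio`, the compression level, and the two demo files are assumptions made for illustration; in practice you would point it at a real data file.

```python
import os
import tempfile
import zlib


def gzip_ratio(path, chunk_size=1 << 20, level=6):
    """Return compressed_size / original_size for the file at `path`."""
    # wbits = 16 + MAX_WBITS makes zlib emit a gzip header and trailer.
    compressor = zlib.compressobj(level, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    original = 0
    compressed = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            original += len(chunk)
            compressed += len(compressor.compress(chunk))
    compressed += len(compressor.flush())
    # Treat an empty file as incompressible rather than dividing by zero.
    return compressed / original if original else 1.0


# Demo with two synthetic files: repetitive text and random bytes.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "repetitive.txt")
    random_path = os.path.join(tmp, "random.bin")
    with open(text_path, "wb") as f:
        f.write(b"dCache " * 150_000)
    with open(random_path, "wb") as f:
        f.write(os.urandom(1 << 20))
    print(f"repetitive text: {gzip_ratio(text_path):.3f}")
    print(f"random bytes:    {gzip_ratio(random_path):.3f}")
```

Random bytes stand in here for data that is already compressed; running the function on a representative sample of the files actually stored would show which case they resemble.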

Cheers, Paul.

paulmillar avatar Jul 11 '22 12:07 paulmillar

@paulmillar do you mean compressing at storage or during transfer?

cfgamboa avatar Oct 20 '22 17:10 cfgamboa

I mean at the network level, but the argument is somewhat similar for both storage and network/transport.

Quite a few years ago, I was at a filesystem mini-conference where HPC facilities were showing how compression helps. For them, a computing node's IO bandwidth (when writing to disk) was a limiting factor, so compressing the data meant they could make better use of that bandwidth. There's a CPU cost, but (overall) compressing the data before writing it to disk was a performance benefit.

The equivalent would be a pool compressing data when writing to the disk. However, I'm not sure the benefits would translate to the use-cases that dCache deals with: I think most files will not compress much. It's (perhaps) an interesting area for future work.

For network transports, the same kind of reasoning exists. We don't support compressing data because (I believe) most files are already compressed or do not compress much (ROOT files, for example).

That said, I believe adding support for network compression (gzip, say, as in your example) would not be hard: this is a standard feature in HTTP and we are using industry-standard frameworks to support HTTP. So, it probably just boils down to "nobody has asked for it".

paulmillar avatar Oct 21 '22 08:10 paulmillar