crate-python

Support HTTP compression

Open amotl opened this issue 10 months ago • 11 comments

About

CrateDB’s HTTP interface supports gzip and deflate compressed requests, but the crate-python client currently does not utilize this capability. Adding request compression would reduce bandwidth usage, improve performance for large queries and bulk inserts, and align crate-python with best practices seen in other database clients.

As a user, I want the option to send compressed requests to CrateDB to improve performance on congested networks.

Requirements:

  • [ ] Add a configuration option to enable request compression (gzip or deflate) when sending requests to CrateDB.
  • [ ] The default should enable compression.
  • [ ] TBD: Introduce a size threshold to determine when compression is applied. Context: Compressing every request and sending a Content-Encoding header adds unnecessary overhead for small payloads, so compression should only be used when the request size exceeds a configurable threshold (e.g., 1 KB, 2 KB, or 4 KB, similar to other libraries).
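To make the requirements above more concrete, here is a minimal sketch of threshold-based request encoding using only the standard library. The function name `encode_request`, its parameters, and the 2 KB default are hypothetical and not part of crate-python; the sketch only prepares the body and headers and leaves the actual HTTP call to the client.

```python
import gzip
import json
import zlib

# Hypothetical default: 2 KB, one of the candidate values mentioned in the requirements.
DEFAULT_THRESHOLD = 2048


def encode_request(payload, compression="gzip", threshold=DEFAULT_THRESHOLD):
    """Serialize `payload` to JSON and compress it only if it exceeds `threshold` bytes.

    Returns a (body, extra_headers) tuple. This is an illustrative helper,
    not an existing crate-python API.
    """
    body = json.dumps(payload).encode("utf-8")

    # Small payloads (or compression disabled) are sent as-is, without Content-Encoding.
    if compression is None or len(body) <= threshold:
        return body, {}

    if compression == "gzip":
        body = gzip.compress(body)
    elif compression == "deflate":
        # HTTP "deflate" means the zlib container format, which zlib.compress produces.
        body = zlib.compress(body)
    else:
        raise ValueError(f"unsupported compression: {compression!r}")

    return body, {"Content-Encoding": compression}


if __name__ == "__main__":
    small = {"stmt": "SELECT 1"}
    large = {"stmt": "INSERT INTO t VALUES (?, ?)", "bulk_args": [[1, "test"]] * 1000}
    for name, payload in (("small", small), ("large", large)):
        body, headers = encode_request(payload)
        print(name, len(body), headers)
```

Keeping the decision in one place means the threshold and the algorithm can both be exposed as client options later without touching the call sites.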

[!warning] This is primarily about request encoding / compression. HTTP response compression can be vulnerable to BREACH attacks and therefore requires additional safeguards.


@proddata said:

It seems like CrateDB's HTTP interface accepts gzip / deflate compressed data. It might also be interesting to add this capability to crate-python.

@surister said:

import gzip
import json
import requests

# 200,000 identical rows for a bulk insert.
objects = [
    [1, "test"] for _ in range(200_000)
]

body = {
    "stmt": "INSERT INTO t VALUES (?, ?)",
    "bulk_args": objects
}

# Uncompressed request.
response = requests.post('http://192.168.88.251:4200/_sql', json=body)
print(response.request.headers.get('content-length'))  # printed: 2600054

# Same payload, gzip-compressed before sending.
response = requests.post('http://192.168.88.251:4200/_sql',
                         data=gzip.compress(json.dumps(body).encode('utf8')),
                         headers={'Content-Encoding': 'gzip',
                                  'Content-Type': 'application/gzip; charset=utf-8'})
print(response.request.headers.get('content-length'))  # printed: 5149
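
The payload sizes can also be checked offline, without a running CrateDB instance. The following sketch rebuilds the same body and compares the raw, gzip, and deflate sizes using only the standard library. The exact compressed sizes depend on the zlib version and compression level, so they may differ slightly from the figure reported above.

```python
import gzip
import json
import zlib

# Same payload as in the snippet above.
body = {
    "stmt": "INSERT INTO t VALUES (?, ?)",
    "bulk_args": [[1, "test"] for _ in range(200_000)],
}

# requests serializes `json=` with json.dumps and UTF-8, so this matches the wire size.
raw = json.dumps(body).encode("utf-8")
gzipped = gzip.compress(raw)
deflated = zlib.compress(raw)  # zlib container format, i.e. HTTP "deflate"

print("raw bytes:    ", len(raw))  # 2600054
print("gzip bytes:   ", len(gzipped))
print("deflate bytes:", len(deflated))
print(f"gzip ratio:    {len(raw) / len(gzipped):.0f}x")
```

The ratio here is unusually high because every row is identical; real bulk inserts with varied values will compress less well.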

References

  • https://github.com/crate/crate/pull/17494

amotl, Feb 21 '25 19:02