gpdb
add_stream_compress_for_gpdb7
gpfdist is a high-performance ETL tool for loading external data into gpdb. In practice, gpfdist can saturate the host's network card, and in doing so it can take bandwidth away from other processes on the same host.
This is especially a problem in cloud services, where multiple services may share one physical network card: if gpfdist monopolizes the card, other services can fail.
When both high transfer throughput and low network usage are required, compressing the data before transferring it is a reasonable approach. This change therefore uses zstd-based stream compression for the data sent from gpfdist to gpdb. It covers the following aspects:
A switch flag (--ec) that turns compressed transmission on or off in gpfdist.
Changes to the HTTP request and response headers exchanged between gpdb and gpfdist.
Streaming compression of the data on the gpfdist end and streaming decompression on the gpdb end.
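The streaming compression and decompression described in the last item can be sketched in Python. This is a minimal illustration under stated assumptions, not gpfdist's implementation: zlib's compressobj/decompressobj stand in for zstd so the sketch runs with only the standard library, the ec parameter merely mimics the on/off switch, the function names compress_stream and decompress_stream are hypothetical, and the HTTP header changes are not modeled.

```python
import zlib
from typing import Iterable, Iterator

# NOTE: zlib stands in for zstd here; the actual change uses zstd.
# compress_stream / decompress_stream are hypothetical names, not gpfdist APIs.


def compress_stream(chunks: Iterable[bytes], ec: bool) -> Iterator[bytes]:
    """Sender side (the gpfdist role): compress data chunk by chunk.

    When ec is False, chunks pass through unchanged, mimicking the off switch.
    """
    if not ec:
        yield from chunks
        return
    comp = zlib.compressobj()
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:  # the compressor may buffer input and return b"" for now
            yield out
    yield comp.flush()  # emit any buffered data plus the stream trailer


def decompress_stream(chunks: Iterable[bytes], ec: bool) -> Iterator[bytes]:
    """Receiver side (the gpdb role): decompress bytes as they arrive."""
    if not ec:
        yield from chunks
        return
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail
    if not decomp.eof:  # input ended before the compressed stream's trailer
        raise ValueError("compressed stream ended before its trailer")


if __name__ == "__main__":
    rows = [f"{i}|row-{i}\n".encode() for i in range(1000)]
    raw = b"".join(rows)
    wire = b"".join(compress_stream(rows, ec=True))
    # Simulate the network delivering the compressed bytes in 7-byte pieces.
    pieces = [wire[i:i + 7] for i in range(0, len(wire), 7)]
    assert b"".join(decompress_stream(pieces, ec=True)) == raw
    print(f"{len(raw)} bytes sent as {len(wire)} compressed bytes")
```

The compressor and decompressor objects keep their state across calls, so neither side needs the whole file in memory, and the receiver copes with compressed bytes split at arbitrary boundaries, which the 7-byte re-chunking in the demo exercises.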
Finally, to transfer data with zstd compression, add the --ec flag when starting gpfdist. For example, running gpfdist -d /home/gpadmin -p 7070 --ec starts gpfdist so that the data it sends to gpdb is compressed.