gpdb
(WIP) Add zstd compression support when data transfer
I just changed the gpfdist.c file and completed a demo test using an EXECUTE external table:
1, Compile and generate the executable file.
2, Create a shell script file named gcurl.sh with the following content:
#!/bin/bash
# Fetch each given URL from gpfdist with zstd enabled and decompress it to stdout.
URLNUM=$#
if [ "$URLNUM" -eq 0 ]; then
    echo "Please specify at least 1 URL" >&2
    exit 1
fi

# Collect the URLs from the command line.
URLARY=("$@")

# Shuffle the URL order with URLNUM random pair swaps.
index=0
while [ "$index" -lt "$URLNUM" ]; do
    rand1=$((RANDOM % URLNUM))
    rand2=$((RANDOM % URLNUM))
    if [ "$rand1" -ne "$rand2" ]; then
        tmpurl=${URLARY[$rand1]}
        URLARY[$rand1]=${URLARY[$rand2]}
        URLARY[$rand2]=$tmpurl
    fi
    index=$((index + 1))
done

# Map segment ID -1 (the master) to 0.
if [ "$GP_SEGMENT_ID" == "-1" ]; then
    GP_SEGMENT_ID=0
fi

# Request headers, kept in an array so values with spaces stay intact.
# X-GP-ZSTD: 1 requests zstd output, which is why the response is piped to unzstd.
HEADER=(
    -H "X-GP-XID: $GP_XID" -H "X-GP-CID: $GP_CID" -H "X-GP-SN: $GP_SN"
    -H "X-GP-SEGMENT-ID: $GP_SEGMENT_ID" -H "X-GP-SEGMENT-COUNT: $GP_SEGMENT_COUNT"
    -H "X-GP-PROTO: 0" -H "X-GP-MASTER_HOST: $GP_MASTER_HOST"
    -H "X-GP-MASTER_PORT: $GP_MASTER_PORT" -H "X-GP-DATABASE: $GP_DATABASE"
    -H "X-GP-USER: $GP_USER" -H "X-GP-SEG-PORT: $GP_SEG_PORT"
    -H "X-GP-SESSION-ID: $GP_SESSION_ID" -H "X-GP-ZSTD: 1"
)

# Report a pipeline as failed if either curl or unzstd fails.
set -o pipefail
for CURLURL in "${URLARY[@]}"; do
    curl --silent "${HEADER[@]}" "$CURLURL" | unzstd
done
3, Copy gcurl.sh to the /tmp directory on every segment host.
4, On host mdw, generate a large file, /dev/shm/test.txt.
5, Start the gpfdist server.
6, Create an external table to test performance:
drop external table if exists ext_testexcu;
CREATE READABLE EXTERNAL WEB TABLE ext_testexcu
(
a text
)
EXECUTE E'sh /tmp/gcurl.sh mdw:8080/dev/shm/test.txt'
FORMAT 'text' (delimiter 'off' null E'\\N' escape E'\\');
select count(*) from ext_testexcu;
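Steps 3 to 5 above are described only in words. A minimal sketch of what they might look like follows. The helper names, the host file path, the log path, and the row format are assumptions made for illustration, and serving `/` on port 8080 is inferred from the URL mdw:8080/dev/shm/test.txt rather than stated in the description. It also assumes the Greenplum `gpscp` utility and the patched `gpfdist` binary are on the PATH.

```shell
#!/bin/bash
# Hypothetical helpers sketching steps 3-5; names and paths are illustrative.

# Step 3: copy gcurl.sh to /tmp on every host listed in the given host file.
distribute_script() {
    local hostfile=$1
    gpscp -f "$hostfile" /tmp/gcurl.sh =:/tmp/
}

# Step 4: write a text file with the requested number of rows.
make_test_file() {
    local path=$1 rows=$2
    seq 1 "$rows" | sed 's/^/row /' > "$path"
}

# Step 5: serve the filesystem root on port 8080 in the background.
start_gpfdist() {
    gpfdist -d / -p 8080 -l /tmp/gpfdist.log &
}

# Example (run on mdw; the host file path is an assumption):
#   distribute_script /tmp/hostfile_segs
#   make_test_file /dev/shm/test.txt 100000000
#   start_gpfdist
```

Serving `/` matches the URL used in the demo but exposes the whole filesystem, so outside a test setup a narrower directory would be the safer choice.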
In 2020, I ran this demo test on GSS China's DCA 1.0 device. Throughput reached about 2.5 GB/s while network usage was about 380 MB/s. I wrote a report in Chinese when I completed the demo: https://github.com/water32/gpfaq/blob/master/2020/gpfdist.md
I hope R&D can take these improvements and complete the remaining development.
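The two figures reported above give a rough idea of how much the compression saves on the wire. As a worked example, assuming both numbers describe the same transfer and that 2.5 GB/s means 2500 MB/s, the implied ratio of delivered data to network traffic can be computed as follows (the function name is hypothetical):

```shell
#!/bin/bash
# Hypothetical helper: ratio of delivered throughput to network throughput,
# both given in MB/s, printed with one decimal place.
effective_ratio() {
    awk -v out="$1" -v net="$2" 'BEGIN { printf "%.1f\n", out / net }'
}

effective_ratio 2500 380   # prints 6.6
```

Under those assumptions the network carried roughly 15% of the uncompressed volume.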
Super job!!!!!
From my side:
- we need more comments
- we need to add test cases
- some code refinement is still needed.
But the test is super! Thanks a lot!
Sorry, I left out some information:
1, I installed libzstd-devel in my environment.
2, I modified the Makefile to link libzstd statically, so that the executable file can run independently:
LDLIBS += $(LIBS) $(GPFDIST_LIBS) $(apr_link_ld_libs)
=>
LDLIBS += "/usr/local/lib/libzstd.a" $(LIBS) $(GPFDIST_LIBS) $(apr_link_ld_libs)
-- Edited by Adam: applied markdown syntax to put the code into code blocks.
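Whether the Makefile change above has the intended effect can be checked on the built binary. A small sketch, assuming a Linux host where `ldd` is available; the function name and the binary path in the usage comment are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeed if ldd lists a shared library whose name
# contains the given string for the given binary.
links_dynamically() {
    local binary=$1 lib=$2
    ldd "$binary" 2>/dev/null | grep -q "$lib"
}

# Example: with libzstd.a linked statically, this would be expected to
# print "static".
#   if links_dynamically ./gpfdist libzstd; then echo dynamic; else echo static; fi
```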
Cool! It would be even better if dev could make the gpfdist:// and gpfdists:// protocols of readable and writable external tables support decompression and compression natively.