
(WIP) Add zstd compression support for data transfer

Open water32 opened this issue 2 years ago • 3 comments

I changed only the gpfdist.c file and completed a demo test using an EXECUTE external table:

  1. Compile and generate the executable file.
  2. Create a shell script named gcurl.sh with the following content:

#!/bin/bash
URLNUM=$#
if [ $URLNUM == 0 ];then
    echo "Please specify 1 url at least" >&2
    exit 1
fi
URLARY=()
while [ $# -gt 0 ]; do
    URLARY[${#URLARY[@]}]=$1
    shift
done
index=0
while [ $index -lt $URLNUM ];do
    rand1=$((RANDOM%URLNUM))
    rand2=$((RANDOM%URLNUM))
    if [ $rand1 != $rand2 ];then
        tmpurl=${URLARY[$rand1]}
        URLARY[$rand1]=${URLARY[$rand2]}
        URLARY[$rand2]=$tmpurl
    fi
    index=$((index+1))
done
if [ "$GP_SEGMENT_ID" == "-1" ];then
    GP_SEGMENT_ID=0
fi
# Request headers that gpfdist expects from a segment. X-GP-ZSTD: 1 is the
# header the patched gpfdist checks; its response is piped through unzstd below.
HEADER=(
    -H "X-GP-XID: $GP_XID" -H "X-GP-CID: $GP_CID" -H "X-GP-SN: $GP_SN"
    -H "X-GP-SEGMENT-ID: $GP_SEGMENT_ID" -H "X-GP-SEGMENT-COUNT: $GP_SEGMENT_COUNT"
    -H "X-GP-PROTO: 0" -H "X-GP-MASTER_HOST: $GP_MASTER_HOST" -H "X-GP-MASTER_PORT: $GP_MASTER_PORT"
    -H "X-GP-DATABASE: $GP_DATABASE" -H "X-GP-USER: $GP_USER" -H "X-GP-SEG-PORT: $GP_SEG_PORT"
    -H "X-GP-SESSION-ID: $GP_SESSION_ID" -H "X-GP-ZSTD: 1"
)
# Fail the pipeline if curl fails, not only if unzstd fails.
set -o pipefail
for CURLURL in "${URLARY[@]}"; do
    curl --silent "${HEADER[@]}" "$CURLURL" | unzstd
done
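
For reference, the random-swap loop in the script can also be written as a Fisher-Yates shuffle, which touches each position exactly once. This is only a sketch of an alternative, not part of the patch; `shuffle_urls` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): Fisher-Yates shuffle.
# Prints its arguments one per line in random order.
shuffle_urls() {
    (($# == 0)) && return 0          # nothing to shuffle
    local urls=("$@")
    local i j tmp
    for ((i = ${#urls[@]} - 1; i > 0; i--)); do
        j=$((RANDOM % (i + 1)))      # pick a position in 0..i
        tmp=${urls[i]}
        urls[i]=${urls[j]}
        urls[j]=$tmp
    done
    printf '%s\n' "${urls[@]}"
}
```

In gcurl.sh this could replace the two `while` loops with something like `mapfile -t URLARY < <(shuffle_urls "$@")`, assuming bash 4 or later.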

  3. Copy gcurl.sh to the /tmp directory on every segment host.
  4. On host mdw, generate a big file, /dev/shm/test.txt.
  5. Start the gpfdist server.
  6. Create an external table to test performance:

drop external table if exists ext_testexcu;
CREATE READABLE EXTERNAL WEB TABLE ext_testexcu
(
  a text
)
 EXECUTE E'sh /tmp/gcurl.sh mdw:8080/dev/shm/test.txt' 
 FORMAT 'text' (delimiter 'off' null E'\\N' escape E'\\');
select count(*) from ext_testexcu;
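
Steps 4 and 5 above are not spelled out in the issue. A minimal sketch of one way to do them follows; the helper name `gen_test_file`, the row content, the row count, and the gpfdist options are all assumptions. The only detail taken from the issue is the URL `mdw:8080/dev/shm/test.txt`, which suggests gpfdist listening on port 8080 and serving from `/`.

```shell
#!/bin/bash
# Hypothetical helper: write the requested number of text rows to a file.
gen_test_file() {
    local path=$1 lines=$2
    seq 1 "$lines" |
        awk '{ printf "row %d some filler text for the load test\n", $1 }' > "$path"
}

# Possible usage on mdw (size and options are assumptions, not from the issue):
#   gen_test_file /dev/shm/test.txt 100000000
#   gpfdist -d / -p 8080 &
```

Filler text this repetitive compresses far better than real data, so a file generated this way will overstate the benefit of zstd; real or randomized data gives a fairer measurement.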

In 2020, I ran this demo test on GSS China's DCA 1.0 device. Performance reached about 2.5 GB/s, while the network carried about 380 MB/s. I wrote a report in Chinese when I completed the demo: https://github.com/water32/gpfaq/blob/master/2020/gpfdist.md
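
One way to read those two numbers, assuming 2.5 GB/s is the uncompressed rate seen by the database, 380 MB/s is the compressed traffic on the wire, and 1 GB = 1000 MB (none of which the issue states explicitly), is as an implied compression ratio. `implied_ratio` is a hypothetical helper for the arithmetic:

```shell
#!/bin/bash
# Assumption: first argument is the uncompressed rate, second is the wire rate,
# both in MB/s. Prints their ratio to one decimal place.
implied_ratio() {
    awk -v raw="$1" -v wire="$2" 'BEGIN { printf "%.1f\n", raw / wire }'
}

implied_ratio 2500 380   # prints 6.6
```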

I hope R&D can take my improvements and complete the remaining development.

water32 • Aug 23 '22 13:08

Super job!!!!!

From my side:

  1. We need more comments.
  2. We need to add test cases.
  3. Some code refinement is still needed.

But the test is super! Thanks a lot!

kainwen • Aug 23 '22 13:08

Sorry, I left out some information:

  1. I installed libzstd-devel in my environment.
  2. I modified the Makefile to link libzstd statically, so that the executable can run independently. The line
     LDLIBS += $(LIBS) $(GPFDIST_LIBS) $(apr_link_ld_libs)
     became
     LDLIBS += "/usr/local/lib/libzstd.a" $(LIBS) $(GPFDIST_LIBS) $(apr_link_ld_libs)
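
To confirm that the Makefile change had the intended effect, the built binary's shared-library dependencies can be inspected. The following is only a sketch: `has_shared_zstd` is a hypothetical helper, and the path `./gpfdist` is an assumption about where the build leaves the binary.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if ldd-style output on stdin lists a shared libzstd.
has_shared_zstd() {
    grep -q 'libzstd\.so'
}

# Possible usage (binary path is an assumption):
#   if ldd ./gpfdist | has_shared_zstd; then
#       echo "libzstd is linked dynamically"
#   else
#       echo "no shared libzstd dependency"
#   fi
```

With libzstd.a on the link line, the second branch is the expected outcome.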

-- Edited by Adam: applied markdown syntax to put the code into code blocks.

water32 • Aug 23 '22 14:08

Cool! It would be even better if dev could make the gpfdist:// and gpfdists:// protocols support compression and decompression natively for both readable and writable external tables.

adam8157 • Aug 25 '22 03:08