[IOTDB-2569] Support ZSTD Compression
Description
https://issues.apache.org/jira/browse/IOTDB-2569
ZSTD is a new compression method published by Facebook: https://github.com/facebook/zstd
This PR uses zstd-jni, the JNI binding for ZSTD (not a pure Java implementation): https://github.com/luben/zstd-jni. The compression ratio is about 0.83.
Example: IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE, COMPRESSOR=ZSTD
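The description quotes a ratio of about 0.83 without saying how it was measured. Below is a minimal sketch of one way to measure such a ratio, assuming it means compressed size divided by original size. The class name CompressionRatioSketch and its methods are hypothetical, and the JDK's Deflater is used only as a stand-in codec so the sketch runs without extra dependencies. With zstd-jni on the classpath, the body of compress could call com.github.luben.zstd.Zstd.compress(input) instead.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class CompressionRatioSketch {

    // Stand-in codec (assumption): JDK Deflater, NOT ZSTD.
    // With zstd-jni available this could be replaced by:
    //   return com.github.luben.zstd.Zstd.compress(input);
    public static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Assumed definition: compressed size / original size (smaller is better).
    public static double ratio(byte[] original) {
        return (double) compress(original).length / original.length;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[64 * 1024]; // all zeros, highly compressible
        byte[] random = new byte[64 * 1024];     // random bytes, barely compressible
        new Random(42).nextBytes(random);
        System.out.printf("repetitive: %.3f%n", ratio(repetitive));
        System.out.printf("random:     %.3f%n", ratio(random));
    }
}
```

Random input gives a ratio close to 1 with any general-purpose codec, so the measured ratio depends strongly on the data being compressed.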
This PR has:
- [ ] been self-reviewed.
- [ ] concurrent read
- [ ] concurrent write
- [ ] concurrent read and write
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods.
- [ ] added or updated version, license, or notice information
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
- [ ] added integration tests.
- [ ] been tested in a test IoTDB cluster.
Key changed/added classes (or packages if there are too many classes) in this PR
ZSTD has some impact on write performance (see the data).
Hello, could you tell me what the test method was? Or is there a test script? I would like to run it too and see how to optimize.
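Two machines with 8 cores and 32 GB RAM each: one runs iotdb, the other runs the benchmark.
iotdb configuration: MAX_HEAP_SIZE="16G" wal_buffer_size_in_byte=1048576 default_storage_group_level=2
Comparing compressor=ZSTD with SNAPPY, ZSTD has a large impact on write performance.
benchmark configuration: DEVICE_NUMBER=2000 SENSOR_NUMBER=1000 CLIENT_NUMBER=100 GROUP_NUMBER=50 OPERATION_PROPORTION=70:1:1:1:1:0:1:1:1:1:1 BATCH_SIZE_PER_WRITE=10 IS_OUT_OF_ORDER=false LOOP=1000000 LINE_RATIO=0 SIN_RATIO=0 SQUARE_RATIO=0 RANDOM_RATIO=1 CONSTANT_RATIO=0
You can try this configuration: with it, ZSTD's impact on write performance is even larger, and the compression ratio is about the same as SNAPPY's.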
OK, thank you very much, I will take a look.
After running for 36 hours, the iotdb process disappeared and an hs_err_pid file was generated: hs_err_pid17566.log. The impact on write performance is large: write performance with this compressor is about one tenth of that with snappy.
Two machines with 8 cores and 32 GB RAM each: one runs iotdb, the other runs the benchmark.
iotdb configuration: MAX_HEAP_SIZE="16G" wal_buffer_size_in_byte=1048576 default_storage_group_level=2 compressor=ZSTD
benchmark configuration: DEVICE_NUMBER=2000 SENSOR_NUMBER=1000 CLIENT_NUMBER=100 GROUP_NUMBER=50 OPERATION_PROPORTION=70:1:1:1:1:0:1:1:1:1:1 BATCH_SIZE_PER_WRITE=10 IS_OUT_OF_ORDER=false LOOP=1000000 LINE_RATIO=0 SIN_RATIO=0 SQUARE_RATIO=0 RANDOM_RATIO=1 CONSTANT_RATIO=0
Hi, I see the latest version is v1.5.2-3 now, why don't you use it? And it may fix the bug that @liuzhen1207 mentioned.
https://github.com/luben/zstd-jni/releases/tag/v1.5.2-3
Another question: is there any reason you chose the JNI implementation rather than the pure Java version? https://github.com/airlift/aircompressor/tree/master/src/main/java/io/airlift/compress/zstd
Airlift says their version is actually 10-40% faster than the JNI one. See https://github.com/airlift/aircompressor
I referred to Apache Pulsar, which uses zstd-jni. I'm not familiar with Airlift and did not see a license description on their GitHub, so I'm not sure whether Airlift has copyright issues. If you want to try Airlift, I can rewrite my code.
Sorry, I did not look at their versions closely, so I did not choose the newest one. I also think we should use the newest version.
Airlift has actually already been introduced in the IoTDB server module, and its license is Apache 2.0. I think we can try it in a new PR to compare the performance. :)
ok
The new PR is here, thanks for checking: https://github.com/apache/iotdb/pull/6729