DRILL-8259: Supports advanced HBase persistence storage options

Open luocooong opened this issue 3 years ago • 8 comments

Description

Maximize performance with HBase as persistent storage.

Documentation

Example in drill-override.conf

sys.store.provider: {
  class: "org.apache.drill.exec.store.hbase.config.HBasePStoreProvider",
  hbase: {
    table : "drill_store",
    config: {
      "hbase.zookeeper.quorum": "zk_host3,zk_host2,zk_host1",
      "hbase.zookeeper.property.clientPort": "2181",
      "zookeeper.znode.parent": "/hbase-test"
    },
    table_config : {
      "durability": "ASYNC_WAL",
      "compaction_enabled": false,
      "split_enabled": false,
      "max_filesize": 10737418240,
      "memstore_flushsize": 536870912
    },
    column_config : {
      "versions": 1,
      "ttl": 2626560,
      "compression": "SNAPPY",
      "blockcache": true,
      "blocksize": 131072,
      "data_block_encoding": "FAST_DIFF",
      "in_memory": true,
      "dfs_replication": 3
    }
  }
}
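
As a reading aid for the numeric values in the example above, here is a small Python sketch that converts them to human-readable units. It assumes that the sizes are in bytes and that ttl is in seconds, following HBase's own table and column family attributes. The helper names are illustrative only and are not part of Drill or HBase.

```python
# Assumption: max_filesize, memstore_flushsize and blocksize are bytes,
# and ttl is seconds, as in the corresponding HBase attributes.
SIZE_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def human_bytes(n):
    """Format a byte count using binary (1024-based) units."""
    value = float(n)
    for unit in SIZE_UNITS:
        if value < 1024 or unit == SIZE_UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1024


def ttl_days(seconds):
    """Convert a TTL given in seconds to days."""
    return seconds / 86400


# Values copied from the drill-override.conf example above.
print(human_bytes(10737418240))  # max_filesize
print(human_bytes(536870912))    # memstore_flushsize
print(human_bytes(131072))       # blocksize
print(ttl_days(2626560))         # ttl
```

Under those assumptions, the example corresponds to a 10 GiB maximum region file size, a 512 MiB memstore flush size, a 128 KiB block size, and a TTL of about 30.4 days.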

Configuration requirements

| Key | Type | Value Example | Reference |
|---|---|---|---|
| durability | String | ASYNC_WAL / SYNC_WAL / SKIP_WAL | Durability |
| compaction_enabled | Boolean | false | COMPACTION_ENABLED |
| split_enabled | Boolean | false | SPLIT_ENABLED |
| max_filesize | Number | 10737418240 | MAX_FILESIZE |
| memstore_flushsize | Number | 536870912 | MEMSTORE_FLUSHSIZE |
| versions | Number | 1 | MAX_VERSIONS |
| ttl | Number | 2626560 | TTL |
| compression | String | SNAPPY / LZ4 | Compression$Algorithm |
| blockcache | Boolean | true | BLOCKCACHE |
| blocksize | Number | 131072 | BLOCKSIZE |
| data_block_encoding | String | FAST_DIFF / PREFIX | DataBlockEncoding |
| in_memory | Boolean | true | IN_MEMORY |
| dfs_replication | Number | 3 | DFS_REPLICATION |
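
The type column above can be checked mechanically. The following Python sketch is a hypothetical pre-flight check, not the validation Drill itself performs: it maps each key to the type listed in the table and reports mismatches, such as passing 0 where a Boolean is expected. The function name and error format are made up for illustration.

```python
# Expected value type per key, transcribed from the table above.
EXPECTED_TYPES = {
    "durability": str,
    "compaction_enabled": bool,
    "split_enabled": bool,
    "max_filesize": int,
    "memstore_flushsize": int,
    "versions": int,
    "ttl": int,
    "compression": str,
    "blockcache": bool,
    "blocksize": int,
    "data_block_encoding": str,
    "in_memory": bool,
    "dfs_replication": int,
}


def validate(options):
    """Return a list of human-readable type errors (empty if all is well)."""
    errors = []
    for key, value in options.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            errors.append(f"{key}: unknown key")
        # Compare exact types: in Python, bool is a subclass of int, so
        # isinstance() would wrongly accept True where a Number is expected.
        elif type(value) is not expected:
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


print(validate({"compaction_enabled": False, "versions": 1}))
print(validate({"compaction_enabled": 0}))
```

The first call reports no errors, while the second flags compaction_enabled because 0 is a number rather than a Boolean.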

Testing

Added TestHBaseTableProvider#testStoreTableAttributes().

luocooong avatar Jul 13 '22 12:07 luocooong

@luocooong Thanks for submitting this. I was wondering, is there a reason why we are storing these variables in drill-override.conf instead of in the configuration for the storage plugin? IMHO, it is better to put them in the plugin config so that you don't have to restart Drill every time you make a config change.

cgivre avatar Jul 22 '22 21:07 cgivre

@cgivre Hi, thank you for the question. Drill's PStore variables are kept separate from the storage plugin configuration because they have to be defined before Drill starts up. They also have a different lifecycle from the storage configuration, so placing them in the storage plugin is not recommended.

luocooong avatar Jul 22 '22 21:07 luocooong

Why "compaction_enabled": false? I thought compaction is important for HBase to boost performance?

Z0ltrix avatar Jul 23 '22 04:07 Z0ltrix

> Why "compaction_enabled": false? I thought compaction is important for HBase to boost performance?

As you know, HBase is a nightmare to operate because of the complexity of its settings. The values in the example above are not recommendations, since no single value is appropriate for every case. They only show the type of value each parameter expects: here, true/false rather than 0/1.

And would you mind helping me add this updated documentation to the drill-site?

luocooong avatar Jul 23 '22 06:07 luocooong

> > Why "compaction_enabled": false? I thought compaction is important for HBase to boost performance?
>
> As you know, HBase is a nightmare to operate because of the complexity of its settings. The values in the example above are not recommendations, since no single value is appropriate for every case. They only show the type of value each parameter expects: here, true/false rather than 0/1.
>
> And would you mind helping me add this updated documentation to the drill-site?

Of course :)

Z0ltrix avatar Jul 23 '22 06:07 Z0ltrix

@cgivre @Z0ltrix I added two new options. If namespace is used, the namespace:table semantics are applied. We can also use the table configuration only. The family is optional as well.
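
A minimal Python sketch of the namespace:table semantics described above, for illustration only. The function name and the "drill" namespace value are hypothetical; the actual option handling lives in the PR's Java code and is not reproduced here.

```python
def qualified_table_name(table, namespace=None):
    """Build the table name as HBase addresses it.

    With a namespace the result is "namespace:table"; without one,
    the bare table name is used.
    """
    if namespace:
        return f"{namespace}:{table}"
    return table


print(qualified_table_name("drill_store", "drill"))
print(qualified_table_name("drill_store"))
```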

luocooong avatar Jul 26 '22 10:07 luocooong

@Z0ltrix Would you mind doing a formal review on this PR? @luocooong asked me but I don't really have enough experience with HBase to comment intelligently on this. If you're already happy with this, all you have to do is leave a +1.

cgivre avatar Aug 12 '22 02:08 cgivre

> @Z0ltrix Would you mind doing a formal review on this PR? @luocooong asked me but I don't really have enough experience with HBase to comment intelligently on this. If you're already happy with this, all you have to do is leave a +1.

Sorry for the late response, I would love to do the review :)

Z0ltrix avatar Aug 31 '22 07:08 Z0ltrix