[BitSail][Connector] Support ElasticSearch Source connector.

Open BlockLiu opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe

Support ElasticSearch reader.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

BlockLiu, Nov 09 '22 02:11

I'd like to try this; please assign it to me, thanks.

liuxiaocs7, Jan 16 '23 01:01

Nice, please take your time :D

BlockLiu, Jan 16 '23 06:01

In PR #336

I use the scroll API to implement paged queries.

Each index is currently treated as one split; maybe we could use the slice parameter to break it down further later, just like this link
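For readers unfamiliar with scroll paging, here is a minimal sketch of the loop, not the code from the PR. It assumes the usual Elasticsearch response shape (a top-level `_scroll_id` and a `hits.hits` list); `scroll_all` and `FakeCluster` are hypothetical names, and the in-memory cluster exists only so the loop can run without a real server.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(search: Callable[[], Dict],
               scroll: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every hit, following scroll ids until a page comes back empty."""
    page = search()  # first page; on a real cluster this opens the scroll context
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = scroll(page["_scroll_id"])  # ask for the next page of the same context


class FakeCluster:
    """Hypothetical in-memory stand-in for a cluster (illustration only)."""

    def __init__(self, docs: List[Dict], size: int) -> None:
        self.docs = docs
        self.size = size  # plays the role of scroll_size
        self.pos = 0

    def _page(self) -> Dict:
        hits = self.docs[self.pos:self.pos + self.size]
        self.pos += len(hits)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}

    def search(self) -> Dict:
        return self._page()

    def scroll(self, scroll_id: str) -> Dict:
        return self._page()
```

With a real client, `search` would send the initial request with the scroll keep-alive (`scroll_time`) and page size (`scroll_size`), and `scroll` would send the follow-up scroll request; the scroll context should also be cleared once reading finishes.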

The job conf looks like this:

{
  "job": {
    "reader": {
      "class": "com.bytedance.bitsail.connector.elasticsearch.source.ElasticsearchSource",
      "es_hosts": ["http://localhost:1234"],
      "es_index": "test1, test2, test3",
      "scroll_size": 3,
      "scroll_time": "1m",
      "columns": [
        {
          "index": 0,
          "name": "id",
          "type": "integer"
        },
        {
          "index": 1,
          "name": "text_type",
          "type": "text"
        },
        {
          "index": 2,
          "name": "keyword_type",
          "type": "keyword"
        },
        {
          "index": 3,
          "name": "long_type",
          "type": "long"
        },
        {
          "index": 4,
          "name": "date_type",
          "type": "date"
        }
      ]
    }
  }
}
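To make the "one split per index" behaviour concrete, here is a small sketch of how the comma-separated `es_index` value could be turned into splits. The helper name `index_splits` is hypothetical, and trimming whitespace around each name is an assumption about how the connector parses the option.

```python
from typing import Dict, List


def index_splits(reader_conf: Dict) -> List[str]:
    """Return one split name per index listed in the reader's es_index option."""
    # Assumption: names are comma-separated and surrounding whitespace is ignored.
    names = [name.strip() for name in reader_conf["es_index"].split(",")]
    return [name for name in names if name]  # drop empty entries such as a trailing comma


reader_conf = {"es_index": "test1, test2, test3"}
splits = index_splits(reader_conf)
```

For the conf above this gives three splits, one each for `test1`, `test2`, and `test3`.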

@BlockLiu, do you think this is OK? Thanks.

liuxiaocs7, Jan 17 '23 07:01

I think it's a good idea. Based on the note in the screenshot below, I think we can first get the shard count and then build the slices.

[Screenshot: 2023-01-29 11:48:26]

BlockLiu, Jan 29 '23 03:01

Namely, we can use shard count as slice number.
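A rough sketch of that idea, under stated assumptions: the shard count is already known (for example, read from the index settings), and each slice becomes one scroll request body using the `slice` element with `id` and `max`. The function name `sliced_scroll_bodies` and the `match_all` query are placeholders, not what the PR does. As far as I recall, Elasticsearch rejects a slice `max` of 1, so the sketch falls back to a single unsliced request for single-shard indices; treat that as an assumption too.

```python
from typing import Dict, List


def sliced_scroll_bodies(shard_count: int, scroll_size: int) -> List[Dict]:
    """Build one search body per slice, using the shard count as the slice number."""
    if shard_count < 2:
        # Assumption: slicing needs at least two slices, so read the index unsliced.
        return [{"size": scroll_size, "query": {"match_all": {}}}]
    return [
        {
            "size": scroll_size,
            "slice": {"id": slice_id, "max": shard_count},  # this request's slice of the index
            "query": {"match_all": {}},
        }
        for slice_id in range(shard_count)
    ]


bodies = sliced_scroll_bodies(shard_count=3, scroll_size=3)
```

Each body could then back its own split, so an index with three shards would be read by up to three readers in parallel instead of one.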

BlockLiu, Jan 29 '23 03:01

Thank you for the suggestion; I will keep working on it.

liuxiaocs7, Feb 01 '23 11:02