dig-etl-engine Starting up myDIG for the very first time

Open saggu opened this issue 7 years ago • 3 comments

Create hbase table `dataset_view`, if it does not exist.

This table will be used for the TLD View in the myDIG frontend.

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
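
A minimal sketch of the "create if it does not exist" step, assuming a happybase-style connection object that exposes `tables()` and `create_table()`. The column family name `counts` is hypothetical (the issue does not name one), and the row id is read here as `<project_name>_<dataset>` with "design-" treated as a label rather than a literal prefix, which is also an assumption.

```python
from typing import Dict, Iterable, Protocol


class HBaseConnection(Protocol):
    """The two calls this sketch relies on; happybase.Connection offers both."""

    def tables(self) -> Iterable[bytes]: ...

    def create_table(self, name: str, families: Dict[str, dict]) -> None: ...


TABLE_NAME = "dataset_view"
COLUMN_FAMILY = "counts"  # assumption: the issue does not name a column family


def make_row_key(project_name: str, dataset: str) -> str:
    """Build the row id <project_name>_<dataset> described in the schema."""
    return f"{project_name}_{dataset}"


def ensure_dataset_view(connection: HBaseConnection) -> bool:
    """Create dataset_view only if it is missing; return True if it was created."""
    existing = {
        name.decode() if isinstance(name, bytes) else name
        for name in connection.tables()
    }
    if TABLE_NAME in existing:
        return False
    connection.create_table(TABLE_NAME, {COLUMN_FAMILY: dict()})
    return True
```

Calling `ensure_dataset_view` on every startup is safe because it only creates the table when it is missing.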

Note: We are not going to track the desired number of docs in hbase. It will exist only as a frontend concept: based on what the user enters, that many documents will be fetched from hbase and processed.

Aug 17 '18 21:08 saggu

Implemented

Aug 31 '18 18:08 saggu

Adding desired to hbase as well. The updated schema looks like this:

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>

Sep 04 '18 23:09 saggu

We also need to track the total number of docs added to kafka during etk processing. Updated schema:

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
added_docs: total number of docs added to kafka for <dataset> in <project_name>
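
As an illustration only, one row of the final schema could be modelled like this. The class name `DatasetViewRow`, the column family `counts`, and the choice to store counts as decimal strings are all assumptions, not something the issue specifies.

```python
from dataclasses import dataclass
from typing import Dict

COLUMN_FAMILY = "counts"  # assumption: the issue does not name a column family


@dataclass
class DatasetViewRow:
    """One dataset_view row: the three counters for a project/dataset pair."""

    project_name: str
    dataset: str
    total_docs: int = 0
    desired: int = 0
    added_docs: int = 0

    @property
    def row_key(self) -> bytes:
        # Row id <project_name>_<dataset>, as in the schema.
        return f"{self.project_name}_{self.dataset}".encode()

    def record_added(self, count: int) -> None:
        """Add to added_docs after a batch of docs has been sent to kafka."""
        if count < 0:
            raise ValueError("count must be non-negative")
        self.added_docs += count

    def to_columns(self) -> Dict[bytes, bytes]:
        """Encode the counters as family:qualifier -> decimal string bytes."""
        values = {
            "total_docs": self.total_docs,
            "desired": self.desired,
            "added_docs": self.added_docs,
        }
        return {
            f"{COLUMN_FAMILY}:{name}".encode(): str(value).encode()
            for name, value in values.items()
        }
```

With a happybase table handle, the row could then be written with `table.put(row.row_key, row.to_columns())`.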

Sep 05 '18 01:09 saggu
