dig-etl-engine

Starting up myDIG for the very first time

Open saggu opened this issue 7 years ago • 3 comments

Create the HBase table dataset_view if it does not exist.

This will be used for the TLD view in the myDIG frontend.
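
A minimal sketch of the "create if it does not exist" step, assuming a happybase-style connection object (anything exposing tables() and create_table()). The column family name "counts" is a placeholder chosen for illustration; the issue does not name one.

```python
def ensure_dataset_view(connection, table_name="dataset_view", family="counts"):
    """Create the table only when it is missing; return True if it was created.

    `connection` is assumed to behave like happybase.Connection.
    `family` is a hypothetical column family name, not taken from the issue.
    """
    # happybase returns table names as bytes, so normalise them to str first.
    existing = {
        name.decode("utf-8") if isinstance(name, bytes) else name
        for name in connection.tables()
    }
    if table_name in existing:
        return False
    # An empty options dict keeps the HBase defaults for the column family.
    connection.create_table(table_name, {family: dict()})
    return True
```

With happybase installed, this could be called as ensure_dataset_view(happybase.Connection(host)), where host is whatever Thrift endpoint the deployment uses.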

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>

Note: We are not going to track the desired number of docs. It will exist only as a frontend concept; based on what the user has entered, that many documents will be fetched from HBase and processed.

saggu, Aug 17 '18 21:08

Implemented

saggu, Aug 31 '18 18:08

Adding desired to HBase as well. The updated schema looks like this:

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>

saggu, Sep 04 '18 23:09

We need to track the total number of docs added to Kafka during ETK processing. Updated schema:

Schema:

rowid: design- <project_name>_<dataset>. Allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
added_docs: total number of docs added to Kafka for <dataset> in <project_name>
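
As a rough illustration of the schema above, the sketch below builds one row for dataset_view. It assumes the row key is simply <project_name>_<dataset>, that counts are stored as decimal strings, and that all three columns live in a column family called "counts"; none of these storage details are stated in the issue.

```python
def dataset_view_row(project_name, dataset, total_docs=0, desired=0,
                     added_docs=0, family="counts"):
    """Return (row_key, cells) for one <project_name>_<dataset> entry.

    `family` and the decimal-string encoding are assumptions for illustration.
    """
    # Row key as described in the schema: <project_name>_<dataset>.
    row_key = f"{project_name}_{dataset}".encode("utf-8")
    counts = {
        "total_docs": total_docs,   # documents in the dataset for this project
        "desired": desired,         # desired docs in Elasticsearch
        "added_docs": added_docs,   # docs added to Kafka so far
    }
    # HBase cells are addressed as b"family:qualifier" and hold raw bytes.
    cells = {
        f"{family}:{name}".encode("utf-8"): str(value).encode("utf-8")
        for name, value in counts.items()
    }
    return row_key, cells
```

With a happybase table handle, the result could be written with table.put(row_key, cells). If added_docs has to be incremented atomically during processing, HBase counters (8-byte integers) would be the more natural encoding than the strings used here.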

saggu, Sep 05 '18 01:09