dig-etl-engine
Starting up myDIG for the very first time
Create the HBase table dataset_view if it does not exist.
It will be used for the TLD view in the myDIG frontend.
Schema:
rowid: designed as <project_name>_<dataset>. This allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
Note: We are not going to track the desired number of docs in HBase. It will exist only as a frontend concept: based on what the user has entered, that many documents will be fetched from HBase and processed.
Implemented
Adding desired to HBase as well. The updated schema looks like this:
Schema:
rowid: designed as <project_name>_<dataset>. This allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
We also need to track the total number of docs added to Kafka during ETK processing. Updated schema:
Schema:
rowid: designed as <project_name>_<dataset>. This allows us to quickly fetch the total number of documents in each project and for each dataset
total_docs: number of documents in <dataset> in <project_name>
desired: number of desired docs in elasticsearch for <dataset> in <project_name>
added_docs: total number of docs added to Kafka for <dataset> in <project_name>
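To make the final schema concrete, here is a minimal sketch of how one row could be built and updated. It uses a plain in-memory dict as a stand-in for the dataset_view table, so no HBase client is involved. The helper names make_rowid, upsert_counts, and record_added are hypothetical and do not come from the dig-etl-engine code. Treating the row key as exactly <project_name>_<dataset> is an assumption based on the schema above.

```python
from typing import Dict

# In-memory stand-in for the dataset_view table: rowid -> {column: value}.
# Assumption: a real deployment would write these columns through an HBase client.
Table = Dict[str, Dict[str, int]]


def make_rowid(project_name: str, dataset: str) -> str:
    """Build the row key <project_name>_<dataset> described in the schema."""
    return f"{project_name}_{dataset}"


def upsert_counts(table: Table, project_name: str, dataset: str,
                  total_docs: int, desired: int) -> str:
    """Create the row if needed and set total_docs and desired; return the rowid."""
    rowid = make_rowid(project_name, dataset)
    row = table.setdefault(
        rowid, {"total_docs": 0, "desired": 0, "added_docs": 0}
    )
    row["total_docs"] = total_docs
    row["desired"] = desired
    return rowid


def record_added(table: Table, project_name: str, dataset: str,
                 count: int = 1) -> int:
    """Increase added_docs after docs are pushed to Kafka; return the new value."""
    row = table[make_rowid(project_name, dataset)]
    row["added_docs"] += count
    return row["added_docs"]
```

With this layout, reading the counters for one dataset in one project is a single lookup by row key, which is the property the rowid design is meant to provide.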