docker_datalake
Datalake
Airflow is one of the most important services in the datalake architecture, and a lot of work remains to be done on it. Airflow handles all the workflows / pipelines for data...
To automate the architecture deployment, one of the biggest tasks is to deploy the OpenStack services (here Swift and Keystone). A good way to deploy services that are maintainable, scalable...
# Dropzone file
- [x] Create the component for a file drop zone
- [x] Connect to the raw data area: MongoDB metadata insertion
- [x] Connect to the...
Monitoring any network or system is a crucial service for a sustainable, maintainable and evolvable architecture. As this is a complex architecture, several levels of monitoring are needed: -...
The central authentication system is the main service for data security. The tool chosen is OpenStack Keystone, for several reasons: - APIs are available in Python (and other...
The goal of this project is to implement a software tool that determines whether the services in the architecture are online. It also has to test whether features are available and...
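As a rough illustration of the "is this service online" check described above, here is a minimal sketch that tests whether a TCP port accepts connections. It is not the project's implementation: the function names `is_online` and `check_services` are hypothetical, and a successful TCP connection only shows that something is listening, not that the service's features work.

```python
import socket
from typing import Dict, Tuple


def is_online(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(
    services: Dict[str, Tuple[str, int]], timeout: float = 2.0
) -> Dict[str, bool]:
    """Map each service name to its reachability (hypothetical helper)."""
    return {
        name: is_online(host, port, timeout)
        for name, (host, port) in services.items()
    }
```

A call such as `check_services({"keystone": ("localhost", 5000), "mongodb": ("localhost", 27017)})` would return one boolean per service. The host names here are placeholders, and the ports are the usual defaults for those services, so they may differ in the actual deployment.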
**Implementation of SGE data integration for MS SQL Server 2017** (Linked to the "In progress task" project board: SGE Data integration tasks) TODO: - [x] Set a MsSQL...
**Add a new field in the metadata document: processed_data_area** Needed to know which service (mainly a database) the data has to be inserted into. - [x] Modify insertion script -...
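To make the role of `processed_data_area` concrete, the sketch below builds a metadata document carrying that field and uses it to pick an insertion handler. Only the field name `processed_data_area` comes from the task description. The other field names, the functions `build_metadata` and `dispatch`, and the handler mapping are assumptions for illustration, not the project's insertion script.

```python
from typing import Any, Callable, Dict


def build_metadata(
    object_id: str, filename: str, processed_data_area: str
) -> Dict[str, Any]:
    """Build a metadata document (all keys except processed_data_area are hypothetical)."""
    return {
        "object_id": object_id,
        "original_filename": filename,
        # Names the service (mainly a database) that should receive the processed data.
        "processed_data_area": processed_data_area,
    }


def dispatch(
    doc: Dict[str, Any],
    handlers: Dict[str, Callable[[Dict[str, Any]], Any]],
) -> Any:
    """Call the insertion handler registered for the document's processed_data_area."""
    area = doc.get("processed_data_area")
    if area not in handlers:
        raise KeyError(f"no handler registered for processed_data_area={area!r}")
    return handlers[area](doc)
```

With this shape, supporting a new target service means registering one more handler, and documents that name an unknown area fail loudly instead of being dropped.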