Description

Currently the project is only meant to be run locally. Implement a solution to deploy it to the cloud.

Areas to cover

1. Platform to use
2. CI/CD
3. Env variable management (with secrets management)
4. Cloud storage & processing
5. Scheduling & orchestration
6. Logging, metadata & debugging
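
As a starting point for the env variable management item above, here is a minimal sketch of loading settings from environment variables so that secrets stay out of the code and out of logs. It assumes the chosen platform injects secrets as environment variables at runtime. The names WAREHOUSE_PATH, DB_PASSWORD, Settings and load_settings are hypothetical and are not part of this project.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical non-secret setting, safe to show in logs.
    warehouse_path: str
    # Hypothetical secret; repr=False keeps it out of printed output.
    db_password: str = field(repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if any is missing."""
    required = ("WAREHOUSE_PATH", "DB_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        warehouse_path=env["WAREHOUSE_PATH"],
        db_password=env["DB_PASSWORD"],
    )
```

Passing the mapping as an argument keeps the loader testable locally with a plain dict, while the default reads from the real process environment in the cloud.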

Choice of platform

Please leave a comment with your choices for areas 1 - 5 above.

Sep 01 '23 14:09 josephmachado

Hi Joseph

Thanks for sharing your project.

As for the next area to cover, I would love to see this project expanded to orchestration and/or platform. Otherwise, how about some discussion of open table formats?

Sep 09 '23 14:09 andreale28

All the topics mentioned above are interesting to see.

Sep 11 '23 15:09 abdelhaqs

Great initiative @josephmachado! I would love to see more about points 2, 3 & 5. :)

Oct 22 '23 03:10 flaviassantos

I would like to further explore 4 & 5; those two provide the best value for me as a data engineer. I would be glad to support if you can help with those two! Great job and thanks for the website, I appreciate it.

Nov 13 '23 18:11 cjj1120

Great Project. Thank you.

However, it would be great if you could teach us how to create such projects from scratch.

How do we create those containers so that we can ship projects from dev to QA to PROD?

ex: The underlying script for these commands:

make up # Spin up containers
make ddl # Create tables & views
make ci # Run checks & tests
make etl # Run etl
make spark-sh # Spark shell to check created tables

Dec 05 '23 19:12 kottargiveer

@kottargiveer The underlying commands are all under the Makefile, if that's what you're asking.
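
For anyone who wants to drive those Makefile targets from a script (for example in a CI job), here is a minimal sketch. It assumes `make` is on the PATH and that the script runs from the repository root. The run order and the helper name run_targets are assumptions for illustration, not something the project defines.

```python
import subprocess
from typing import Callable, Sequence

# Targets quoted in the thread above; running them in this order is an assumption.
DEFAULT_TARGETS = ("up", "ddl", "ci", "etl")


def run_targets(
    targets: Sequence[str] = DEFAULT_TARGETS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run `make <target>` for each target in order, stopping at the first failure."""
    completed = []
    for target in targets:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit,
        # so later targets are skipped when an earlier one fails.
        runner(["make", target], check=True)
        completed.append(target)
    return completed
```

Calling run_targets() with no arguments runs the four targets in sequence. The runner parameter exists only so the function can be tried out without actually invoking make.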

Feb 29 '24 03:02 cjj1120

My top 3 choices: 4 -> 5 -> 2

Feb 29 '24 03:02 cjj1120

Great project, I would be interested in all the topics mentioned above. Thank you very much!

Apr 05 '24 02:04 GEJ1

data_engineering_best_practices

Build cloud infrastructure
