data_engineering_best_practices
Build cloud infrastructure
Description
Currently the project is only meant to be run locally. Implement a solution to deploy it to the cloud.
Areas to cover
- Platform to use
- CI/CD
- Environment variable management (with secrets management)
- Cloud storage & processing
- Scheduling & orchestration
- Logging, metadata & debugging
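As a starting point for the environment variable item, here is a minimal sketch of reading configuration from the environment so the same code can run locally and in the cloud. The names `Settings`, `load_settings`, `STORAGE_URI`, and `DB_PASSWORD` are hypothetical and not taken from the project. How a secret manager would inject the values depends on the platform chosen.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Runtime configuration (all field names are hypothetical)."""

    storage_uri: str
    db_password: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    # Non-secret value: fall back to a local path so local runs still work.
    storage_uri = env.get("STORAGE_URI", "file:///tmp/data")
    # Secret value: no default, fail fast if it is missing or empty.
    db_password = env.get("DB_PASSWORD")
    if not db_password:
        raise RuntimeError("DB_PASSWORD is not set")
    return Settings(storage_uri=storage_uri, db_password=db_password)
```

Passing a plain dict as `env` keeps the function easy to test without touching the real environment.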
Choice of platform
Please leave a comment with your choices for areas 1 - 5 above.
Hi Joseph
Thanks for sharing your project.
As for the next area to cover, I would love to see this project expanded to orchestration and/or platform. Otherwise, how about a discussion of open table formats?
All the topics mentioned above are interesting to see.
Great initiative @josephmachado! I would love to see more about points 2, 3 & 5. :)
I would like to further explore 4 & 5, since those two provide the most value for me as a data engineer. I would be glad to support if you can help with those two! Great job and thanks for the website, appreciate it.
Great Project. Thank you.
However, it would be great if you could teach us how to create such projects from scratch.
How do we create those containers so that we can ship projects from dev to QA to PROD?
For example, the underlying scripts for these commands:

    make up # Spin up containers
    make ddl # Create tables & views
    make ci # Run checks & tests
    make etl # Run etl
    make spark-sh # Spark shell to check created tables
@kottargiveer The underlying commands are all in the Makefile, if that's what you're asking.
My top 3 choices: 4 -> 5 -> 2
Great project, I would be interested in all the topics mentioned above. Thank you very much!