data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
In the `book.md` file, the hyperlink for "Learning Spark, 2nd Edition" currently redirects to the Databricks website. It should instead link to the actual book on Amazon. **Incorrect link**: [https://databricks.com](https://databricks.com)...
This commit addresses potential confusion about whether a folder is created in the root of the repo; it is not. Volumes are created from the given `docker-compose.yml`. After 'docker...
- Remove `edit` from section links for easy navigation
- Previously these links prompted users without edit access to fork the repo rather than navigate within it
- Slight wording...
## Issue
**Note:** This is an issue in the DataExpert UI itself, not in this repository, but I wasn't sure where else to raise it. The `X` to exit the...
The `start_job.py` file uses a different method of setting the `sasl.jaas.config` property for the Kafka sink compared to the `aggregation_job.py` file. The former constructs the config string inline within the...
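One way to remove this kind of duplication is to build the `sasl.jaas.config` value in a single shared helper that both jobs call. The sketch below is only an illustration, not the repository's code: the helper name `build_jaas_config` is hypothetical, and it assumes the SASL/PLAIN mechanism (Kafka's `PlainLoginModule`), which may not be the mechanism the jobs actually use.

```python
def build_jaas_config(username: str, password: str) -> str:
    """Return a sasl.jaas.config value, assuming SASL/PLAIN (an assumption).

    Hypothetical helper: both jobs would call this instead of each
    assembling the string in its own way.
    """

    def escape(value: str) -> str:
        # Escape backslashes and double quotes so the credential cannot
        # terminate the quoted JAAS value early.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{escape(username)}" password="{escape(password)}";'
    )
```

Each job could then pass the returned string as the `sasl.jaas.config` property of its Kafka sink, so a change to the format only has to be made in one place.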
The Spark application name is not distinct across Spark jobs: `monthly_user_site_hits_job.py`, `players_scd_job.py`, and `team_vertex_job.py` all use "players_scd". This can lead to confusion when monitoring or...
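A possible way to keep application names from colliding is to derive each name from the job's own file name rather than hard-coding it. The following is a minimal sketch under that assumption; the helper `app_name_for` and the `_job` suffix convention are hypothetical and not taken from the repository.

```python
from pathlib import Path


def app_name_for(script_path: str, suffix: str = "_job") -> str:
    """Derive a Spark application name from a job script's file name."""
    stem = Path(script_path).stem  # file name without directory or extension
    # Drop the trailing suffix (for example "_job") when present.
    if suffix and stem.endswith(suffix):
        return stem[: -len(suffix)]
    return stem


# Intended use inside a job file (requires pyspark, so left as a comment):
#   spark = SparkSession.builder.appName(app_name_for(__file__)).getOrCreate()
```

With this approach `team_vertex_job.py` would register as `team_vertex`, which makes each job identifiable in monitoring without relying on manually maintained strings.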
# Co-author
Co-authored with @samlafell

## What
- fixes a versioning issue with the mc entrypoint command in the docker file
- Before, it was running an outdated `config` command...
The `statsig.initialize(API_KEY)` call in `server.py` does not have any error handling. If `API_KEY` is not set or is invalid, `statsig.initialize` could fail, causing the entire application to crash. A try-except...
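The guard described here could look roughly like the sketch below. It is an assumption-laden illustration rather than the project's fix: the wrapper name `safe_initialize` and the environment variable name `STATSIG_API_KEY` are hypothetical, and the initializer is passed in as a callable so the sketch does not depend on the Statsig SDK being installed.

```python
import logging
import os
from typing import Callable

logger = logging.getLogger(__name__)


def safe_initialize(
    initialize: Callable[[str], object],
    env_var: str = "STATSIG_API_KEY",  # hypothetical variable name
) -> bool:
    """Call `initialize` with the API key; return False instead of crashing."""
    api_key = os.environ.get(env_var)
    if not api_key:
        logger.error("%s is not set; skipping initialization", env_var)
        return False
    try:
        initialize(api_key)
    except Exception:
        # Log the full traceback but keep the application running.
        logger.exception("initialization failed")
        return False
    return True


# In server.py this might be used as (assumed, not verified against the repo):
#   if not safe_initialize(statsig.initialize):
#       ...  # fall back to defaults or exit with a clear message
```

Returning a boolean leaves the decision to the caller: the server can either continue with default behavior or stop with a clear message instead of an unhandled exception.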
Add 2 links in books.md and README.md
- [Streaming Databases: Unifying Batch and Stream Processing](https://www.amazon.com/Streaming-Databases-Unifying-Stream-Processing/dp/1098154835)
- [Timeplus](https://www.timeplus.com/)