data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering.

Results 129 data-engineer-handbook issues

Clarify submission format for homework in README

**Problem:** `mc config host add minio http://minio:9000/ admin password` throws a "config is not a recognized command" error (in the container logs). **Solution:** Using alias instead of...

**Java Issue:** At the beginning of the Unit Testing Spark Jobs lab, you show that students are ready to begin when 3 `pytest` tests pass. However, these...

The `aggregation_job.py` and `start_job.py` files use slightly different methods to pass Kafka credentials to the Flink connector. `start_job.py` creates the `sasl_config` variable, which is formatted in a manner that is...
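One way to keep the two jobs consistent is to build the credential string in a single shared helper that both files import. The following is a minimal sketch, assuming SASL/PLAIN authentication and the standard Kafka `PlainLoginModule` JAAS format. The helper name `build_sasl_jaas_config` and the environment variable names are hypothetical and not taken from the repo.

```python
import os


def build_sasl_jaas_config(username: str, password: str) -> str:
    """Return a JAAS config string for Kafka SASL/PLAIN authentication.

    Assumption: the cluster uses the PLAIN mechanism. Other mechanisms
    (for example SCRAM) need a different login module class.
    Values containing double quotes would need escaping, which this
    sketch does not attempt.
    """
    return (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{password}";'
    )


# Hypothetical environment variable names, used here for illustration only.
sasl_config = build_sasl_jaas_config(
    os.environ.get("KAFKA_USERNAME", "example-user"),
    os.environ.get("KAFKA_PASSWORD", "example-secret"),
)
```

Both jobs could then pass the same `sasl_config` value to the connector options, so a formatting change only has to be made in one place.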

The `/tasks` route creates a new StatsigUser object every time `get_experiment` is called, which can lead to inconsistent experiment assignment for a single user within the same request. The StatsigUser...
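The pattern behind this report can be sketched as building the user object once per request and passing that same instance to every experiment lookup. The code below is only an illustration: `User`, `get_experiment_group`, `handle_tasks_request`, and the experiment names are hypothetical stand-ins, not the Statsig SDK or the repo's actual route.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Stand-in for an SDK user object (hypothetical, not the Statsig class)."""
    user_id: str


def get_experiment_group(user: User, experiment: str) -> str:
    """Toy deterministic assignment: hash the experiment name with the user ID."""
    digest = hashlib.sha256(f"{experiment}:{user.user_id}".encode()).digest()
    return "test" if digest[0] % 2 else "control"


def handle_tasks_request(user_id: str) -> dict:
    """Build the user once, then reuse it for every lookup in the request."""
    user = User(user_id=user_id)  # created a single time per request
    return {
        "user_id": user.user_id,
        "button_color": get_experiment_group(user, "button_color"),
        "layout": get_experiment_group(user, "layout"),
    }
```

Because every lookup receives the same `user` instance, all assignments within one request are derived from identical user attributes.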

The `print(sink_ddl)` statement in the `create_processed_events_sink_kafka` function within `start_job.py` is likely for debugging purposes and should be removed from production code.
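If the DDL is still useful when troubleshooting, one alternative to deleting the statement outright is to route it through the standard `logging` module at DEBUG level. This is a minimal sketch: the function name `create_sink_ddl` and the shortened DDL are hypothetical, and the real statement in `start_job.py` is longer.

```python
import logging

logger = logging.getLogger(__name__)


def create_sink_ddl(table_name: str) -> str:
    """Build a DDL string and log it at DEBUG level instead of printing it."""
    # Simplified, hypothetical DDL used only for illustration.
    sink_ddl = f"CREATE TABLE {table_name} (ip VARCHAR, event_timestamp VARCHAR)"
    # %-style arguments are only formatted if DEBUG logging is enabled.
    logger.debug("Sink DDL for %s:\n%s", table_name, sink_ddl)
    return sink_ddl
```

With the default log level nothing is emitted, so production output stays clean. Note that a DDL containing credentials should not be logged at any level.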

Add additional steps complementing the existing Dimensional Data Modeling guide for setting up Postgres in Docker, including locating credentials to log in to PGAdmin, noting browser quirks, and walking through...

I suggest this minor change because the old link for the Data Engineer Things Community doesn't work, so I looked for the right link and replaced it.

In week 3, the Spark test `test_monthly_site_hits` fails due to an empty array. This fix uses `get()` to tolerate accessing an element at an invalid index, returning NULL instead. **Before**...
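Assuming the `get()` mentioned here is Spark's array `get` function (0-based, returning NULL when the index is outside the array), its semantics can be illustrated with a plain Python analogue. The helper `safe_get` is hypothetical and exists only to show the behaviour. It is not the Spark API and not the code from the fix.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def safe_get(items: Sequence[T], index: int) -> Optional[T]:
    """Return items[index], or None when the index is outside the sequence.

    Unlike Python's own indexing, a negative index is treated as out of
    bounds, mirroring a 0-based lookup that yields NULL on a miss.
    """
    if 0 <= index < len(items):
        return items[index]
    return None


print(safe_get(["a", "b"], 0))  # a
print(safe_get([], 0))          # None
```

The empty-array case is the one that matters for the failing test: the lookup yields a null value instead of raising an error.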