data-engineer-handbook issues

fix: update markdown for correct actor_films primary key fields

This commit corrects the homework markdown mentioning the fields used as the primary key in the actor_films table. The document references actor_id and film_id instead of actorid and filmid.

thirionjwf

Issue 1: Missing Exception Handling in server.py on Statsig Initialization

The server.py file initializes Statsig with `statsig.initialize(API_KEY)`. If the `API_KEY` is invalid or Statsig initialization fails for any reason, it will crash the application. It needs exception handling.
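A minimal sketch of the kind of guard this issue asks for, assuming the server can keep running without Statsig when initialization fails. The helper name `safe_initialize` and the logger name are hypothetical and do not come from `server.py`. The initializer is passed in as a callable so the sketch does not depend on the Statsig SDK being installed.

```python
import logging
from typing import Callable

logger = logging.getLogger("server")  # hypothetical logger name


def safe_initialize(init_fn: Callable[[str], object], api_key: str) -> bool:
    """Call init_fn(api_key) and return False instead of raising on failure."""
    if not api_key:
        # A missing or empty key is reported without calling the SDK at all.
        logger.error("API key is missing; skipping initialization")
        return False
    try:
        init_fn(api_key)
    except Exception:
        # Log the full traceback, but let the application keep starting up.
        logger.exception("Initialization failed; continuing without it")
        return False
    return True
```

In `server.py` this might be called as `safe_initialize(statsig.initialize, API_KEY)`, with the returned flag deciding whether feature-gate checks are attempted or skipped.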

PrinceSajjadHussain

Fix: Incorrect table name in `do_team_vertex_transformation`'s `write` operation

In the `team_vertex_job.py` file, the `main` function incorrectly attempts to write the output DataFrame into a table named `"players_scd"`, which is likely meant for the players SCD job. This will...

PrinceSajjadHussain

Fix: Inconsistent table names in Flink Kafka Sinks.

The `start_job.py` file creates a Kafka sink named `process_events_kafka`, while `aggregation_job.py` creates a Kafka source with the same name `process_events_kafka`. This can cause confusion and potentially lead to the aggregation...

PrinceSajjadHussain

Add Dingo: A Comprehensive AI Data Quality Evaluation Tool

1

e06084

Update README.md

Add Data Engineering Whitepapers (https://www.ssp.sh/brain/data-engineering-whitepapers/) to the Data Engineering Whitepapers section.

gocodeelite

Course lesson typo in module 3: Scala Spark vs. Scholar Spark

## Issue

Wanted to call out that this says "Scholar Spark" instead of "Scala Spark".

![Image](https://github.com/user-attachments/assets/2add4968-11db-436a-bf5c-78be93d65f71)

Ho1yShif

Update ReadMe of `1-dimensional-data-modeling`

## Tables Not loaded using Docker

1. Copy your .dump file into the container: `docker cp .\data.dump my-postgres-container:/tmp/data.dump`
2. Run pg_restore inside the container: `docker exec -it my-postgres-container pg_restore -U...

VinothKanna007

Issue 1: Incorrect usage of `map` in `server.py` `get_tasks` route.

The map function call `filtered_tasks = ''.join(map(lambda a: ...)` in the `/tasks` route of `server.py` is creating a string by joining a list of strings, which is correct, but it...

PrinceSajjadHussain

Issue: Inconsistent Hash Usage for User Identification

In the `server.py` file, the signup and task routes generate user IDs using `hash(hash_string)`. The hash function is not guaranteed to produce the same hash value across different Python processes...
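Python's built-in `hash()` is salted per process for strings unless `PYTHONHASHSEED` is fixed, so the same input can map to a different ID after a restart. One possible remedy, sketched here under the assumption that any deterministic digest is acceptable as a user ID, is to derive the ID with `hashlib`. The function name `stable_user_id` is hypothetical; `hash_string` is the name used in the issue.

```python
import hashlib


def stable_user_id(hash_string: str) -> str:
    """Return a hex digest that is identical in every Python process."""
    # SHA-256 depends only on the input bytes, never on per-process salting.
    return hashlib.sha256(hash_string.encode("utf-8")).hexdigest()
```

Because the digest depends only on the input, the signup and task routes would compute the same ID for the same `hash_string` across restarts and across worker processes.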

PrinceSajjadHussain
