data-engineer-handbook

Java required for Spark unit testing lab and homework

Open Ho1yShif opened this issue 5 months ago • 0 comments

Java Issue

  • At the beginning of the Unit Testing Spark Jobs lab, you show that students are ready to begin once 3 pytest tests pass
  • However, all of these tests fail without a Java environment, since PySpark needs a JVM to start a Spark session
  • Would it be possible to list a Java environment among the setup requirements for the Spark module? If not, I can update the README with this info myself
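
A quick way to tell whether a machine is ready for the lab before running pytest is to look for a `java` executable. The following is a minimal sketch using only the Python standard library; the function name `find_java` is hypothetical and is not part of the course materials, and the check only confirms that an executable exists, not that its version is one Spark supports.

```python
import os
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return a path to a `java` executable, or None if none is found.

    Hypothetical helper: checks JAVA_HOME first, then falls back to PATH.
    """
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        # shutil.which also accepts a full path and checks it is executable
        found = shutil.which(candidate)
        if found:
            return found
    # Fall back to whatever `java` is first on PATH, if any
    return shutil.which("java")


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No Java found: the Spark pytest tests will likely fail to start.")
    else:
        print(f"Java found at {java}")
```

Running the script prints the path of the Java executable it found, or a warning when none is available.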

Bug fix

  • Even after Java is installed, there appears to be an error in the do_monthly_user_site_hits_transformation function in bootcamp/materials/3-spark-fundamentals/src/jobs/monthly_user_site_hits_job.py. The SUM(COALESCE(...)) expressions need to read each hit_array element through get(); otherwise they raise an error:
    SELECT
           month_start,
           SUM(COALESCE(get(hit_array, 0), 0)) AS num_hits_first_day,
           SUM(COALESCE(get(hit_array, 1), 0)) AS num_hits_second_day,
           SUM(COALESCE(get(hit_array, 2), 0)) AS num_hits_third_day
    FROM monthly_user_site_hits
    GROUP BY month_start
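
To illustrate why wrapping the array access in `get()` helps, here is a small pure-Python model of the expressions above. It assumes the original failure comes from reading an index that is past the end of `hit_array`; the issue does not state the exact error, so treat that as an assumption. Spark SQL's `get(array, index)` is 0-based and returns NULL for an out-of-bounds index, which `COALESCE` then turns into 0. The `rows` values are made up for the example.

```python
from typing import Optional, Sequence


def get(array: Optional[Sequence[int]], index: int) -> Optional[int]:
    """Model of Spark SQL get(): 0-based, None (NULL) when out of bounds."""
    if array is None or index < 0 or index >= len(array):
        return None
    return array[index]


def coalesce(value: Optional[int], default: int) -> int:
    """Model of a two-argument COALESCE: first non-NULL value."""
    return default if value is None else value


# Hypothetical hit_array values for three users in the same month;
# the second and third arrays are shorter than three days.
rows = [[3, 1, 4], [2], []]

num_hits_first_day = sum(coalesce(get(r, 0), 0) for r in rows)
num_hits_second_day = sum(coalesce(get(r, 1), 0) for r in rows)
num_hits_third_day = sum(coalesce(get(r, 2), 0) for r in rows)

print(num_hits_first_day, num_hits_second_day, num_hits_third_day)  # prints: 5 1 4
```

Short arrays contribute 0 to the sums instead of raising, which is the behaviour the corrected query relies on.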

Ho1yShif, Jun 22 '25 21:06