
Implementing best practices for PySpark ETL jobs and applications.

12 pyspark-example-project issues

First of all, thanks for the great work! I am new to Spark and this repo has really helped me get started. I am trying to get my ETL job...

Bumps [pyspark](https://github.com/apache/spark) from 2.4.0 to 3.1.3. Commits d1f8a50 Preparing Spark release v3.1.3-rc4 7540421 Preparing development version 3.1.4-SNAPSHOT b8c0799 Preparing Spark release v3.1.3-rc3 0a7eda3 [SPARK-38075][SQL][3.1] Fix hasNext in HiveScriptTransformationExec's pro... 91db9a3...

dependencies

Bumps [ipython](https://github.com/ipython/ipython) from 7.2.0 to 7.16.3. Commits d43c7c7 release 7.16.3 5fa1e40 Merge pull request from GHSA-pq7m-3gw7-gq5x 8df8971 back to dev 9f477b7 release 7.16.2 138f266 bring back release helper from master...

dependencies

Hi, it seems like `etl_config.json` is not accessible when running in YARN client mode. Could you please help me investigate this issue?
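A likely cause is that the config file was never shipped to the cluster. The repo's pattern is to pass it with `spark-submit --files` and read it back through `SparkFiles`; here is a minimal sketch of that pattern (the helper name is hypothetical, not the repo's exact code):

```python
import json
import os

from pyspark import SparkFiles


def load_job_config(filename='etl_config.json'):
    """Load a JSON config shipped via `spark-submit --files` (hypothetical helper).

    In yarn-client mode the shipped file lands in SparkFiles' root
    directory on the driver, so look it up there rather than on the
    local filesystem. Requires an active SparkContext.
    """
    path = os.path.join(SparkFiles.getRootDirectory(), filename)
    if not os.path.exists(path):
        return None  # no config was shipped with the job
    with open(path, 'r') as f:
        return json.load(f)
```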

ERROR: test_transform_data (tests.test_etl_job.SparkETLTests) Test data transformer.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brix/pyspark-workloads/pyspark-example-project/tests/test_etl_job.py", line 27, in setUp
    self.spark, *_ = start_spark()
  File "/home/brix/pyspark-workloads/pyspark-example-project/dependencies/spark.py", line 88, in start_spark
    spark_sess =...
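The failure above happens while creating the session in test setup. For reference, a minimal sketch of a local test session (an assumption about what `start_spark` boils down to in local mode, not the repo's exact code):

```python
from pyspark.sql import SparkSession

# Minimal local session for unit tests; the repo's start_spark() layers
# config-file handling and logging on top of something like this.
spark = (
    SparkSession.builder
    .master('local[*]')
    .appName('my_etl_job_tests')
    .getOrCreate()
)
```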

Please add a LICENSE file to the repo so that users know whether it can be used in closed-source codebases. [See here](https://choosealicense.com/no-permission/)

Bumps [pygments](https://github.com/pygments/pygments) from 2.3.1 to 2.7.4. Release notes Sourced from pygments's releases. 2.7.4 Updated lexers: Apache configurations: Improve handling of malformed tags (#1656) CSS: Add support for variables (#1633, #1666)...

dependencies

Could you add functionality to pass job-level parameters, perhaps via a parameters file? A sketch of one approach follows below.
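One way to do this, sketched here, is to accept a path to a JSON parameters file on the command line (the argument name and file layout are hypothetical):

```python
import argparse
import json

# Hypothetical CLI for job-level parameters: run with --params /path/to/params.json
parser = argparse.ArgumentParser(description='run ETL job with job-level parameters')
parser.add_argument('--params', help='path to JSON file of job-level parameters')
args = parser.parse_args()

if args.params:
    with open(args.params, 'r') as f:
        job_params = json.load(f)
else:
    job_params = {}  # fall back to defaults when no file is given
```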

File "/home/ashish/Downloads/pyspark-example-project-master/jobs/etl_job.py", line 57, in main data_transformed = transform_data(data, config['steps_per_floor']) TypeError: 'NoneType' object is not subscriptable

https://github.com/AlexIoannides/pyspark-example-project/blob/13d6fb2f5fb45135499dbd1bc3f1bdac5b8451db/tests/test_etl_job.py#L64 You should use `data_transformed`, not `expected_data`, for the actual transformation output.
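In other words, the assertions should be computed from the job's actual output; a sketch of the corrected shape (any variable beyond `data_transformed` and `expected_data` is assumed, not taken from the test file):

```python
# Assert on the actual transformation output, not on the expected fixture.
cols = len(data_transformed.columns)
rows = data_transformed.count()

self.assertEqual(len(expected_data.columns), cols)
self.assertEqual(expected_data.count(), rows)
```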