splittablegzip

Guide/readme/example for using with AWS Glue ETL job

Open leeprevost opened this issue 11 months ago • 12 comments

I wonder if you could suggest how to use this in an AWS Glue job. My method does not involve spark-submit; instead I create job definitions and run jobs using boto3.

When I try to use this in my script, I get: pyspark.sql.utils.IllegalArgumentException: Compression codec nl.basjes.hadoop.io.compress.SplittableGzipCodec not found.

I have tried passing --conf nl.basjes.hadoop.io.compress.SplittableGzipCodec, -packages nl.basjes.hadoop.io.compress.SplittableGzipCodec, and other variations as job arguments, to no avail. I think I need to put a copy of the codec jar on S3 and point to it with extra-files or some other argument?
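
To make the S3 idea concrete, here is a rough sketch of what I imagine the job definition might look like with boto3. It is untested and rests on several assumptions: that the jar has been uploaded to S3 by hand, that Glue's `--extra-jars` job parameter is the right way to put it on the classpath, and that Glue passes a `--conf` job argument through to Spark (I am not sure it does). The bucket paths, job name, and role are placeholders, and `build_default_arguments` and `create_job` are helper names made up for this example.

```python
# Hypothetical sketch: define a Glue job whose classpath includes the codec jar.
CODEC = "nl.basjes.hadoop.io.compress.SplittableGzipCodec"


def build_default_arguments(jar_s3_uri):
    """Return Glue DefaultArguments pointing at a jar already uploaded to S3."""
    return {
        # Glue job parameter for extra jars on the driver/executor classpath.
        "--extra-jars": jar_s3_uri,
        # Assumption: Glue forwards this to Spark. If it does not, the same
        # setting would have to be applied inside the job script instead.
        "--conf": f"spark.hadoop.io.compression.codecs={CODEC}",
    }


def create_job(name, role_arn, script_s3_uri, jar_s3_uri):
    """Create the job definition; all argument values are placeholders."""
    import boto3  # imported here so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        Command={
            "Name": "glueetl",
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        DefaultArguments=build_default_arguments(jar_s3_uri),
    )
```

If this is roughly the right direction, the job would then be started with `start_job_run` as usual, with the jar path swapped for wherever the codec actually lives.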

leeprevost, Feb 28 '24 14:02