
Branch 3.2 - hadoop per bucket endpoint configuration is ignored

Open einavh opened this issue 2 years ago • 0 comments

I'm using an EMR emr-6.5.0 cluster in us-east-1 with EC2 instances. The cluster runs a Spark application using PySpark 3.2.1. EMR is using Hadoop distribution: Amazon 3.2.1.

My Spark application reads from a bucket in us-west-2 and writes to a bucket in us-east-1.

Since I'm processing a large amount of data, I'm paying a lot of money for network transfer. To reduce the cost, I have created a VPC interface endpoint for S3 in us-west-2. Inside the Spark application I use the AWS CLI to read the file names from the us-west-2 bucket, and that works through the S3 interface endpoint. But when I use PySpark to read the data, it uses the us-east-1 S3 endpoint instead of the us-west-2 endpoint. I tried to use per-bucket configuration, but it is being ignored, although I added it to the default configuration and to the spark-submit call.

I tried to set the following configuration, but it is ignored:

'--conf', "spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain",
'--conf', "spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem",
'--conf', "spark.hadoop.fs.s3a.bucket.<us-west-2-bucket-name>.endpoint=",
'--conf', "spark.hadoop.fs.s3a.bucket.<us-west-2-bucket-name>.endpoint.region=us-west-2",
'--conf', "spark.hadoop.fs.s3a.bucket.<us-east-1-bucket-name>.endpoint=",
'--conf', "spark.hadoop.fs.s3a.bucket.<us-east-1-bucket-name>.endpoint.region=us-east-1",
'--conf', "spark.hadoop.fs.s3a.path.style.access=false",
'--conf', "spark.executor.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true",
'--conf', "spark.driver.extraJavaOptions=-Dcom.amazonaws.services.s3.enableV4=true",
'--conf', "Dfs.s3a.bucket.<us-east-1-bucket-name>.endpoint=",
'--conf', "Dfs.s3a.bucket.<us-west-2-bucket-name>.endpoint=",
'--conf', "spark.eventLog.enabled=false",
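For anyone trying to reproduce this, here is a minimal Python sketch of how the per-bucket arguments above could be generated and sanity-checked. It only builds strings and is not a fix for the issue. The helper names `per_bucket_conf` and `keys_not_forwarded`, the bucket name `my-west-bucket`, and the endpoint URL are all hypothetical placeholders, not values from the cluster. It assumes the S3A per-bucket key pattern `fs.s3a.bucket.<bucket>.<option>` and that spark-submit only accepts `--conf` keys beginning with `spark.`, so entries such as `Dfs.s3a...` would probably not reach the Hadoop configuration.

```python
from typing import Dict, List


def per_bucket_conf(bucket: str, options: Dict[str, str]) -> List[str]:
    """Build spark-submit arguments that set S3A options for a single bucket."""
    args: List[str] = []
    for option, value in options.items():
        # Assumed S3A per-bucket pattern: fs.s3a.bucket.<bucket>.<option>.
        # The spark.hadoop. prefix asks Spark to copy the key into the Hadoop conf.
        key = f"spark.hadoop.fs.s3a.bucket.{bucket}.{option}"
        args += ["--conf", f"{key}={value}"]
    return args


def keys_not_forwarded(args: List[str]) -> List[str]:
    """Return the --conf keys that do not start with 'spark.'."""
    # Pair each argument with its predecessor and keep those following '--conf'.
    pairs = [arg for prev, arg in zip(args, args[1:]) if prev == "--conf"]
    keys = [pair.split("=", 1)[0] for pair in pairs]
    return [key for key in keys if not key.startswith("spark.")]


# Hypothetical placeholders: substitute the real bucket name and endpoint URL.
west_args = per_bucket_conf(
    "my-west-bucket",
    {"endpoint": "https://s3-endpoint.example.invalid"},
)

# A key written like the 'Dfs.s3a...' entries above is flagged by the check.
suspect = keys_not_forwarded(
    west_args + ["--conf", "Dfs.s3a.bucket.my-west-bucket.endpoint=x"]
)
```

Note that `fs.s3a.*` settings apply to the S3A connector, so under this assumption they would only affect paths opened with the `s3a://` scheme.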

einavh, Sep 06 '22 13:09