
Unable to load S3

Open AllamSudhakara opened this issue 3 years ago • 2 comments

I have a very simple configuration file and job file in which I select 20 rows from a Hadoop system using the Hive catalog and push them to an S3 bucket. The job populates the data frame but does not create a file in S3. Could you please verify the following and give me some insight into what I am doing wrong? Thanks in advance for the help.

Command

    spark-submit \
      --conf spark.sql.catalogImplementation=hive \
      --conf spark.hadoop.dfs.nameservices=mycluster \
      --conf spark.hadoop.fs.s3a.fast.upload=True \
      --conf spark.hadoop.fs.s3a.path.style.access=True \
      --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
      --conf spark.hadoop.fs.s3a.access.key=<DEV-ACCESS-KEY> \
      --conf spark.hadoop.fs.s3a.secret.key=<DEV-SECRET_KEY> \
      --class com.yotpo.metorikku.Metorikku \
      /home/myEdgenodePath/metorikku_2.11.jar \
      -c /myHadoopFS/job-StraightLoad.yml

Job

    metrics:
      - /myHadoopFA/metric-StraightLoad.yml

    variables:
      StartDate: 2021-09-01
      EndDate: 2021-09-07
      TrimmedDateFormat: yyyy-mm-dd

    output:
      file:
        dir: s3a://dev-files-exchange/output

Metric

    steps:
      - dataFrameName: MYMonthly
        sql: select * from mySchema.my_aggregate where exp_dt = ${EndDate} LIMIT 20
        ignoreOnFailures: false

    output:
      - dataFrameName: MYMonthly
        outputType: Parquet
        outputOptions:
          saveMode: Overwrite
          path: MYMonthly.parquet

AllamSudhakara avatar Sep 27 '21 19:09 AllamSudhakara

Hi @AllamSudhakara:

I am currently using Metorikku and I am able to write Parquet files to S3. I am using this output configuration:

  - dataFrameName: df_name
    outputType: File
    format: parquet
    outputOptions:
      saveMode: Overwrite
      path: s3a://<s3_bucket_name>/path/to/file

Looks like you are not building the path correctly.
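For comparison, here is a minimal sketch of the original metric output rewritten along the lines of the working example above. The data frame name, bucket, and file name are the ones from this thread; whether you point `path` at a full `s3a://` location or keep it relative to the job-level `output.file.dir` is an assumption you should verify against your Metorikku version:

      # Sketch only: applies lucabem's working File/parquet output
      # to the metric from the question above.
      - dataFrameName: MYMonthly
        outputType: File
        format: parquet
        outputOptions:
          saveMode: Overwrite
          # Assumes the job-level output.file.dir
          # (s3a://dev-files-exchange/output) is prefixed to this
          # relative path; alternatively use a full s3a:// path here.
          path: MYMonthly.parquet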

lucabem avatar Aug 10 '22 19:08 lucabem

Hi Luis,

Thanks for the reply. Do you know if there is a pipeline-builder GUI based on Metorikku that generates the .yml files, runs the pipeline, visualizes progress, and surfaces any errors? Please share some details on whether YotpoLtd has one and can supply it under a license fee. It would be great if this GUI could read from enterprise metadata so that data scientists/analysts can build pipelines and progressively consolidate the enterprise's data assets.

Regards, Sudhakar


AllamSudhakara avatar Aug 10 '22 21:08 AllamSudhakara