metorikku
Unable to write to S3
I have a very simple configuration file and job file in which I select 20 rows from a Hadoop system using the Hive catalog and push them to an S3 bucket. The job populates the data frame but does not create a file in S3. Could you please verify the following and give me some insight into what I am doing wrong? Thanks in advance for the help.
Command

spark-submit \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hadoop.dfs.nameservices=mycluster \
  --conf spark.hadoop.fs.s3a.fast.upload=True \
  --conf spark.hadoop.fs.s3a.path.style.access=True \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  --conf spark.hadoop.fs.s3a.access.key=<DEV-ACCESS-KEY> \
  --conf spark.hadoop.fs.s3a.secret.key=<DEV-SECRET_KEY> \
  --class com.yotpo.metorikku.Metorikku \
  /home/myEdgenodePath/metorikku_2.11.jar \
  -c /myHadoopFS/job-StraightLoad.yml
Job
metrics:
- /myHadoopFA/metric-StraightLoad.yml
variables:
  StartDate: 2021-09-01
  EndDate: 2021-09-07
  TrimmedDateFormat: yyyy-mm-dd
output:
  file:
    dir: s3a://dev-files-exchange/output
Metric
steps:
- dataFrameName: MYMonthly
  sql: select * from mySchema.my_aggregate where exp_dt = ${EndDate} LIMIT 20
  ignoreOnFailures: false
output:
- dataFrameName: MYMonthly
  outputType: Parquet
  outputOptions:
    saveMode: Overwrite
    path: MYMonthly.parquet
Hi @AllamSudhakara:
I am currently using Metorikku and I am able to write to S3 parquet files. I am using this output's configuration:
- dataFrameName: df_name
outputType: File
format: parquet
outputOptions:
saveMode: Overwrite
path: s3a://<s3_bucket_name>/path/to/file
Looks like you are not building the path correctly.
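Applying that shape to the metric above, a minimal sketch (assuming the same MYMonthly dataframe and the dev-files-exchange bucket from your job file) would be:

output:
  - dataFrameName: MYMonthly
    outputType: File
    format: parquet
    outputOptions:
      saveMode: Overwrite
      path: s3a://dev-files-exchange/output/MYMonthly.parquet

Note that in this working configuration outputType is File with a separate format: parquet key, rather than outputType: Parquet, and path is the full s3a:// URI rather than a bare file name.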
Hi Luis,
Thanks for the reply. Would you know if there is any pipeline-builder GUI based on Metorikku that generates the .yml files, runs the pipeline, visualizes progress, and surfaces any errors? Please share some details on whether YotpoLtd has one and can supply it under a license fee. It would be great if this GUI could read Enterprise metadata so that data scientists/analysts could build pipelines and progressively consolidate the Enterprise data assets.
Regards, Sudhakar