wenningd

Results 11 comments of wenningd

Sorry, I don't fully understand your question. This should work: `os.environ['HUDI_CONF_DIR'] = args['HUDI_CONF_DIR']`. But what does `args['HUDI_CONF_DIR']` stand for?

First, it should be something like `HUDI_CONF_DIR='s3://glue-development-bucket/scripts/hudi-conf/'`. This is just the directory path. The config file name should be `hudi-defaults.conf` rather than `hudi-default.conf`, since we hard code...
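A minimal Python sketch of the two points above, assuming the variable is set before the Spark session (and its JVM) starts. The helper name `set_hudi_conf_dir` is hypothetical and not part of Hudi; only the `HUDI_CONF_DIR` variable, the example bucket path, and the `hudi-defaults.conf` file name come from the comment.

```python
import os
import posixpath

# File name taken from the comment above; Hudi expects exactly this name.
HUDI_DEFAULTS_FILE = "hudi-defaults.conf"


def set_hudi_conf_dir(conf_dir: str) -> str:
    """Export HUDI_CONF_DIR as a directory and return the path where the
    config file is expected to live.

    Hypothetical helper for illustration only; it does not read from S3.
    """
    # Guard against the common mistake of pointing at the file itself.
    if conf_dir.rstrip("/").endswith(".conf"):
        raise ValueError("HUDI_CONF_DIR must be a directory, not a file path")
    os.environ["HUDI_CONF_DIR"] = conf_dir
    # posixpath keeps forward slashes, which is what an s3:// URI needs.
    return posixpath.join(conf_dir, HUDI_DEFAULTS_FILE)


expected = set_hudi_conf_dir("s3://glue-development-bucket/scripts/hudi-conf/")
print(expected)
```

The returned path is only where the file should be uploaded; the sketch does not check that the object actually exists in the bucket.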

@moustafaalaa I cannot reproduce this issue. I just export the external config directly through the shell, with something like `export HUDI_CONF_DIR=s3://wenningd-xxx/hudi/config`, and in the following Hudi code I can see it...

@nsivabalan Yes, for cluster mode, I will send out a PR to fix it. FYI: https://github.com/apache/hudi/pull/5987

@moustafaalaa With https://github.com/apache/hudi/pull/5987, similar to EMR, you can add `hudi-defaults.conf` to `spark.yarn.dist.files` so that Hudi can load this file. This is a good suggestion: `hoodie.config.path => "s3:path"`. We would have...
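As a rough illustration of the `spark.yarn.dist.files` suggestion, the sketch below builds a `spark-submit` command line in Python. The function name, the application file `my_job.py`, and the local path `/etc/hudi/conf/hudi-defaults.conf` are assumptions made for the example; whether Hudi then loads the file depends on the PR linked above.

```python
import shlex
from typing import List


def spark_submit_command(app: str, hudi_conf_file: str) -> List[str]:
    """Build a spark-submit argument list that ships hudi-defaults.conf
    to the YARN containers via spark.yarn.dist.files.

    Hypothetical helper; paths and the application name are placeholders.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # Distribute the config file into each container's working directory.
        "--conf", f"spark.yarn.dist.files={hudi_conf_file}",
        app,
    ]


cmd = spark_submit_command("my_job.py", "/etc/hudi/conf/hudi-defaults.conf")
# shlex.join quotes the arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```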

No, this is a new issue for us. As @fengjian428 mentioned, the locking happens during the commit stage, not the data writing stage. It would be good if you can...

Got it. Can you provide more information on how to reproduce this issue? For example, what is the size of the Hudi table? Are you seeing this slowdown happen even with...

This is a known issue. We are working on a fix and will push a PR once it's ready.

Hi @svaddoriya, are you using PySpark? If you are, there's a known bug. As a workaround, you can add the aws-java-sdk-bundle jar to the PySpark classpath:

```
pyspark --jars /usr/share/aws/aws-java-sdk/aws-java-sdk-bundle*.jar,/usr/lib/hudi/hudi-spark-bundle.jar,/usr/lib/spark/jars/spark-avro.jar
```