hudi
Hudi Multi Writer DynamoDBBasedLocking issue
Hi, this is the first time we are setting up Hudi with multiple writers. Below are my Hudi config properties:
- `HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key() -> "optimistic_concurrency_control"`
- `HoodieCompactionConfig.FAILED_WRITES_CLEANER_POLICY.key() -> "LAZY"`
- `HoodieLockConfig.LOCK_ACQUIRE_NUM_RETRIES.key() -> "3000"`
- `HoodieLockConfig.LOCK_ACQUIRE_CLIENT_NUM_RETRIES.key() -> "1"`
- `HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key() -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider"`
- `DynamoDbBasedLockConfig.DYNAMODB_LOCK_TABLE_NAME.key() -> "hoodi_lock"`
- `DynamoDbBasedLockConfig.DYNAMODB_LOCK_PARTITION_KEY.key() -> "lock"`
- `DynamoDbBasedLockConfig.DYNAMODB_LOCK_REGION.key() -> "us-east-1"`
- `HoodieAWSConfig.AWS_ACCESS_KEY.key() -> "XXX"`
- `HoodieAWSConfig.AWS_SECRET_KEY.key() -> "XXX"`
- `HoodieAWSConfig.AWS_SESSION_TOKEN.key() -> "XXXX"`
- `DynamoDbBasedLockConfig.DYNAMODB_ENDPOINT_URL.key() -> RegionUtils.getRegion("us-east-1").getServiceEndpoint(AmazonDynamoDB.ENDPOINT_PREFIX) // "dynamodb.us-east-1.amazonaws.com"`
I have created a DynamoDB table to be used for locking, with `lock` as its partition key. Below are my questions:
(1) Is it mandatory to set AWS_ACCESS_KEY and AWS_SECRET_KEY? I don't want to set these keys.
(2) Do we need to create the DynamoDB table, or will Hudi create it automatically? We create AWS resources with CloudFormation.
(3) I am getting the exception below while connecting to the DynamoDB table, which was created with partition key `lock` (String):

com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema
@zhedoubushishi @nsivabalan
Please help me here
Is it mandatory to set AWS_ACCESS_KEY, AWS_SECRET_KEY?
No, you should not need to. In an AWS environment you rely on the IAM roles granted to your service to access other services. Please raise a support case with AWS to get help configuring the roles properly.
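Since the fix is to rely on the role attached to the cluster rather than static keys, the only change to the options map in the question is to leave the three credential entries out. A minimal sketch, assuming the raw config keys behind `HoodieAWSConfig.AWS_ACCESS_KEY`, `AWS_SECRET_KEY` and `AWS_SESSION_TOKEN` are `hoodie.aws.access.key`, `hoodie.aws.secret.key` and `hoodie.aws.session.token`; `withoutStaticCredentials` is a hypothetical helper, not a Hudi API. Our understanding is that, with these entries absent, the lock provider falls back to the default AWS credential chain (e.g. an instance profile or EMR role), but please verify this against your Hudi version.

```scala
object LockOptionsWithoutKeys {
  // Assumed raw keys behind HoodieAWSConfig.AWS_ACCESS_KEY / AWS_SECRET_KEY / AWS_SESSION_TOKEN.
  val StaticCredentialKeys: Set[String] = Set(
    "hoodie.aws.access.key",
    "hoodie.aws.secret.key",
    "hoodie.aws.session.token"
  )

  // Hypothetical helper: drop static credentials so that (by assumption) the
  // provider resolves credentials from the role-based default AWS chain instead.
  def withoutStaticCredentials(opts: Map[String, String]): Map[String, String] =
    opts -- StaticCredentialKeys

  def main(args: Array[String]): Unit = {
    val posted = Map(
      "hoodie.write.lock.provider" -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
      "hoodie.write.lock.dynamodb.table" -> "hoodi_lock",
      "hoodie.write.lock.dynamodb.region" -> "us-east-1",
      "hoodie.aws.access.key" -> "XXX",
      "hoodie.aws.secret.key" -> "XXX",
      "hoodie.aws.session.token" -> "XXXX"
    )
    // Print the remaining options in a stable order.
    withoutStaticCredentials(posted).toSeq.sorted.foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```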
@koochiswathiTR Thanks for raising this! The naming of the `partition_key` config is confusing to newcomers. Here's what you need to do:
(1) As @xushiyan already mentioned, you don't need to set the credentials in env variables if the instance or service is already granted access with the proper roles;
(2) By default, `hoodie.write.lock.dynamodb.partition_key` is set to the Hudi table name, so that multiple writers writing to the same table share the same lock. If you customize the name, make sure it's the same for all writers;
(3) Note that `hoodie.write.lock.dynamodb.partition_key` specifies the value to use for the column, not the column name itself. The column name is fixed to be `key` in the DynamoDB table;
(4) The DynamoDB table used for locking is created automatically by the Hudi code, so you don't have to create it yourself. If you do create it, make sure that the `key` column is present in the table, not `lock` or whatever value is specified by `hoodie.write.lock.dynamodb.partition_key`.
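Points (2) and (3) boil down to this: every writer must resolve to the same lock table and the same value for the fixed `key` attribute, and that value defaults to the Hudi table name. The sketch below models that resolution with plain maps. `lockIdentity` and `shareLock` are hypothetical helpers, the table name `orders` is a made-up example, and the fallback to `hoodie.table.name` mirrors the default described in (2) rather than Hudi's actual code path.

```scala
object LockIdentity {
  val LockTable = "hoodie.write.lock.dynamodb.table"
  val PartitionKey = "hoodie.write.lock.dynamodb.partition_key"
  val TableName = "hoodie.table.name"

  // The hash-key attribute name is fixed; the partition_key config only supplies its value.
  val KeyAttribute = "key"

  // Hypothetical: (lock table, attribute name, attribute value) that one writer locks on.
  def lockIdentity(opts: Map[String, String]): (String, String, String) = {
    val value = opts.getOrElse(PartitionKey, opts(TableName)) // default: the Hudi table name
    (opts(LockTable), KeyAttribute, value)
  }

  // Two writers contend on the same lock only if they resolve to the same identity.
  def shareLock(a: Map[String, String], b: Map[String, String]): Boolean =
    lockIdentity(a) == lockIdentity(b)

  def main(args: Array[String]): Unit = {
    val writer1 = Map(LockTable -> "hoodi_lock", TableName -> "orders")
    val writer2 = Map(LockTable -> "hoodi_lock", TableName -> "orders", PartitionKey -> "orders")
    val writer3 = Map(LockTable -> "hoodi_lock", TableName -> "orders", PartitionKey -> "lock")
    println(lockIdentity(writer1))       // (hoodi_lock,key,orders)
    println(shareLock(writer1, writer2)) // true: explicit value equals the default
    println(shareLock(writer1, writer3)) // false: writer3 locks on a different value
  }
}
```

Using `lock` as the partition_key value, as in the question, is fine as long as every writer uses it; what breaks is a table whose key column is named `lock`.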
Let me know if this solves your problem. Feel free to close it once all good.
We have improved our docs around this:
When using the DynamoDB-based lock provider, the name of the DynamoDB table acting as the lock table for Hudi is specified by the config hoodie.write.lock.dynamodb.table. This DynamoDB table is automatically created by Hudi, so you don't have to create the table yourself. If you want to use an existing DynamoDB table, make sure that an attribute with the name key is present in the table. The key attribute should be the partition key of the DynamoDB table. The config hoodie.write.lock.dynamodb.partition_key specifies the value to put for the key attribute (not the attribute name), which is used for the lock on the same table. By default, hoodie.write.lock.dynamodb.partition_key is set to the table name, so that multiple writers writing to the same table share the same lock. If you customize the name, make sure it's the same across multiple writers.
https://hudi.apache.org/docs/concurrency_control
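For the CloudFormation route from the original question, the requirement in the docs is that the table's partition (HASH) key attribute is named `key`. The toy check below illustrates why a table keyed on `lock` fails: Hudi addresses its lock items by `key`, so DynamoDB rejects the request with the "provided key element does not match the schema" error reported in the question. `checkLockTableSchema` is a hypothetical helper that inspects the key schema you declared, not a live DynamoDB call, and the String (`S`) type requirement is an assumption based on the table Hudi itself creates.

```scala
object LockTableSchemaCheck {
  // Attribute name Hudi uses for the lock item's partition key (fixed, per the docs).
  val RequiredHashKey = "key"

  // Hypothetical check over a declared key schema (e.g. from a CloudFormation template).
  // hashKey: the HASH attribute name; attrType: its DynamoDB scalar type ("S", "N" or "B").
  def checkLockTableSchema(hashKey: String, attrType: String): Either[String, Unit] =
    if (hashKey != RequiredHashKey)
      Left(s"partition key is '$hashKey', expected '$RequiredHashKey'; lookups by 'key' " +
        "would fail with: The provided key element does not match the schema")
    else if (attrType != "S") // assumption: String, matching the table Hudi creates
      Left(s"partition key type is '$attrType', expected 'S' (String)")
    else
      Right(())

  def main(args: Array[String]): Unit = {
    println(checkLockTableSchema("lock", "S")) // the table from the question: Left(...)
    println(checkLockTableSchema("key", "S"))  // Right(())
  }
}
```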
Hope this answers your question. Feel free to re-open or raise a new issue if you need more assistance.