Hudi Multi Writer DynamoDBBasedLocking issue

Open koochiswathiTR opened this issue 1 year ago • 3 comments

Hi, this is the first time we are setting up Hudi with multiple writers. Below are the Hudi config properties I have set up:

HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key() -> "optimistic_concurrency_control",
HoodieCompactionConfig.FAILED_WRITES_CLEANER_POLICY.key() -> "LAZY",
HoodieLockConfig.LOCK_ACQUIRE_NUM_RETRIES.key() -> "3000",
HoodieLockConfig.LOCK_ACQUIRE_CLIENT_NUM_RETRIES.key() -> "1",
HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key() -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
DynamoDbBasedLockConfig.DYNAMODB_LOCK_TABLE_NAME.key() -> "hoodi_lock",
DynamoDbBasedLockConfig.DYNAMODB_LOCK_PARTITION_KEY.key() -> "lock",
DynamoDbBasedLockConfig.DYNAMODB_LOCK_REGION.key() -> "us-east-1",
HoodieAWSConfig.AWS_ACCESS_KEY.key() -> "XXX",
HoodieAWSConfig.AWS_SECRET_KEY.key() -> "XXX",
HoodieAWSConfig.AWS_SESSION_TOKEN.key() -> "XXXX",
DynamoDbBasedLockConfig.DYNAMODB_ENDPOINT_URL.key() -> RegionUtils.getRegion("us-east-1").getServiceEndpoint(AmazonDynamoDB.ENDPOINT_PREFIX) // "dynamodb.us-east-1.amazonaws.com"

I have created a DynamoDB table to be used for locking, with lock as its partition key. Below are my questions:

(1) Is it mandatory to set AWS_ACCESS_KEY and AWS_SECRET_KEY? I don't want to set these keys.

(2) Do we need to create the DynamoDB table ourselves, or will Hudi create it automatically? We create AWS resources with CloudFormation.

(3) I am getting the exception below while connecting to the DynamoDB table:

com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema

The DynamoDB table is created with partition key lock (String).

koochiswathiTR, Sep 09 '22 11:09

@zhedoubushishi @nsivabalan

Please help me here

koochiswathiTR, Sep 09 '22 11:09

Is it mandatory to set AWS_ACCESS_KEY,AWS_SECRET_KEY ?

No, you should not need to. In an AWS environment, you rely on the roles assigned to your service to access other services. Please raise a support case with AWS to get help configuring the roles properly.

xushiyan, Sep 15 '22 00:09

@koochiswathiTR Thanks for raising this! The naming of the partition_key config is confusing to newcomers. Here's what you need to do:

(1) As @xushiyan already mentioned, you don't need to set the credentials in environment variables if the instance or service has already been granted access through the proper roles.

(2) By default, hoodie.write.lock.dynamodb.partition_key is set to the table name, so that multiple writers writing to the same table share the same lock. If you customize it, make sure the value is the same for all writers.

(3) Note that hoodie.write.lock.dynamodb.partition_key specifies the value to use for the column, not the column name itself. The column name is fixed as key in the DynamoDB table.

(4) The DynamoDB table used for locking is created automatically by Hudi, so you don't have to create the table yourself. If you do create it yourself, make sure the table has a column named key, not lock or the value specified by hoodie.write.lock.dynamodb.partition_key.

Let me know if this solves your problem. Feel free to close it once all good.

yihua, Sep 21 '22 22:09

We have improved our docs around this:

When using the DynamoDB-based lock provider, the name of the DynamoDB table acting as the lock table for Hudi is specified by the config hoodie.write.lock.dynamodb.table. This DynamoDB table is automatically created by Hudi, so you don't have to create the table yourself. If you want to use an existing DynamoDB table, make sure that an attribute with the name key is present in the table. The key attribute should be the partition key of the DynamoDB table. The config hoodie.write.lock.dynamodb.partition_key specifies the value to put for the key attribute (not the attribute name), which is used for the lock on the same table. By default, hoodie.write.lock.dynamodb.partition_key is set to the table name, so that multiple writers writing to the same table share the same lock. If you customize the name, make sure it's the same across multiple writers.
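As a rough illustration of the paragraph above, the lock-related writer options might look like the sketch below. Only hoodie.write.lock.dynamodb.table and hoodie.write.lock.dynamodb.partition_key are named in this thread; the other string keys are my understanding of the string forms behind the constants in the original post and should be checked against the configuration reference for your Hudi version. The object name and the Hudi table name my_hudi_table are hypothetical. No access key or secret key is set, on the assumption that the job runs under a role that already has access to DynamoDB.

```scala
// Sketch only: lock-related Hudi writer options as plain string keys.
object HudiDynamoDbLockOptions {
  // Hypothetical name of the Hudi table being written; replace with your own.
  val hudiTableName: String = "my_hudi_table"

  val lockOptions: Map[String, String] = Map(
    // Assumed string forms of the constants used in the original post.
    "hoodie.write.concurrency.mode" -> "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes" -> "LAZY",
    "hoodie.write.lock.provider" ->
      "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
    "hoodie.write.lock.dynamodb.region" -> "us-east-1",
    // Name of the DynamoDB lock table (value taken from the original post).
    "hoodie.write.lock.dynamodb.table" -> "hoodi_lock",
    // This is the VALUE stored under the attribute named "key",
    // not the name of an attribute. It must match across all writers.
    "hoodie.write.lock.dynamodb.partition_key" -> hudiTableName
  )

  // Print the options so the sketch can be run on its own.
  def main(args: Array[String]): Unit =
    lockOptions.foreach { case (k, v) => println(s"$k=$v") }
}
```

In a Spark job, a map like this would typically be merged into the other Hudi write options, for example through DataFrameWriter.options. If the lock table is created ahead of time (for instance with CloudFormation), its partition key attribute would need to be named key rather than lock.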

https://hudi.apache.org/docs/concurrency_control

Hope this answers your question. Feel free to re-open or raise a new issue if you need more assistance.

nsivabalan, Oct 22 '22 23:10