aws-cdk

(glue-alpha): cannot create 2 partitionIndexes simultaneously

Open clueleaf opened this issue 2 years ago • 8 comments

Describe the bug

When passing 2 indexes to partitionIndexes of glue.Table, table creation fails.

Expected Behavior

Glue table and indexes are created.

Current Behavior

Table indexes creation fails.

Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table.

Reproduction Steps

Create a glue table with 2 indexes.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [
    { indexName: 'index1', keyNames: ['month'] },
    { indexName: 'index2', keyNames: ['month', 'year'] },
  ],
  dataFormat: glue.DataFormat.CSV,
});

It sometimes fails even if only one index is passed to partitionIndexes and the rest are added using table.addPartitionIndex.

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});

csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] })

Possible Solution

I think this is a restriction of the Glue service.
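
To make the suspected restriction concrete, here is a minimal toy model of it. `FakeGlueTable`, `createOverlapping`, and `createSequentially` are hypothetical names invented for illustration; nothing here calls AWS or reflects how Glue is actually implemented. It only mimics the rule quoted in the error message: a second index cannot start while another is still in CREATING state.

```typescript
// Toy model only: FakeGlueTable is hypothetical and never talks to AWS.
type IndexState = 'CREATING' | 'ACTIVE';

class FakeGlueTable {
  private readonly indexes: Record<string, IndexState> = {};

  // Begin creating an index; rejects if any other index is still CREATING.
  startCreate(name: string): void {
    for (const other of Object.keys(this.indexes)) {
      if (this.indexes[other] === 'CREATING') {
        throw new Error(
          `Index ${other} is in CREATING state. ` +
            'Only 1 index can be created or deleted simultaneously per table.',
        );
      }
    }
    this.indexes[name] = 'CREATING';
  }

  // Mark a previously started index as finished.
  finishCreate(name: string): void {
    if (this.indexes[name] !== 'CREATING') {
      throw new Error(`Index ${name} is not being created`);
    }
    this.indexes[name] = 'ACTIVE';
  }

  // Names of all indexes that finished creation, in insertion order.
  activeIndexes(): string[] {
    return Object.keys(this.indexes).filter((n) => this.indexes[n] === 'ACTIVE');
  }
}

// Overlapping creation: every request starts before any has finished.
function createOverlapping(table: FakeGlueTable, names: string[]): void {
  for (const name of names) table.startCreate(name);
  for (const name of names) table.finishCreate(name);
}

// Serialized creation: each index finishes before the next one starts.
function createSequentially(table: FakeGlueTable, names: string[]): void {
  for (const name of names) {
    table.startCreate(name);
    table.finishCreate(name);
  }
}
```

In this model, `createOverlapping` throws on the second index while `createSequentially` succeeds, which is consistent with the intermittent failures: whether a deployment works would depend on whether the two index creations happen to overlap.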

Additional Information/Context

No response

CDK CLI Version

2.70.0

Framework Version

No response

Node.js Version

18

OS

macOS Ventura

Language

TypeScript

Language Version

No response

Other information

No response

clueleaf avatar Mar 28 '23 02:03 clueleaf

Hi @clueleaf , thanks for reaching out.

It's stated in the available documentation that you can have a maximum of 3 partition indexes in a table. But it's also stated here:

  "Partition indexes must be created one at a time. To avoid race conditions, we store the resource and add dependencies each time a new partition index is created."

I am also getting the error when creating 2 indexes at the same time, but it succeeds when I add the partition index later on. Since there is a workaround, I am marking this as P2 for now, which means our team won't be able to work on it immediately. However, if you would like to contribute to resolving this bug, that would be great. Here is a contributing guide to get started.
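
The "store the resource and add dependencies" idea from the quoted comment can be sketched as a toy model. `ToyIndex` and `ToyTable` are hypothetical classes made up for this illustration and are not the aws-cdk implementation; the assumption here is that each new index is made to depend on the one created just before it.

```typescript
// Hypothetical toy classes; they only mimic the idea in the quoted comment.
class ToyIndex {
  // Names of the indexes that must finish before this one starts.
  readonly dependsOn: string[] = [];
  constructor(readonly name: string) {}
}

class ToyTable {
  // Every index resource created so far, in creation order.
  private readonly created: ToyIndex[] = [];

  addPartitionIndex(name: string): void {
    const index = new ToyIndex(name);
    // Assumption: chain the new index to the most recently stored one.
    if (this.created.length > 0) {
      index.dependsOn.push(this.created[this.created.length - 1].name);
    }
    this.created.push(index);
  }

  // Map of index name to the names it depends on.
  dependencies(): Record<string, string[]> {
    const result: Record<string, string[]> = {};
    for (const index of this.created) {
      result[index.name] = index.dependsOn.slice();
    }
    return result;
  }
}
```

If the chaining works as intended, no two indexes are ever created at the same time; the failures reported in this issue suggest that, in at least some cases, the real dependency chain was not taking effect.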

We also use +1s to help prioritize our work, and are happy to re-evaluate this issue based on community feedback. You can reach out to the cdk.dev community on Slack to solicit support for re-prioritization.

khushail avatar Mar 28 '23 21:03 khushail

@khushail Thank you for your investigation. One weird thing is that even if I use addPartitionIndex to add the index later on, it fails in just the same way. It's hard to tell why it succeeds sometimes but not always.

const bucket = new s3.Bucket(stack, 'DataBucket');
const database = new glue.Database(stack, 'MyDatabase', {
  databaseName: 'database',
});

const csvTable = new glue.Table(stack, 'CSVTable', {
  database,
  bucket,
  tableName: 'csv_table',
  columns: [
    { name: 'col1', type: glue.Schema.STRING },
    { name: 'col2', type: glue.Schema.STRING },
    { name: 'col3', type: glue.Schema.STRING },
  ],
  partitionKeys: [
    { name: 'year', type: glue.Schema.SMALL_INT },
    { name: 'month', type: glue.Schema.BIG_INT },
  ],
  partitionIndexes: [{ indexName: 'index1', keyNames: ['month'] }],
  dataFormat: glue.DataFormat.CSV,
});
csvTable.addPartitionIndex({ indexName: 'index2', keyNames: ['month', 'year'] })

clueleaf avatar Mar 29 '23 04:03 clueleaf

@clueleaf, could you please share the error that you see when it fails? As I am not able to repro this error, it might be helpful for reference while creating a PR.

khushail avatar Mar 29 '23 18:03 khushail

Sure.

**:**:** ** | CREATE_FAILED        | Custom::AWS           | CSVTablepartitionindexindex16247ABF6
Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

 ❌  MyStack (MyStack) failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at FullCloudFormationDeployment.monitorDeployment (/Users/***/node_modules/aws-cdk/lib/index.js:380:10236)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async deployStack2 (/Users/***/node_modules/aws-cdk/lib/index.js:383:145458)
    at async /Users/***/node_modules/aws-cdk/lib/index.js:383:128776
    at async run (/Users/***/node_modules/aws-cdk/lib/index.js:383:126782)

 ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)
    at deployStacks (/Users/***/node_modules/aws-cdk/lib/index.js:383:129083)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async CdkToolkit.deploy (/Users/***/node_modules/aws-cdk/lib/index.js:383:147507)
    at async exec4 (/Users/***/node_modules/aws-cdk/lib/index.js:438:51799)

Stack Deployments Failed: Error: The stack named MyStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Received response status [FAILED] from custom resource. Message returned: Index index2 is in CREATING state. Only 1 index can be created or deleted simultaneously per table. (RequestId: 9a709d0e-4e9d-49e3-8202-fd781b73266b)

clueleaf avatar Mar 30 '23 00:03 clueleaf

Thanks, @clueleaf.

khushail avatar Mar 30 '23 17:03 khushail

I have the same issue; it worked previously.

yuntaoL avatar May 03 '23 18:05 yuntaoL

IMO, the best fix is for addPartitionIndex to return the object it creates instead of returning nothing, so that we could chain dependencies between the two indexes ourselves.

Something like this (currently doesn't work because it returns void):

        const table = new S3Table(this, 'Something', {
          .
          .
          .
        });

        const pI1 = table.addPartitionIndex({
          indexName: 'year_month_day',
          keyNames: ['year', 'month', 'day'],
        });
        const pI2 = table.addPartitionIndex({
          indexName: 'country_site',
          keyNames: ['country', 'site'],
        });
        pI1.addDependency(pI2); // Doesn't work because pI1 and pI2 are void
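
A self-contained sketch of what that proposal might look like, using stand-in classes rather than the real library. `FakeIndexResource`, `FakeTable`, and `deploymentOrder` are hypothetical names, and the return type shown is an assumption about how such an API could behave, not what aws-cdk does today.

```typescript
// Hypothetical stand-ins; this is not the aws-cdk API.
class FakeIndexResource {
  readonly dependsOn: FakeIndexResource[] = [];
  constructor(readonly indexName: string) {}

  // Record that `other` must be created before this resource.
  addDependency(other: FakeIndexResource): void {
    this.dependsOn.push(other);
  }
}

class FakeTable {
  // Proposed behavior: return the created resource instead of void.
  addPartitionIndex(props: { indexName: string; keyNames: string[] }): FakeIndexResource {
    return new FakeIndexResource(props.indexName);
  }
}

// Order in which resources would be created, dependencies first.
// Simple depth-first walk; it does not handle dependency cycles.
function deploymentOrder(resources: FakeIndexResource[]): string[] {
  const order: string[] = [];
  const visit = (r: FakeIndexResource): void => {
    if (order.indexOf(r.indexName) !== -1) return;
    r.dependsOn.forEach((dep) => visit(dep));
    order.push(r.indexName);
  };
  resources.forEach((r) => visit(r));
  return order;
}

const table = new FakeTable();
const pI1 = table.addPartitionIndex({
  indexName: 'year_month_day',
  keyNames: ['year', 'month', 'day'],
});
const pI2 = table.addPartitionIndex({
  indexName: 'country_site',
  keyNames: ['country', 'site'],
});
pI1.addDependency(pI2); // pI1 is created only after pI2 has finished
```

With the dependency in place, `deploymentOrder([pI1, pI2])` lists `country_site` before `year_month_day`, so the two creations never overlap in this model.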

prazian avatar Jun 28 '24 07:06 prazian