aws-sdk-go-v2
Add support for AWS Glue Data Catalog views
Describe the feature
It would be great to add support for CRUD operations on AWS Glue Data Catalog views.
Use Case
Right now a view can only be created by running an Athena query. Adding support for creating it through the AWS Glue API would simplify the process considerably and make maintenance of protected views much more robust.
This is a blocker for the following issue in the AWS Terraform provider: https://github.com/hashicorp/terraform-provider-aws/issues/38593.
Proposed Solution
See the comments in the related issue: https://github.com/hashicorp/terraform-provider-aws/issues/38593#issuecomment-2904095597
Other Information
No response
Acknowledgements
- [ ] I may be able to implement this feature request
- [ ] This feature might incur a breaking change
AWS Go SDK V2 Module Versions Used
github.com/aws/aws-sdk-go v1.55.7 https://github.com/hashicorp/terraform-provider-aws/blob/main/go.mod
Go version used
go 1.23.9
What exactly is required from the API here?
I'm not an expert on Glue, but I was able to translate the examples from the docs into something like the following:
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/glue"
	"github.com/aws/aws-sdk-go-v2/service/glue/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-east-1"))
	if err != nil {
		panic("unable to load SDK config, " + err.Error())
	}
	client := glue.NewFromConfig(cfg)

	// query from the docs to create a catalog view https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/SECTION-jobs-glue-data-catalog-views.html
	query := "CREATE PROTECTED MULTI DIALECT VIEW test_data_view SECURITY DEFINER AS SELECT id FROM test_table WHERE id = 'SEATTLE';"

	input := &glue.CreateTableInput{
		DatabaseName: aws.String("test-db"),
		TableInput: &types.TableInput{
			Name: aws.String("test_data_view"),
			StorageDescriptor: &types.StorageDescriptor{
				Columns: []types.Column{
					{
						Name: aws.String("column1"),
						Type: aws.String("string"),
					},
				},
			},
			ViewDefinition: &types.ViewDefinitionInput{
				SubObjects: []string{
					"arn:aws:glue:us-east-1:123456789012:table/test-db/test_table",
				},
				IsProtected: aws.Bool(true),
				Representations: []types.ViewRepresentationInput{
					{
						Dialect:          types.ViewDialectSpark,
						DialectVersion:   aws.String("1.0"),
						ViewOriginalText: aws.String(query),
						ViewExpandedText: aws.String(query),
					},
				},
			},
		},
	}

	// Create the table (the view is created as a table that carries a ViewDefinition)
	result, err := client.CreateTable(ctx, input)
	if err != nil {
		panic("failed to create table, " + err.Error())
	}
	fmt.Println(result)
}
With this I can see a view in AWS Glue, which I believe is what we need. There were a couple of caveats to get this to run, which may already be known to or handled by Terraform:
- I had to disable the "Use only IAM access control for new tables in this database" setting for my DB in AWS Lake Formation, otherwise I was getting "AccessDeniedException: Create Table Default Permissions should be empty, either in the database or settings." (ref)
- I had to add my S3 bucket to AWS Lake Formation as a data source, otherwise I was hitting "Multi Dialect views may only reference Lake Formation managed tables" (ref)
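The SubObjects entry in the example above is a hard-coded table ARN. As a minimal sketch, assuming the same ARN shape as that entry and the standard "aws" partition, a small helper could build it from its parts. The name tableARN is hypothetical and not part of the SDK.

```go
package main

import "fmt"

// tableARN formats a Glue table ARN in the same shape as the SubObjects
// entry in the example above. It is a hypothetical helper, not an SDK
// function, and it assumes the standard "aws" partition.
func tableARN(region, accountID, database, table string) string {
	return fmt.Sprintf("arn:aws:glue:%s:%s:table/%s/%s", region, accountID, database, table)
}

func main() {
	// Same values as the hard-coded ARN in the example above.
	fmt.Println(tableARN("us-east-1", "123456789012", "test-db", "test_table"))
}
```

The result can be passed as an element of ViewDefinitionInput.SubObjects in place of the string literal.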
@Madrigal
How can we add ViewDefinition to the CreateTableInput struct so that it can be used as part of the API operation? We need SDK-level support for ViewDefinition.
type CreateTableInput struct {
	// The catalog database in which to create the new table. For Hive compatibility,
	// this name is entirely lowercase.
	//
	// This member is required.
	DatabaseName *string

	// The TableInput object that defines the metadata table to create in the catalog.
	//
	// This member is required.
	TableInput *types.TableInput

	// The ID of the Data Catalog in which to create the Table . If none is supplied,
	// the Amazon Web Services account ID is used by default.
	CatalogId *string

	// Specifies an OpenTableFormatInput structure when creating an open format table.
	OpenTableFormatInput *types.OpenTableFormatInput

	// A list of partition indexes, PartitionIndex structures, to create in the table.
	PartitionIndexes []types.PartitionIndex

	// A structure that contains all the information that defines the view, including
	// the dialect or dialects for the view, and the query.
	ViewDefinition *types.ViewDefinition

	// The ID of the transaction.
	TransactionId *string

	noSmithyDocumentSerde
}
Isn't it already part of TableInput? Specifically this section:
input := &glue.CreateTableInput{
	TableInput: &types.TableInput{
		ViewDefinition: &types.ViewDefinitionInput{
			// ...
		},
	},
}
TableInput has this definition:
type TableInput struct {
	// The table name. For Hive compatibility, this is folded to lowercase when it is
	// stored.
	//
	// This member is required.
	Name *string

	// A description of the table.
	Description *string

	// The last time that the table was accessed.
	LastAccessTime *time.Time

	// The last time that column statistics were computed for this table.
	LastAnalyzedTime *time.Time

	// The table owner. Included for Apache Hive compatibility. Not used in the normal
	// course of Glue operations.
	Owner *string

	// These key-value pairs define properties associated with the table.
	Parameters map[string]string

	// A list of columns by which the table is partitioned. Only primitive types are
	// supported as partition keys.
	//
	// When you create a table used by Amazon Athena, and you do not specify any
	// partitionKeys , you must at least set the value of partitionKeys to an empty
	// list. For example:
	//
	// "PartitionKeys": []
	PartitionKeys []Column

	// The retention time for this table.
	Retention int32

	// A storage descriptor containing information about the physical storage of this
	// table.
	StorageDescriptor *StorageDescriptor

	// The type of this table. Glue will create tables with the EXTERNAL_TABLE type.
	// Other services, such as Athena, may create tables with additional table types.
	//
	// Glue related table types:
	//
	// EXTERNAL_TABLE Hive compatible attribute - indicates a non-Hive managed table.
	//
	// GOVERNED Used by Lake Formation. The Glue Data Catalog understands GOVERNED .
	TableType *string

	// A TableIdentifier structure that describes a target table for resource linking.
	TargetTable *TableIdentifier

	// A structure that contains all the information that defines the view, including
	// the dialect or dialects for the view, and the query.
	ViewDefinition *ViewDefinitionInput

	// Included for Apache Hive compatibility. Not used in the normal course of Glue
	// operations.
	ViewExpandedText *string

	// Included for Apache Hive compatibility. Not used in the normal course of Glue
	// operations. If the table is a VIRTUAL_VIEW , certain Athena configuration
	// encoded in base64.
	ViewOriginalText *string

	// contains filtered or unexported fields
}
Again, I'm not an expert on Glue. I just want to understand what the gap is so we can tell the service team exactly what's missing.