Add support for AWS Glue Data Catalog views

Open svdimchenko opened this issue 6 months ago • 3 comments

Describe the feature

It would be great to add support for CRUD operations on AWS Glue Data Catalog views.

Use Case

Right now, a view can only be created by running an Athena query. Supporting creation through the AWS Glue API would simplify the process considerably and make maintenance of protected views much more robust.

This is a blocker for the following issue in the AWS Terraform provider: https://github.com/hashicorp/terraform-provider-aws/issues/38593.

Proposed Solution

See the comments in the related issue: https://github.com/hashicorp/terraform-provider-aws/issues/38593#issuecomment-2904095597

Other Information

No response

Acknowledgements

  • [ ] I may be able to implement this feature request
  • [ ] This feature might incur a breaking change

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-sdk-go v1.55.7 https://github.com/hashicorp/terraform-provider-aws/blob/main/go.mod

Go version used

go 1.23.9

svdimchenko avatar May 24 '25 07:05 svdimchenko

What exactly is required from the API here?

I'm not an expert on Glue, but I was able to translate the examples from the docs into something like the following:

package main

import (
	"context"
	"fmt"
	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/glue"
	"github.com/aws/aws-sdk-go-v2/service/glue/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-east-1"))
	if err != nil {
		panic("unable to load SDK config, " + err.Error())
	}

	client := glue.NewFromConfig(cfg)
	// Query from the docs to create a catalog view:
	// https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/SECTION-jobs-glue-data-catalog-views.html
	query := "CREATE PROTECTED MULTI DIALECT VIEW test_data_view SECURITY DEFINER AS SELECT id FROM test_table WHERE id = 'SEATTLE';"
	input := &glue.CreateTableInput{
		DatabaseName: aws.String("test-db"),
		TableInput: &types.TableInput{
			Name: aws.String("test_data_view"),
			StorageDescriptor: &types.StorageDescriptor{
				Columns: []types.Column{
					{
						Name: aws.String("column1"),
						Type: aws.String("string"),
					},
				},
			},
			ViewDefinition: &types.ViewDefinitionInput{
				SubObjects: []string{
					"arn:aws:glue:us-east-1:123456789012:table/test-db/test_table",
				},
				IsProtected: aws.Bool(true),
				Representations: []types.ViewRepresentationInput{
					{
						Dialect:          types.ViewDialectSpark,
						DialectVersion:   aws.String("1.0"),
						ViewOriginalText: aws.String(query),
						ViewExpandedText: aws.String(query),
					},
				},
			},
		},
	}

	// Create the table
	result, err := client.CreateTable(ctx, input)
	if err != nil {
		panic("failed to create table, " + err.Error())
	}
	fmt.Println(result)
}

And I can see a view on AWS Glue, which I believe is what we need. There were a couple of caveats to get this to run, which may already be known/handled by Terraform:

  1. I had to disable the "Use only IAM access control for new tables in this database" setting for my DB on AWS Lake Formation, otherwise I was getting "AccessDeniedException: Create Table Default Permissions should be empty, either in the database or settings." (ref)
  2. I had to add my S3 bucket into AWS Lake Formation as a data source, otherwise I was hitting "Multi Dialect views may only reference Lake Formation managed tables" (ref)
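
For anyone hitting the same two errors, the caveats above can be captured as a small lookup from error text to the change that resolved it. This is only a sketch: `lakeFormationHint` and the `hints` table are hypothetical names made up for illustration, not part of the SDK, and matching on message substrings is an assumption that will break if the service rewords its errors.

```go
package main

import (
	"fmt"
	"strings"
)

// hints pairs a fragment of each error message quoted in this thread with the
// change that made the CreateTable call succeed. Hypothetical helper data,
// not an SDK type.
var hints = []struct {
	fragment string
	fix      string
}{
	{
		fragment: "Create Table Default Permissions should be empty",
		fix:      `disable "Use only IAM access control for new tables in this database" in Lake Formation`,
	},
	{
		fragment: "Multi Dialect views may only reference Lake Formation managed tables",
		fix:      "add the S3 bucket to Lake Formation as a data source",
	},
}

// lakeFormationHint returns the fix associated with a known error message,
// or an empty string when the message is not one of the two seen here.
func lakeFormationHint(msg string) string {
	for _, h := range hints {
		if strings.Contains(msg, h.fragment) {
			return h.fix
		}
	}
	return ""
}

func main() {
	// In real code the message would come from err.Error() after CreateTable fails.
	msg := "AccessDeniedException: Create Table Default Permissions should be empty, either in the database or settings."
	if hint := lakeFormationHint(msg); hint != "" {
		fmt.Println("hint:", hint)
	}
}
```

Passing the message as a plain string keeps the helper independent of the SDK's error types, so it can be called with `err.Error()` from any call site.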

Madrigal avatar May 27 '25 23:05 Madrigal

@Madrigal

How can we add ViewDefinition to the CreateTableInput struct so that it can be used as part of the API operation? We need SDK-level support for ViewDefinition.

type CreateTableInput struct {

	// The catalog database in which to create the new table. For Hive compatibility,
	// this name is entirely lowercase.
	//
	// This member is required.
	DatabaseName *string

	// The TableInput object that defines the metadata table to create in the catalog.
	//
	// This member is required.
	TableInput *types.TableInput

	// The ID of the Data Catalog in which to create the Table . If none is supplied,
	// the Amazon Web Services account ID is used by default.
	CatalogId *string

	// Specifies an OpenTableFormatInput structure when creating an open format table.
	OpenTableFormatInput *types.OpenTableFormatInput

	// A list of partition indexes, PartitionIndex structures, to create in the table.
	PartitionIndexes []types.PartitionIndex
        
	// A structure that contains all the information that defines the view,
	// including the dialect or dialects for the view, and the query.
	ViewDefinition *types.ViewDefinition
      
	// The ID of the transaction.
	TransactionId *string

	noSmithyDocumentSerde
}

hzloc avatar Jun 01 '25 12:06 hzloc

Isn't it part of TableInput? Specifically this section:

input := &glue.CreateTableInput{
	TableInput: &types.TableInput{
		ViewDefinition: &types.ViewDefinitionInput{
			// ...
		},
	},
}

TableInput has this definition:

type TableInput struct {

	// The table name. For Hive compatibility, this is folded to lowercase when it is
	// stored.
	//
	// This member is required.
	Name *string

	// A description of the table.
	Description *string

	// The last time that the table was accessed.
	LastAccessTime *time.Time

	// The last time that column statistics were computed for this table.
	LastAnalyzedTime *time.Time

	// The table owner. Included for Apache Hive compatibility. Not used in the normal
	// course of Glue operations.
	Owner *string

	// These key-value pairs define properties associated with the table.
	Parameters map[string]string

	// A list of columns by which the table is partitioned. Only primitive types are
	// supported as partition keys.
	//
	// When you create a table used by Amazon Athena, and you do not specify any
	// partitionKeys , you must at least set the value of partitionKeys to an empty
	// list. For example:
	//
	//     "PartitionKeys": []
	PartitionKeys []Column

	// The retention time for this table.
	Retention int32

	// A storage descriptor containing information about the physical storage of this
	// table.
	StorageDescriptor *StorageDescriptor

	// The type of this table. Glue will create tables with the EXTERNAL_TABLE type.
	// Other services, such as Athena, may create tables with additional table types.
	//
	// Glue related table types:
	//
	// EXTERNAL_TABLE Hive compatible attribute - indicates a non-Hive managed table.
	//
	// GOVERNED Used by Lake Formation. The Glue Data Catalog understands GOVERNED .
	TableType *string

	// A TableIdentifier structure that describes a target table for resource linking.
	TargetTable *TableIdentifier

	// A structure that contains all the information that defines the view, including
	// the dialect or dialects for the view, and the query.
	ViewDefinition *ViewDefinitionInput

	// Included for Apache Hive compatibility. Not used in the normal course of Glue
	// operations.
	ViewExpandedText *string

	// Included for Apache Hive compatibility. Not used in the normal course of Glue
	// operations. If the table is a VIRTUAL_VIEW , certain Athena configuration
	// encoded in base64.
	ViewOriginalText *string
	// contains filtered or unexported fields
}

ref

Again, I'm not an expert on Glue; I just want to understand what the gap is so we can communicate with the service team about what's missing.

Madrigal avatar Jun 02 '25 15:06 Madrigal