apiary-data-lake

Turn on performance_schema for the Apiary metastore

Status: Open. RongQiao opened this issue 6 years ago • 4 comments

I am from the Expedia Data Engineering team. We have some AWS MySQL RDS instances serving as Hive metastores, and in the past we have suffered metastore performance issues when users ran 'alter table recover partitions' and similar commands on big datasets. Without performance_schema, we don't have much insight into what's going on. The Apiary metastore has more complicated use cases, so I would suggest making performance_schema available for the Apiary metastore.

RongQiao, Nov 07 '18 21:11

@rpoluri do you understand the request here? Is this some part of the underlying Hive metastore DB schema that for some reason we haven't activated?

massdosage, Feb 13 '20 12:02

It's an extra component of MySQL (it seems to be available in Aurora too: https://aws.amazon.com/blogs/database/analyze-amazon-aurora-mysql-workloads-with-performance-insights/) that gives you more statistics on who and what is abusing the database when you have performance problems: https://dev.mysql.com/doc/refman/8.0/en/performance-schema.html. It's one of those things you don't need until the database is breaking, but I think it also has some nontrivial performance impact of its own.
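A minimal sketch of what turning performance_schema on could look like, assuming an RDS MySQL instance attached to a custom DB parameter group (the name hive-metastore-params is hypothetical; default parameter groups can't be edited). It only builds the keyword arguments for boto3's RDS modify_db_parameter_group call. Because performance_schema is a static parameter, the change has to be queued with the pending-reboot apply method and only takes effect after the instance restarts.

```python
from typing import Any, Dict


def performance_schema_request(parameter_group: str) -> Dict[str, Any]:
    """Build kwargs for boto3's RDS modify_db_parameter_group call.

    parameter_group is assumed to be a custom DB parameter group that is
    already attached to the metastore instance (hypothetical name below).
    """
    return {
        "DBParameterGroupName": parameter_group,
        "Parameters": [
            {
                "ParameterName": "performance_schema",
                "ParameterValue": "1",
                # Static parameter: RDS only accepts "pending-reboot", so the
                # setting is applied the next time the instance reboots.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }


def apply(request: Dict[str, Any], region: str) -> None:
    """Send the request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    rds = boto3.client("rds", region_name=region)
    rds.modify_db_parameter_group(**request)
```

Calling apply(performance_schema_request("hive-metastore-params"), "us-east-1") would submit the change; keeping the request as plain data makes it easy to review first, or to port the same setting into the Terraform that manages the RDS resources.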

mroark1m, Feb 14 '20 00:02

Wouldn't the owners of the RDS instance be able to set that up themselves, then? i.e. it doesn't need to be part of Apiary? It feels more like a MySQL/Aurora DB admin task.

massdosage, Feb 14 '20 11:02

We manage the RDS part of Apiary Data Lake. Maybe this is the corresponding option in RDS: https://www.terraform.io/docs/providers/aws/r/rds_cluster_instance.html#performance_insights_enabled. We will check and close the issue accordingly.
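For comparison with the Terraform performance_insights_enabled argument linked above, here is a rough sketch of the same switch expressed as kwargs for boto3's RDS modify_db_instance call. The instance identifier apiary-metastore-1 is hypothetical, and 7 days is assumed as the retention period (it is the default). Note that Performance Insights is a separate AWS feature from MySQL's performance_schema, so this only covers the option mentioned in this comment.

```python
from typing import Any, Dict


def performance_insights_request(
    instance_id: str, retention_days: int = 7
) -> Dict[str, Any]:
    """Build kwargs for boto3's RDS modify_db_instance call.

    instance_id is a hypothetical DB instance identifier; retention_days
    defaults to 7, the default Performance Insights retention period.
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        # Apply now instead of waiting for the next maintenance window.
        "ApplyImmediately": True,
    }


def apply(request: Dict[str, Any], region: str) -> None:
    """Send the request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    rds = boto3.client("rds", region_name=region)
    rds.modify_db_instance(**request)
```

Since Apiary already manages these instances through Terraform, setting the argument there is likely the better long-term home; the sketch just makes explicit which API-level flag that argument corresponds to.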

rpoluri, Feb 14 '20 13:02