hive icon indicating copy to clipboard operation
hive copied to clipboard

HIVE-26144: Add keys/indexes to support highly concurrent workload

Open kovjanos opened this issue 3 years ago • 4 comments

What changes were proposed in this pull request?

Missing keys/index is to be added to the HMS backend db schema

Why are the changes needed?

On a high-concurrency test we found that backend database is doing full table scans in some cases where the table has missing key/index.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Integration tests for all database types:

[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.949 s - in org.apache.hadoop.hive.metastore.dbinstall.ITestPostgres
[INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestMssql
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.711 s - in org.apache.hadoop.hive.metastore.dbinstall.ITestMssql
[INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestMysql
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.032 s - in org.apache.hadoop.hive.metastore.dbinstall.ITestMysql
[INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestDerby
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.101 s - in org.apache.hadoop.hive.metastore.dbinstall.ITestDerby
[INFO] Running org.apache.hadoop.hive.metastore.dbinstall.ITestOracle
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 94.393 s - in org.apache.hadoop.hive.metastore.dbinstall.ITestOracle
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0

kovjanos avatar Apr 15 '22 19:04 kovjanos

Also, I am wondering if you tested the upgrade scripts with tables having data. What happens to existing rows when you alter the table to introduce the new PK column? Are they populated automatically?

mysql part (on MariaDB) was tested in production environment. Let me add the other tests and results here...

kovjanos avatar Apr 16 '22 17:04 kovjanos

@zabetak Tested with Derby, MySQL, PostgreSQL and MSSQL and worked - see below. Oracle doesn't work, I need to review again.

Versions tested:

az82/docker-derby
mysql:5.7
postgres:11.6
mcr.microsoft.com/mssql/server:2019-latest

Derby

CREATE TABLE TEST_TABLE (
  ID  bigint NOT NULL,
  DB  varchar(10) NOT NULL
);
CREATE INDEX TEST_TABLE_IDX ON TEST_TABLE (DB);
INSERT INTO TEST_TABLE VALUES (1,'1-db'),(2,'2-db'),(3,'3-db');
ALTER TABLE TEST_TABLE ADD COLUMN PKEY bigint PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY;
SELECT * FROM TEST_TABLE;
INSERT INTO TEST_TABLE (ID,DB) VALUES (4,'4-db'),(5,'5-db'),(6,'6-db');
SELECT * FROM TEST_TABLE;

PostgreSQL:

CREATE TABLE "TEST_TABLE" (
  "ID"  bigint NOT NULL,
  "DB"  varchar(10) NOT NULL);
CREATE INDEX TEST_TABLE_IDX ON "TEST_TABLE" USING btree ("DB");
INSERT INTO "TEST_TABLE" VALUES (1,'1-db'),(2,'2-db'),(3,'3-db');
ALTER TABLE "TEST_TABLE" ADD "PKEY" bigserial PRIMARY KEY;
SELECT * FROM "TEST_TABLE";
INSERT INTO "TEST_TABLE" VALUES (4,'4-db'),(5,'5-db'),(6,'6-db');
SELECT * FROM "TEST_TABLE";

MySQL:

CREATE TABLE TEST_TABLE (
  ID  bigint NOT NULL,
  DB  varchar(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE INDEX TEST_TABLE_IDX ON TEST_TABLE (DB) USING BTREE;
INSERT INTO TEST_TABLE VALUES (1,'1-db'),(2,'2-db'),(3,'3-db');
ALTER TABLE TEST_TABLE ADD COLUMN PKEY BIGINT NOT NULL PRIMARY KEY AUTO_INCREMENT FIRST;
SELECT * FROM TEST_TABLE;
INSERT INTO TEST_TABLE (ID,DB) VALUES (4,'4-db'),(5,'5-db'),(6,'6-db');
SELECT * FROM TEST_TABLE;

MSSQL:

CREATE TABLE TEST_TABLE (
  ID  bigint NOT NULL,
  DB  varchar(10) NOT NULL
);
CREATE INDEX TEST_TABLE_IDX ON TEST_TABLE (DB);
INSERT INTO TEST_TABLE VALUES (1,'1-db'),(2,'2-db'),(3,'3-db');
ALTER TABLE TEST_TABLE ADD PKEY bigint NOT NULL IDENTITY(1,1) PRIMARY KEY;
SELECT * FROM TEST_TABLE;
INSERT INTO TEST_TABLE VALUES (1,'1-db'),(2,'2-db'),(3,'3-db');
SELECT * FROM TEST_TABLE;

kovjanos avatar Apr 16 '22 21:04 kovjanos

That doesn't work with Oracle, even plain table has uniqueness issues - see below. Do we need to have this also on ORACLE? I can add the column to keep schema consistent, but it can't be a PRIMARY KEY (might not even be required for the DELETEs in Oracle if it works differently). One solution might be the old-school solution: a sequence and trigger behind the increment column. I'll give it a try..

CREATE TABLE TEST_TABLE (
  ID  number(19) NOT NULL,
  DB  varchar(10) NOT NULL,
  PKEY NUMBER(19) GENERATED ALWAYS AS IDENTITY 
) ROWDEPENDENCIES;
INSERT ALL
  INTO TEST_TABLE (ID, DB) VALUES (1,'1-db')
  INTO TEST_TABLE (ID, DB) VALUES (2,'2-db')
  INTO TEST_TABLE (ID, DB) VALUES (3,'3-db')
SELECT 1 FROM DUAL;
SELECT * FROM TEST_TABLE;

  ID DB         PKEY
---- ---------- ----
   1 1-db          1
   2 2-db          1
   3 3-db          1

kovjanos avatar Apr 16 '22 21:04 kovjanos

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Feel free to reach out on the [email protected] list if the patch is in need of reviews.

github-actions[bot] avatar Aug 07 '22 00:08 github-actions[bot]