
[Bug report] In certain environments, inserting data into a Hive table fails in spark-sql

Open · danhuawang opened this issue 1 year ago · 1 comment

Version

main branch

Describe what's wrong

spark-sql (default)> use cc2;
use cc2
Time taken: 0.059 seconds
spark-sql ()> CREATE DATABASE db2;
CREATE DATABASE db2
Time taken: 0.135 seconds
spark-sql ()> use db2;
use db2
Time taken: 0.04 seconds
spark-sql (db2)> CREATE TABLE hive_students (id INT, name STRING);
CREATE TABLE hive_students (id INT, name STRING)
Time taken: 0.252 seconds
spark-sql (db2)> INSERT INTO hive_students VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO hive_students VALUES (1, 'Alice'), (2, 'Bob')
24/04/24 11:59:23 WARN ObjectStore: Failed to get database db2, returning NoSuchObjectException
[SCHEMA_NOT_FOUND] The schema `db2` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.
spark-sql (db2)> show tables;
show tables
hive_students
Time taken: 0.22 seconds, Fetched 1 row(s)

Error message and/or stacktrace

spark-sql (default)> use spark_catalog;
use spark_catalog
Time taken: 0.054 seconds
spark-sql (default)> CREATE DATABASE db;
CREATE DATABASE db
24/04/24 11:56:14 WARN ObjectStore: Failed to get database db, returning NoSuchObjectException
24/04/24 11:56:14 WARN ObjectStore: Failed to get database db, returning NoSuchObjectException
24/04/24 11:56:15 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
24/04/24 11:56:15 WARN ObjectStore: Failed to get database db, returning NoSuchObjectException
24/04/24 11:56:15 ERROR log: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=ubuntu, access=WRITE, inode="/user/hive/warehouse-hive/db.db":root:hdfs:drwxr-xr-x
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
        at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1712)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1695)
        at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3896)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:984)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
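
The stacktrace shows the warehouse directory /user/hive/warehouse-hive owned by root:hdfs while spark-sql runs as the ubuntu user. A hedged workaround sketch for that permission mismatch, assuming you control the HDFS cluster and that ubuntu is the intended owner (adjust owner and group to your deployment):

# inspect ownership of the warehouse directory named in the stacktrace
hdfs dfs -ls /user/hive/warehouse-hive
# either grant the submitting user ownership (assumption: ubuntu should own it) ...
hdfs dfs -chown -R ubuntu:hdfs /user/hive/warehouse-hive
# ... or run spark-sql as the current owner instead
HADOOP_USER_NAME=root ./bin/spark-sql ...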

How to reproduce

  1. Configure the Spark session to use the Gravitino spark connector.
ubuntu@ip-172-31-33-70:/opt/spark$ ./bin/spark-sql -v --conf spark.plugins="com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin" --conf spark.sql.gravitino.uri=http://3.115.106.59:8090 --conf spark.sql.gravitino.metalake=test --conf spark.sql.warehouse.dir=hdfs://18.183.104.49:9000/user/hive/warehouse-hive
  2. Execute the Spark SQL query.
// use hive catalog
USE hive;
CREATE DATABASE db;
USE db;
CREATE TABLE hive_students (id INT, name STRING);
INSERT INTO hive_students VALUES (1, 'Alice'), (2, 'Bob');

Additional context

No response

danhuawang commented Apr 24 '24 12:04

@danhuawang, it's mainly caused by using spark-sql without setting spark.sql.hive.metastore.jars explicitly. I created an issue at https://github.com/apache/kyuubi/issues/6362 and will open a new PR to allow setting spark.sql.hive.metastore.jars to different values in #3270. You could continue testing the Spark connector with spark-shell instead of spark-sql.
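
For reference, a spark-shell launch equivalent to the spark-sql command in the reproduction steps (a sketch; only the entry point changes, all flags are copied from above):

./bin/spark-shell -v \
  --conf spark.plugins="com.datastrato.gravitino.spark.connector.plugin.GravitinoSparkPlugin" \
  --conf spark.sql.gravitino.uri=http://3.115.106.59:8090 \
  --conf spark.sql.gravitino.metalake=test \
  --conf spark.sql.warehouse.dir=hdfs://18.183.104.49:9000/user/hive/warehouse-hive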

FANNG1 commented May 06 '24 04:05

@danhuawang, you could download the Hive jars to your machine and set the corresponding catalog properties like below to use spark-sql:

{
    "name": "hive",
    "type": "RELATIONAL",
    "comment": "comment",
    "provider": "hive",
    "properties": {
        "metastore.uris": "thrift://localhost:9083",
        "spark.sql.hive.metastore.jars":"path",
        "spark.sql.hive.metastore.jars.path":"file:///Users/fanng/deploy/hive/lib/*"
    }
}
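
For illustration, a sketch of registering that catalog definition, assuming the JSON above is saved as hive_catalog.json, the Gravitino server and metalake from the reproduction steps, and the commonly documented catalog-creation REST endpoint (verify the path against your Gravitino version):

# submit the catalog definition to the Gravitino server (host, metalake and
# endpoint path are assumptions based on the reproduction steps)
curl -X POST \
  -H "Content-Type: application/json" \
  -d @hive_catalog.json \
  http://3.115.106.59:8090/api/metalakes/test/catalogs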

FANNG1 commented Jun 01 '24 02:06

seems we could close it, @danhuawang WDYT?

FANNG1 commented Aug 01 '24 08:08

> seems we could close it, @danhuawang WDYT?

@FANNG1 Sure. We can close it.

danhuawang commented Aug 01 '24 12:08