iceberg
iceberg copied to clipboard
Respect spark.catalog.currentDatabase instead of hardcoded "default"
In the event that a database isn't defined the behavior falls back to using a hardcoded "default" database. Instead we should respect spark.catalog.currentDatabase.
I have a simple pull request prepared if folks are interested.
I'm open to changing this, but I don't think that spark.catalog.currentDatabase is necessarily the right choice. That is the current database for the global catalog, but there's no guarantee that Iceberg tables use that catalog. Should Iceberg have a way to set the default database instead? Or maybe this should be handled at a higher level and database should always be required.
I don't have strong opinions here. To add a little more information here I ran into an issue as part of the Convert Table to Iceberg example. As part of SparkSchemaUtil and the Hive hack.
Both use "default" if no database is passed in. When passing in the database I was getting errors around unsupported characters in the database (even with back-ticks), likely because we're using AWSGlueDataCatalog as the store we're trying to convert. I'm not sure how EMR typically solves this, but by having those objects use spark.catalog.currentDatabase instead of default it "just worked."
The net net is that this may not be important generally as the problem is probably both related to Glue metastores and EMR.
So it sounds like the problem was that those classes parse the table name instead of passing db and table separately? We'd definitely accept a PR that separated those options out and moved the single string parsing to a separate method. Supporting backticks would be great, too, but we generally expect to rely on Spark or a higher layer to provide that.