Allow CREATE TABLE without schema
Right now, when trying to create a Qbeast table without a schema, the following exception is raised:
spark.sql("CREATE TABLE t USING qbeast LOCATION '/tmp/test'")
org.apache.spark.sql.AnalysisException: Trying to create an External Table without any schema. Please specify the schema in the command or use a path of a populated table.
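As the error message suggests, the command does succeed once a schema is given explicitly. A minimal workaround today (hypothetical table and column names; `columnsToIndex` is the Qbeast option that selects the indexed columns):

```scala
spark.sql("""
  CREATE TABLE t (id INT, name STRING)
  USING qbeast
  OPTIONS ('columnsToIndex' = 'id')
  LOCATION '/tmp/test'
""")
```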
Executing the equivalent schema-less command with Delta works, creating a Delta table whose transaction log contains the following entries:
spark.sql("CREATE TABLE t USING delta LOCATION '/tmp/test'")
```json
{
  "commitInfo": {
    "timestamp": 1709215123871,
    "operation": "CREATE TABLE",
    "operationParameters": {
      "isManaged": "false",
      "description": null,
      "partitionBy": "[]",
      "properties": "{}"
    },
    "isolationLevel": "Serializable",
    "isBlindAppend": true,
    "operationMetrics": {},
    "engineInfo": "Apache-Spark/3.5.0 Delta-Lake/3.1.0",
    "txnId": "6cf7a54c-cfc4-4d1a-831e-b61729e123ed"
  }
}
{
  "metaData": {
    "id": "e9b1abc8-7ab7-4796-8761-7fb588e2eb6c",
    "format": {
      "provider": "parquet",
      "options": {}
    },
    "schemaString": "{\"type\":\"struct\",\"fields\":[]}",
    "partitionColumns": [],
    "configuration": {},
    "createdTime": 1709215123802
  }
}
{
  "protocol": {
    "minReaderVersion": 1,
    "minWriterVersion": 2
  }
}
```
We should allow the Qbeast datasource to create a new table for a location even if the schema is not specified, registering the table with an empty schema as Delta does. Once the first write is made, that write's schema should become the table schema, enforced through the underlying Delta metadata.
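Below is a minimal sketch of the intended behaviour, assuming a helper that decides the schema at CREATE TABLE time and at write time (all names here are hypothetical, not the actual qbeast-spark API):

```scala
import org.apache.spark.sql.types.StructType

// Hypothetical helper illustrating the proposal.
object SchemaResolution {

  // At CREATE TABLE time: reuse the schema of an already-populated table at
  // the location if there is one; otherwise accept the user schema, which may
  // be empty (Delta serializes it as {"type":"struct","fields":[]}).
  def schemaForCreate(userSchema: StructType, existing: Option[StructType]): StructType =
    existing.filter(_.nonEmpty).getOrElse(userSchema)

  // At write time: the first write into an empty-schema table defines the
  // schema; later writes must match the schema recorded in the Delta log.
  def schemaForWrite(tableSchema: StructType, writeSchema: StructType): StructType =
    if (tableSchema.isEmpty) writeSchema
    else if (tableSchema == writeSchema) tableSchema
    else throw new IllegalArgumentException(
      s"Write schema $writeSchema does not match table schema $tableSchema")
}
```

With something like this in place, the schema-less `CREATE TABLE` above would commit an empty metaData action like Delta's, and the first write against the location would both write the data and fix the table schema.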