dev: add `make notebook`
# Rationale for this change
Add `make notebook` to spin up a Jupyter notebook.

With Spark Connect (#2491) and our testing setup, we can quickly spin up a local environment with:
- Spark
- Iceberg REST catalog
- Hive Metastore
- MinIO
Run:

```shell
make test-integration-exec
make notebook
```
In the Jupyter notebook, connect to Spark easily:
```python
from pyspark.sql import SparkSession

# Create a SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
```
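Since the integration setup also brings up the REST catalog and MinIO, the same notebook can talk to the catalog directly with PyIceberg. A minimal sketch, assuming the docker-compose defaults for ports and credentials (adjust if your local setup differs):

```python
from pyiceberg.catalog import load_catalog

# The connection properties below are assumptions based on the docker-compose
# integration defaults (REST catalog on :8181, MinIO on :9000 with admin/password);
# adjust them to match your environment.
catalog = load_catalog(
    "rest",
    **{
        "uri": "http://localhost:8181",
        "s3.endpoint": "http://localhost:9000",
        "s3.access-key-id": "admin",
        "s3.secret-access-key": "password",
    },
)

# Quick sanity check that the catalog is reachable.
print(catalog.list_namespaces())
```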
# Are these changes tested?

# Are there any user-facing changes?
> With Spark Connect (https://github.com/apache/iceberg-python/pull/2491) and our testing setup, we can quickly spin up a local env with
I agree, and that's great, but should we also spin up the resources as part of this effort? We could even inject a notebook that imports Spark Connect, etc. (which won't be installed from a fresh install? I think this is a dev dependency; we probably want to double-check there to avoid scaring newcomers to the project).
Bonus idea: what if `make notebook` or some other CLI entry point spun up PySpark + a catalog configured via `pyiceberg.yaml`, so users could immediately start querying their data?
> We could even inject a notebook that imports Spark Connect
We could do getting started as a notebook! https://py.iceberg.apache.org/#getting-started-with-pyiceberg
> Bonus idea: what if `make notebook` or some other CLI entry point spun up PySpark + a catalog configured via `pyiceberg.yaml`, so users could immediately start querying their data?
Yeah, we could do that. The integration test setup gives us two different catalogs (REST and HMS).
@kevinjqliu I would keep it simple and go with the preferred catalog: REST :)
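A rough sketch of what a seeded getting-started cell could look like if we go with the REST catalog: this assumes a `default` catalog entry is defined in `pyiceberg.yaml` (pointing at the local REST catalog), and the table identifier below is only a placeholder.

```python
from pyiceberg.catalog import load_catalog

# Picks up the `default` catalog from pyiceberg.yaml (or ~/.pyiceberg.yaml);
# the catalog name and the table identifier below are placeholders.
catalog = load_catalog("default")
print(catalog.list_namespaces())

# Load a table and pull it into an Arrow table for immediate exploration.
table = catalog.load_table("default.taxi_dataset")
print(table.scan().to_arrow())
```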