iceberg-python

dev: add `make notebook`

Open kevinjqliu opened this issue 3 months ago • 5 comments

Rationale for this change

Add `make notebook` to spin up a Jupyter notebook.

With Spark Connect (#2491) and our testing setup, we can quickly spin up a local environment with:

  • spark
  • iceberg rest catalog
  • hive metastore
  • minio

make test-integration-exec
make notebook

In the Jupyter notebook, connecting to Spark is then straightforward:

from pyspark.sql import SparkSession

# Create SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
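If the local stack is not running yet, the snippet above fails with a fairly opaque connection error. A minimal sketch of a first notebook cell that checks the port before building the session is shown here. The helper names `spark_connect_reachable` and `get_spark` are hypothetical and not part of the repository, and the host and port simply mirror the `sc://localhost:15002` URL above.

```python
import socket


def spark_connect_reachable(
    host: str = "localhost", port: int = 15002, timeout: float = 2.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


def get_spark(host: str = "localhost", port: int = 15002):
    """Build a SparkSession against Spark Connect, failing early with a hint."""
    if not spark_connect_reachable(host, port):
        raise RuntimeError(
            f"Nothing is listening on {host}:{port}; "
            "start the local stack first (e.g. `make test-integration-exec`)."
        )
    # Imported lazily so the reachability check works without pyspark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(f"sc://{host}:{port}").getOrCreate()
```

In a notebook this would be used as `spark = get_spark()` followed by `spark.sql("SHOW CATALOGS").show()`.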

Are these changes tested?

Are there any user-facing changes?

kevinjqliu, Sep 26 '25 02:09

> With spark connect (https://github.com/apache/iceberg-python/pull/2491) and our testing setup, we can quickly spin up a local env with

I agree, and that's great, but should we also spin up the resources as part of this effort? We could even inject a notebook that imports Spark-connect, etc. (which might not be installed on a fresh install; I think it is a dev dependency, so we probably want to double-check that to avoid scaring newcomers to the project).

Fokko, Sep 26 '25 09:09

Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?

jayceslesar, Sep 26 '25 15:09
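To make the bonus idea concrete, here is a rough sketch of how one catalog entry from `pyiceberg.yaml` could be turned into Spark catalog settings. The function name `spark_catalog_conf` is hypothetical, YAML parsing is left out (the input is the already-parsed properties of one catalog), and passing keys through unchanged is an assumption: common keys such as `uri` and `warehouse` are shared between PyIceberg and Iceberg's Spark integration, but not every PyIceberg property has a Spark equivalent.

```python
def spark_catalog_conf(name: str, props: dict[str, str]) -> dict[str, str]:
    """Map PyIceberg-style catalog properties to `spark.sql.catalog.*` settings.

    `props` is assumed to be one catalog's entry from pyiceberg.yaml,
    already parsed into a flat dict of strings.
    """
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        # Iceberg's Spark catalog implementation class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Default to REST when the entry does not state a type.
        f"{prefix}.type": props.get("type", "rest"),
    }
    for key, value in props.items():
        if key != "type":
            # Assumption: remaining keys are valid as-is on the Spark side.
            conf[f"{prefix}.{key}"] = value
    return conf


if __name__ == "__main__":
    example = {"type": "rest", "uri": "http://localhost:8181"}
    for key, value in spark_catalog_conf("rest", example).items():
        print(f"{key}={value}")
```

Whether these settings can be applied from the notebook client or have to be set on the Spark Connect server at startup would need to be checked against the actual setup.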

> We could even inject a notebook that imports Spark-connect

We could do the Getting Started guide as a notebook! https://py.iceberg.apache.org/#getting-started-with-pyiceberg

kevinjqliu, Sep 26 '25 15:09

> Bonus idea: what if make notebook or some other CLI entry point spun up pyspark + catalog configured via pyiceberg.yaml so users could immediately start querying their data?

Yeah, we could do that. The integration test setup gives us two different catalogs (REST and HMS).

kevinjqliu, Sep 26 '25 15:09

@kevinjqliu I would keep it simple, and go with the preferred catalog; REST :)

Fokko, Sep 30 '25 19:09
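For reference, a minimal sketch of what a REST-only notebook cell might look like on the PyIceberg side. `load_catalog` is PyIceberg's catalog entry point; the helper names are hypothetical, and the default URI, MinIO endpoint, and credentials are assumptions about the local test stack that should be checked against the actual docker-compose configuration.

```python
def rest_catalog_properties(
    uri: str = "http://localhost:8181",          # assumed REST catalog address
    s3_endpoint: str = "http://localhost:9000",  # assumed MinIO address
    access_key: str = "admin",                   # assumed local test credentials
    secret_key: str = "password",
) -> dict[str, str]:
    """Build PyIceberg properties for a local REST catalog backed by MinIO."""
    return {
        "type": "rest",
        "uri": uri,
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def load_rest_catalog(name: str = "rest", **overrides: str):
    """Load the REST catalog; keyword overrides replace the assumed defaults."""
    # Imported lazily so the properties helper works without pyiceberg installed.
    from pyiceberg.catalog import load_catalog

    return load_catalog(name, **rest_catalog_properties(**overrides))
```

With the stack running, `load_rest_catalog().list_namespaces()` would be a quick way to confirm the catalog is reachable from the notebook.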