Docker image for contributors

Open Triamus opened this issue 2 years ago • 11 comments

A ready-to-use Docker image for project contributors, as an alternative to a (local) Python environment via Poetry. The image would probably need to include things like:

  • base image (e.g. ubuntu)
  • (py)spark
  • delta
  • python libs
  • environment vars
  • etc.
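As a rough illustration of the list above, a small check like the following could confirm that an image actually provides those pieces. This is only a sketch: the function name `check_dev_environment`, the variable names `JAVA_HOME` and `SPARK_HOME`, and the module names `pyspark` and `delta` are assumptions about what such an image would expose, not something defined by mack.

```python
import importlib.util
import os
from typing import List, Mapping, Sequence


def check_dev_environment(
    env: Mapping[str, str],
    required_vars: Sequence[str] = ("JAVA_HOME", "SPARK_HOME"),  # assumed names
    required_modules: Sequence[str] = ("pyspark", "delta"),  # assumed names
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Environment variables the image is expected to set.
    for name in required_vars:
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    # Python libraries the image is expected to have installed.
    for module in required_modules:
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python module {module} is not importable")
    return problems


if __name__ == "__main__":
    # Inside the container, print anything that is missing.
    for problem in check_dev_environment(os.environ):
        print(problem)
```

Passing the environment in as an argument, instead of reading `os.environ` inside the function, keeps the check easy to exercise outside a container.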

I tried to find examples in other OSS repos but didn't find any in the short time I spent researching. So maybe this isn't useful to most contributors, or there are other reasons not to have it. None come to mind at the moment.

Triamus avatar Jan 23 '23 14:01 Triamus

@Triamus - thanks for adding this.

Anyone in the community can feel free to grab this.

MrPowers avatar Jan 23 '23 15:01 MrPowers

@MrPowers I have built this kind of Docker image in the past. Mind if I take a stab at this?

souvik-databricks avatar Jan 24 '23 18:01 souvik-databricks

@souvik-databricks you probably know this, but there are a few things I already researched.

I would think that ideally any image would build on top of those efforts, but I don't know the timeline. In Jira they speak of Spark 3.4.

Triamus avatar Jan 24 '23 19:01 Triamus

@souvik-databricks I'd be happy to test things out if needed.

Triamus avatar Jan 24 '23 20:01 Triamus

@souvik-databricks - yea, sure, go for it!

I think there are some Delta Lake docker images around. Let me take a look.

MrPowers avatar Jan 24 '23 20:01 MrPowers

Actually, looks like @Triamus has already provided the link, here it is: GitHub/delta-io/delta-docs: quickstart_docker

MrPowers avatar Jan 24 '23 20:01 MrPowers

@MrPowers I have a local branch ready to go with Docker and docker-compose support for mack if you want it. It runs the unit tests inside the container and also has instructions for dropping into the container for development. The container has Spark (spark-3.3.2), Delta (delta-core_2.12:2.2.0), etc., so everything needed to develop and test.
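For reference, a minimal sketch of the Spark session settings that tests in such a container would typically need, assuming the delta-core_2.12:2.2.0 coordinates mentioned above and the configuration keys from the Delta Lake quickstart. The helper names `delta_spark_conf` and `build_session` and the app name `mack-dev` are hypothetical; this is not the configuration from the branch itself.

```python
from typing import Dict


def delta_spark_conf(delta_version: str = "2.2.0", scala_version: str = "2.12") -> Dict[str, str]:
    """Spark settings that enable Delta Lake for a local session."""
    return {
        # Maven coordinates of the Delta package that Spark downloads at startup.
        "spark.jars.packages": f"io.delta:delta-core_{scala_version}:{delta_version}",
        # Register Delta's SQL extension and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name: str = "mack-dev"):
    """Create a local SparkSession with Delta enabled (requires pyspark and a JVM)."""
    # Imported lazily so delta_spark_conf() stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[*]").appName(app_name)
    for key, value in delta_spark_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Inside the container, a test fixture could call `build_session()` once and reuse the session across the unit tests.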

danielbeach avatar Mar 02 '23 01:03 danielbeach

@danielbeach - yea, that sounds great. Any chance you could send a PR? I'll be happy to test, document in the README, and market. Thank you!

MrPowers avatar Mar 02 '23 08:03 MrPowers

@MrPowers I tried to push a PR, but need access.

danielbeach avatar Mar 02 '23 12:03 danielbeach

@danielbeach - sent you an invite to collab on the repo ;)

MrPowers avatar Mar 02 '23 14:03 MrPowers

In the opening of the issue, I mentioned that I didn't find nice OSS examples of creating a reproducible local dev setup for contributors of a project. By coincidence, I saw a talk from PyData Global 2022, recently uploaded to YouTube, on exactly that topic from one of the core Airflow devs. It turns out that Airflow has invested a lot in what they call the Breeze environment, which covers everything from local dev and test to deployment. It is certainly overengineering for mack at this point, but it has some nice insights and ideas to draw inspiration from. I leave the talk and the Airflow Breeze docs here for future reference.

From the docs:

Airflow Breeze is an easy-to-use development and test environment using Docker Compose. The environment is available for local use and is also used in Airflow's CI tests. We call it Airflow Breeze as It's a Breeze to contribute to Airflow. The advantages and disadvantages of using the Breeze environment vs. other ways of testing Airflow are described in CONTRIBUTING.rst.

Triamus avatar Mar 03 '23 12:03 Triamus