numaflow-python
Optimize example Dockerfiles
Summary
The example images are large and slow to build. We should optimize the Dockerfiles to reduce both image size and build time. Proposed solution: build the dependencies in a virtualenv in the builder stage, then copy that virtualenv into the runner stage.
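A minimal sketch of what that could look like, not the actual example Dockerfile. The base image tag, the `requirements.txt` manifest, and the `example.py` entry point are all assumptions for illustration; the real examples may use a different dependency manager and layout.

```dockerfile
# --- Builder stage: install dependencies into a virtualenv ---
# Base image tag is an assumption; pick whatever the examples target.
FROM python:3.11-slim AS builder

# Create the virtualenv at a fixed path so it can be copied as one directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
# Copy only the dependency manifest first, so this layer stays cached
# until the manifest changes. (requirements.txt is a hypothetical name.)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# --- Runner stage: only the virtualenv and the example code ---
# Same base image as the builder, so the interpreter path inside
# the virtualenv still resolves.
FROM python:3.11-slim AS runner

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
# example.py is a hypothetical entry point for one of the examples.
COPY example.py .
CMD ["python", "example.py"]
```

Build tools and pip caches stay in the builder stage, so the final image carries only the interpreter, the installed packages, and the example source. Copying the manifest before the source also means code-only changes do not trigger a dependency reinstall.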
Use Cases
When building example images
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
@ab93 any other suggestions ?
Yes we should do something along the lines of: https://github.com/ab93/dockerdemo/blob/main/Dockerfile
lol, that is too much for examples :)
I think you got overwhelmed there 🙂 It's not hard, and the Dockerfile will be very short for our use case. The Numalogic Dockerfile is also a good example.
I did optimize the Dockerfile a bit for the recent Intuit builds! Will do some further research and go through the examples provided by @ab93 too.
@ayildirim21 if you're interested in working on this as well let me know!
Hi @vigith @kohlisid, I came across this enhancement and would love to contribute to optimizing the example Dockerfiles.
If it's not actively being worked on, would it be okay for me to take this on?
Absolutely, let me assign it to you.
Thanks @sapkota-aayush for taking this up :D I have a few ideas on this one as well that I can discuss with you, and you can take it forward.
@kohlisid @vigith Thanks for assigning this to me! I'm excited to work on optimizing the Docker images for the examples. I can see from the test files (like test/map-e2e/testdata/flatmap.yaml) that there are examples from numaflow-python, numaflow-go, numaflow-java, and numaflow-rs being used in the e2e tests. Which repository should I start with? I'm thinking numaflow-python might be a good starting point since Python images tend to be larger due to dependencies, but wanted to check if there's a specific one that's causing more issues or if you have a preference for the order. I'm planning to examine the current Dockerfiles in the chosen repo to understand what needs optimization, then implement multi-stage builds to reduce image size and build time.
Start with numaflow-python; that is the one with slow image build times.