
Optimize example Dockerfiles

Open ayildirim21 opened this issue 1 year ago • 5 comments

Summary

The example images are large and take a while to build. We should optimize the example Dockerfiles to reduce both image size and build time. Considered solution: build the dependencies in a virtualenv in the builder stage, and copy that virtualenv over to the runner stage.
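A minimal sketch of the considered solution might look like the following. It is illustrative only: the base image tag, the `/opt/venv` path, and the file names `requirements.txt` and `example.py` are assumptions, not taken from the existing example Dockerfiles.

```dockerfile
# Sketch only: image tag, paths, and file names below are assumptions.

# ---- builder stage: install dependencies into a virtualenv ----
FROM python:3.11-slim AS builder

# Create an isolated virtualenv that can be copied as a single directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Copy only the dependency list first, so this layer stays cached
# until requirements.txt (hypothetical name) changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ---- runner stage: ship only the virtualenv and the example code ----
FROM python:3.11-slim AS runner

# Bring over the built virtualenv, leaving pip caches and build tools behind.
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
# example.py is a placeholder for the actual example entry point.
COPY example.py .

CMD ["python", "example.py"]
```

Using the same base image in both stages keeps the interpreter path inside the virtualenv valid after the copy. Putting the dependency install ahead of the source copy means code-only changes should not trigger a reinstall.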

Use Cases

When building example images


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

ayildirim21 avatar Jul 11 '24 22:07 ayildirim21

@ab93 any other suggestions ?

vigith avatar Jul 11 '24 22:07 vigith

> @ab93 any other suggestions ?

Yes we should do something along the lines of: https://github.com/ab93/dockerdemo/blob/main/Dockerfile

ab93 avatar Jul 11 '24 22:07 ab93

> Yes we should do something along the lines of: https://github.com/ab93/dockerdemo/blob/main/Dockerfile

lol, that is too much for examples :)

vigith avatar Jul 12 '24 02:07 vigith

I think you got overwhelmed there 🙂 It's not hard, and the Dockerfile will be very short for our use case. The Numalogic Dockerfile is also a good example.

ab93 avatar Jul 12 '24 03:07 ab93

I did optimize the Dockerfile a bit for the recent Intuit builds! I will do some further research and go through the examples provided by @ab93 too.

@ayildirim21 if you're interested in working on this as well let me know!

kohlisid avatar Jul 12 '24 04:07 kohlisid

Hi @vigith @kohlisid, I came across this enhancement and would love to contribute to optimizing the example Dockerfiles.

If it's not actively being worked on, would it be okay for me to take this on?

sapkota-aayush avatar Jun 28 '25 16:06 sapkota-aayush

Absolutely, let me assign it to you.

vigith avatar Jun 28 '25 16:06 vigith

Thanks @sapkota-aayush for taking this up :D I have a few ideas on this one as well that I can discuss with you, and you can take it forward.

kohlisid avatar Jun 28 '25 20:06 kohlisid

@kohlisid @vigith Thanks for assigning this to me! I'm excited to work on optimizing the Docker images for the examples.

I can see from the test files (like test/map-e2e/testdata/flatmap.yaml) that there are examples from numaflow-python, numaflow-go, numaflow-java, and numaflow-rs being used in the e2e tests. Which repository should I start with? I'm thinking numaflow-python might be a good starting point since Python images tend to be larger due to dependencies, but wanted to check if there's a specific one that's causing more issues or if you have a preference for the order.

I'm planning to examine the current Dockerfiles in the chosen repo to understand what needs optimization, then implement multi-stage builds to reduce image size and build time.

sapkota-aayush avatar Jun 29 '25 00:06 sapkota-aayush

Start with numaflow-python; that is the one with slow image build times.

vigith avatar Jun 29 '25 00:06 vigith