Project roadmap post-acquisition

Open eric-czech opened this issue 5 months ago • 7 comments

I love what you all set out to do with this library -- thank you for sharing it! I'm curious what's on the roadmap for it following the Salesforce acquisition. Is it likely to remain an important dependency for Burr?

I see development for both libraries has fallen off recently, and I was hoping to get some insight on what the future looks like for them.

eric-czech avatar Jul 24 '25 13:07 eric-czech

@eric-czech thanks for the issue.

Development has fallen off because we need to complete the Apache process for releasing -- which has taken longer than intended. We're actively doing things where we can, but would love more contributors to help out here.

We see both of them being run independently and living a good life. I need to bring up more of the PMC team so they have the confidence to merge things without @elijahbenizzy and me being blockers on reviewing and merging, and then expand to having more contributors, etc.

skrawcz avatar Jul 24 '25 16:07 skrawcz

Thanks @skrawcz.

Do you know of any other users that are focused on AI training and data pre/post processing use cases for Hamilton? We might be useful contributors if we can make a case that it is a good choice for those kinds of applications. If you know of other users like that, it would be helpful to speak with them or at least follow along on what they've done (if any of it is public).

I appreciate any details, and congratulations on the acquisition!

eric-czech avatar Jul 29 '25 17:07 eric-czech

@eric-czech will reach out and ask around. Quite a few people are using it for this purpose -- can you specify a little more on what the use-case is?

elijahbenizzy avatar Aug 07 '25 04:08 elijahbenizzy

can you specify a little more on what the use-case is?

Certainly. It would involve small to medium scale workflows on large datasets for:

  • Running data integration and processing workflows via Pandas, Spark, Dask, Xarray, Ray, etc.
  • Running training jobs on SLURM clusters and/or Neocloud providers
  • Running inference and post-processing pipelines (with or without accelerators)

Hamilton seems like an obvious fit for handling the large number of small steps related to pre-processing, e.g. Longer term, I'm also interested in seeing to what extent it may be helpful for orchestrating work across mixed hardware and multiple cloud providers. To be clear, I don't expect it to do much other than solve for building DAGs and offering some reasonable semantics over retries and caching of expensive results. Provisioning, configuration, pickling functions, validating schemas, etc. are all things I would expect other tools to do -- I'm only really looking to Hamilton to define workflows via Python rather than a DSL or API complicated enough to be called a DSL.

eric-czech avatar Aug 11 '25 16:08 eric-czech
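
For readers less familiar with the style of workflow definition described in the comment above, here is a minimal plain-Python sketch of the idea: each function is a node, its parameter names are its dependencies, and results are cached with a simple retry. This is only an illustration and not Hamilton's API. The helper `resolve`, the `retries` parameter, and the node names (`raw_rows`, `sorted_rows`, `row_total`) are all hypothetical.

```python
import inspect
from typing import Any, Callable, Dict


def raw_rows() -> list:
    """Hypothetical source node: stands in for loading a dataset."""
    return [3, 1, 2]


def sorted_rows(raw_rows: list) -> list:
    """Depends on `raw_rows` purely through its parameter name."""
    return sorted(raw_rows)


def row_total(sorted_rows: list) -> int:
    """Depends on `sorted_rows`."""
    return sum(sorted_rows)


def resolve(name: str, nodes: Dict[str, Callable[..., Any]],
            cache: Dict[str, Any], retries: int = 2) -> Any:
    """Compute node `name`, resolving upstream nodes first (hypothetical helper)."""
    if name in cache:  # reuse an already computed (possibly expensive) result
        return cache[name]
    fn = nodes[name]
    # Each parameter name is treated as the name of an upstream node.
    kwargs = {p: resolve(p, nodes, cache, retries)
              for p in inspect.signature(fn).parameters}
    for attempt in range(retries + 1):
        try:
            cache[name] = fn(**kwargs)
            return cache[name]
        except Exception:
            if attempt == retries:  # out of attempts: surface the error
                raise


nodes = {f.__name__: f for f in (raw_rows, sorted_rows, row_total)}
cache: Dict[str, Any] = {}
result = resolve("row_total", nodes, cache)
```

Requesting `row_total` pulls in `sorted_rows` and `raw_rows` automatically, and `cache` ends up holding every intermediate result. Provisioning, serialization, and schema validation are deliberately left out, matching the narrow scope described in the comment.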

Nice. Yeah, start simple. Makes sense.

Note: we have had users look at Hamilton to orchestrate Python code + SLURM jobs; see https://github.com/apache/hamilton/discussions/586. I think there might be some code floating about, if that's useful to look at.

skrawcz avatar Aug 12 '25 04:08 skrawcz

@eric-czech -- checking in -- how's it been so far? Curious if you've been able to start small...

elijahbenizzy avatar Aug 25 '25 03:08 elijahbenizzy

Hi @elijahbenizzy, I haven't done much yet. I had two use cases in mind: 1) managing large numbers of small steps (e.g. data prep), 2) orchestrating smaller numbers of long-running steps across mixed hardware. We've found a different solution for use case 2, but I still think Hamilton could be a good fit for use case 1. I haven't gotten to trying it in earnest yet though.

eric-czech avatar Oct 02 '25 20:10 eric-czech