[EPIC] Ballista 2025/H2 Roadmap Proposal
Following the completion of #1068, it's time to propose the next steps for Ballista.
In the short term, I would like to focus on the following areas:
- Improving test coverage: We continue to encounter bugs that could be prevented with more comprehensive unit and integration tests.
- Code cleanup and refactoring: There are several areas in the codebase that can be simplified or refactored, which can complement the testing improvements.
- Improving shuffle: There are numerous open issues related to shuffle files, and resolving them could yield significant benefits.
- Job-related enhancements: This includes improvements to job dependency graphs, adaptive query execution, and related functionality.
- Enhanced observability: Increasing the scope of scheduler-emitted events, including executor-related events, will help improve visibility and debugging.
- Simplifying and improving GitHub Actions: Streamlining our CI/CD processes to be more efficient and maintainable.
More details will follow after further discussion with the community.
Once again, thank you all for the incredible support on #1068!
There has been a lot of progress with shuffle performance in Comet that Ballista could benefit from.
I would take shuffle-related tasks with the highest priority. @andygrove, I was thinking of #320 and a few others related to compression, schema serialization, and so on, but if there are easy picks in Comet, I'm more than happy to start from there. It would be great if you could provide a few pointers for me to start.
There is work in progress to add a datafusion-spark crate in the core DataFusion repo. See https://github.com/apache/datafusion/issues/5600 and https://github.com/apache/datafusion/pull/15168.
I would be happy to move some parts of Comet shuffle into this crate once it is available.
Edit: using a Spark-compatible shuffle file format may not necessarily be attractive for Ballista. We'll have to see whether that makes sense.
I would be happy to help. I will have a look at Comet.
Closing this task, as we're in the middle of H2/25.