Optimization tracking issue
This is an umbrella issue for tracking the work on optimizations in Cubed.
Creation optimizations
Making creation operations more efficient, typically by not materializing unnecessary data.
- [x] #336
- [x] #343
- [x] #359
Fusion optimizations
Currently we fuse map blocks operations with one input, but there are more types of fusion we could implement.
- [x] #337
- This should be done early on, so changes to fuse or other DAG manipulation tasks don't need to be done twice
- [ ] #288
- [x] #342
- For example, allow users to switch on more aggressive optimizations
- [x] #376
- [x] #136
- This change will fuse operations with multiple inputs
- [x] #366
- [x] #69
- [ ] Sibling fusion
- This change will fuse operations that share the same inputs
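
To make the single-input and multiple-input cases concrete, here is a minimal sketch in plain Python. It does not use Cubed's API: `apply_per_block`, `fuse_single`, and `fuse_multi` are hypothetical helpers made up for illustration, and blocks are modelled as plain lists. The idea is that fusing composes the per-block functions so each block is processed in one pass and the intermediate array is never materialized.

```python
from typing import Callable, List

Block = List[float]


def apply_per_block(func: Callable[..., Block], *arrays: List[Block]) -> List[Block]:
    """Hypothetical stand-in for a blockwise operation (not Cubed's API).

    Applies func to corresponding blocks of each input and materializes
    a new list of output blocks.
    """
    return [func(*blocks) for blocks in zip(*arrays)]


def fuse_single(f: Callable[[Block], Block], g: Callable[[Block], Block]):
    """Fuse two single-input operations: g(f(block)) in one pass."""
    def fused(block: Block) -> Block:
        return g(f(block))
    return fused


def fuse_multi(h: Callable[[Block, Block], Block],
               f: Callable[[Block], Block],
               g: Callable[[Block], Block]):
    """Fuse a two-input operation h with the operations producing its inputs."""
    def fused(a: Block, b: Block) -> Block:
        return h(f(a), g(b))
    return fused


def add_one(block: Block) -> Block:
    return [x + 1 for x in block]


def double(block: Block) -> Block:
    return [x * 2 for x in block]


def add(a: Block, b: Block) -> Block:
    return [x + y for x, y in zip(a, b)]


x = [[1, 2], [3, 4]]
y = [[10, 20], [30, 40]]

# Unfused: two passes, with an intermediate array in between.
unfused_single = apply_per_block(double, apply_per_block(add_one, x))
# Fused: one pass, no intermediate array.
fused_single = apply_per_block(fuse_single(add_one, double), x)

# Multiple inputs: add(add_one(x), double(y)) as a single blockwise step.
unfused_multi = apply_per_block(add, apply_per_block(add_one, x), apply_per_block(double, y))
fused_multi = apply_per_block(fuse_multi(add, add_one, double), x, y)
```

In both cases the fused and unfused results are identical; only the number of passes and intermediate arrays differs.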
Reduction optimizations
Reduction operations like sum and mean could be improved by minimising the amount of data transferred.
- [x] #350
- [x] #365
- [x] #331
- [x] #284
- [ ] #418
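
As a rough illustration of why data transfer matters for reductions, the sketch below computes a mean by reducing each block to a small partial result before combining. This is a generic two-stage reduction in plain Python, not a description of what the issues above implement; `partial_sum_count`, `combine`, and `blockwise_mean` are hypothetical names, and blocks are modelled as plain lists.

```python
from typing import List, Tuple

Block = List[float]


def partial_sum_count(block: Block) -> Tuple[float, int]:
    """Reduce one block to a (sum, count) pair; only this pair is transferred."""
    return (sum(block), len(block))


def combine(partials: List[Tuple[float, int]]) -> Tuple[float, int]:
    """Combine per-block partial results into a single (sum, count) pair."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return (total, count)


def blockwise_mean(blocks: List[Block]) -> float:
    """Mean over all blocks, moving one small pair per block rather than the data."""
    total, count = combine([partial_sum_count(b) for b in blocks])
    return total / count


blocks = [[1.0, 2.0, 3.0], [4.0, 5.0]]
partials = [partial_sum_count(b) for b in blocks]
result = blockwise_mean(blocks)
```

Carrying `(sum, count)` rather than per-block means keeps the result correct when blocks have different sizes.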
High-level query optimizations
Re-writing array expressions to an optimized form (before applying the optimizations above).
- [ ] #333
Benchmarking and runtime
Testing the effect of the optimizations above.
- [x] #356
- [ ] #357
Documentation
- [x] #381
Do any of the caveats on the Scaling docs page need updating after all these improvements? For example, it currently says:
> In theory multiple blockwise operations can be fused together, enhancing the performance further. However this has not yet been implemented in Cubed.
Yes, thanks for raising this. I've opened #381 to track documentation changes.