Superoptimization
This issue tracks progress on the superoptimization component. This component addresses the challenge of generating the script that is likely to exhibit the highest possible performance for the given script, input data, and broader environment. The script that meets these requirements may well be the sequential script, in which case `pash` must avoid delaying its execution at all costs. Keeping the fastest path fast will be challenging, because this component will incorporate some exploration of a configuration search space.
More discussion in the docs/superopt document.
I added a script that becomes slower when using PaSh.
I hypothesize that the slowdown is due to the added auto-split, combined with the fact that this script does very little CPU work. I think there are two ways to address this:
- Identify that this script is cheap (`grep` without backtracking, `cut`, `wc -l`) and skip adding an auto-split after the `sed`.
- Even better, we might be able to infer that `sed 1d` does not change the size of the input significantly, and therefore we can split using a custom batch size (instead of auto-split, which has to write its input to storage first).
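
To make the two options above concrete, here is a minimal Python sketch of how such a decision could be expressed. It is not PaSh's implementation: the names `CHEAP_COMMANDS`, `SIZE_PRESERVING`, `parse_pipeline`, `is_cheap`, and `choose_split` are all hypothetical, the command sets are illustrative guesses, and the example pipeline is made up rather than being the script mentioned in this issue. It also assumes that the size-preserving check takes priority over the cheapness check, following the "even better" ordering above.

```python
import shlex

# Hypothetical: commands assumed to do little CPU work per input byte.
# (A real check would also inspect flags, e.g. whether a grep pattern backtracks.)
CHEAP_COMMANDS = {"grep", "cut", "wc", "sed"}

# Hypothetical: stages assumed to leave the input size roughly unchanged.
SIZE_PRESERVING = {("sed", "1d")}


def parse_pipeline(script):
    """Split a simple 'a | b | c' pipeline into argv lists.

    Naive on purpose: a '|' inside quotes is not handled.
    """
    return [shlex.split(stage) for stage in script.split("|")]


def is_cheap(stages):
    """True if every stage runs a command from CHEAP_COMMANDS."""
    return all(argv and argv[0] in CHEAP_COMMANDS for argv in stages)


def choose_split(stages, split_after):
    """Pick a split strategy for the stream leaving stage `split_after`."""
    if tuple(stages[split_after]) in SIZE_PRESERVING:
        # Output size is predictable, so a fixed batch size can be used
        # without first writing the stream to storage.
        return "fixed-batch"
    if is_cheap(stages):
        # Splitting likely costs more than it saves on a cheap pipeline.
        return "none"
    return "auto-split"
```

For a made-up pipeline such as `sed 1d | grep foo | cut -d, -f1 | wc -l`, `choose_split(parse_pipeline(...), 0)` picks the fixed batch size after the `sed`, while a pipeline containing a command outside the cheap set falls back to auto-split.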