try to enable `parallel_attempts > 1` by default
It's sad (and slow) when parallelisation is off but a generator can support it.
Proposal:
- benchmarks show
parallel_attemptshas more impact thanparallel_requests(over a whole default run) - try to set this to be used by default - not all generators support it; we should have a flag in each generators that sets whether it should be run in parallel by default
- let's repurpose
parallel_capable - it's
Trueby default - [ ] needs to be set to False in ggml
- [ ] needs to be sensibly set to False or True for each Hugging Face class
- it should be possible to override
parallel_capablesomehow. options:- ignore
parallel_(requests|attempts)if generator'sparallel_capable == False - use
parallel_capableto determine default parallel configuration defaults, which can be overridden in CLI/config options (print/log a warning when this happens) - setting
parallel_(requests|attempts)to1should (effectively? actually?) override parallel_capable; print/log a notice when this happens
- ignore
- what's a sensible default for the rest class?
- what's a sensible default for
parallel_attempts? what's polite? I like 200 with NIMs, the SREs have comments about why this isn't good; 2 is too low. powers of two are popular - how about8?
- let's repurpose
This will expose more people to #308
@jmartin-tech opinions welcome!
I like the idea of expanding parallel_capable to limiting both parallel_requests & parallel_attempts for all plugins. I think there could be an argument for wanting to allow concurrent requests while not allowing concurrent attempts, the simplicity of a single control though may outweigh the value.
it should be possible to override
parallel_capablesomehow. options
In theory, parallel_capable can already be overridden via config on a per generator class basis. The impact would be to all instance of the generator class, and could have impacts for generators within other plugins like probes and detectors.
what's a sensible default for the rest class? what's a sensible default ...
Using a single default for all types seems viable, I don't see a strong reason to limit rest differently.
8 sounds reasonable, powers of 2 are nice from a cores/threads available perspective.
We could also consider an auto-detect value based on a CPU core count check and scale to 2x the CPU cores possibly with a max of 16 or something to keep it from over saturating on a high core count CPU.
A note to add here, this will add constraints on integrators using the function generator as it puts a requirement on the package providing the function to ensure the package provides inference with no configuration of setup required on import by a sub-process.
Good point, thanks for tracking it here. This generator is weird and somewhat unlike the others, so I'm OK with hardcoding a case around it alone.
This issue has been automatically marked as stale because it has not had recent activity. If you are still interested in this issue, please respond to keep it open. Thank you!
This issue has been automatically marked as stale because it has not had recent activity. If you are still interested in this issue, please respond to keep it open. Thank you!