dolma
New Progress Bar, Backoff, Batching
This PR makes the following changes to `BaseParallelProcessor`:

- Refactors the progress bar out of `parallel.py`.
- Adds a `PoolWithDebug` wrapper around `multiprocessing.Pool` that transparently disables multiprocessing when debugging.
- Uses the `backoff` library to implement backoff and retries in case of failure.
- Adds the ability to create parallel processors that work in batch mode (the tokenizer processor will be changed later to use this new functionality).
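For reviewers unfamiliar with the pattern, here is a minimal sketch of the debug-friendly pool and batching ideas. It is not the code in this PR: `SerialPool`, `open_pool`, `make_batches`, and `square` are hypothetical names used only for illustration, and the real `PoolWithDebug` may expose a different interface.

```python
import multiprocessing
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SerialPool:
    """Hypothetical stand-in that mimics a small part of the Pool API in-process."""

    def __enter__(self) -> "SerialPool":
        return self

    def __exit__(self, *exc_info: object) -> None:
        return None

    def map(self, func: Callable[[T], R], iterable: Iterable[T]) -> List[R]:
        # Run everything in the current process so breakpoints and
        # tracebacks behave normally under a debugger.
        return [func(item) for item in iterable]


def open_pool(processes: int, debug: bool = False):
    """Return a serial pool when debugging, otherwise a real multiprocessing.Pool."""
    if debug or processes <= 1:
        return SerialPool()
    return multiprocessing.Pool(processes=processes)


def make_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def square(x: int) -> int:
    """Toy worker function; must live at module level to be picklable."""
    return x * x


if __name__ == "__main__":
    with open_pool(processes=4, debug=True) as pool:
        print(pool.map(square, [1, 2, 3]))
    print(list(make_batches(range(5), batch_size=2)))
```

Because both objects share the `map` and context-manager surface used here, calling code does not need to branch on whether debugging is enabled.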