New Progress Bar, Backoff, Batching

Open soldni opened this issue 9 months ago • 0 comments

This PR makes four changes to BaseParallelProcessor:

  • Refactors progress bar out of parallel.py
  • Adds a PoolWithDebug wrapper around multiprocessing.Pool that transparently disables multiprocessing when debugging
  • Uses the backoff library to implement backoff and retries in case of failure
  • Adds the ability to create parallel processors that work in batch mode (the tokenizer processor will be changed later to use this new functionality)
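
For readers unfamiliar with these patterns, here is a rough, standard-library-only sketch of the three ideas listed above. It is not the code from this PR: DebugAwarePool, retry_with_backoff, and batched are hypothetical names chosen for illustration, and the hand-rolled retry loop is only a stand-in for what the PR delegates to the backoff library.

```python
import itertools
import multiprocessing
import time
from typing import Any, Callable, Iterable, Iterator, List


class DebugAwarePool:
    """Hypothetical wrapper in the spirit of PoolWithDebug.

    With debug=True no worker processes are started and map() runs
    serially in the current process, so breakpoints and tracebacks
    behave normally.
    """

    def __init__(self, processes: int = 2, debug: bool = False) -> None:
        self.debug = debug
        self._pool = None if debug else multiprocessing.Pool(processes)

    def map(self, fn: Callable[[Any], Any], items: Iterable[Any]) -> List[Any]:
        if self._pool is None:
            # Debug path: plain in-process loop.
            return [fn(item) for item in items]
        return self._pool.map(fn, items)

    def close(self) -> None:
        if self._pool is not None:
            self._pool.close()
            self._pool.join()


def retry_with_backoff(
    fn: Callable[..., Any], max_tries: int = 3, base_delay: float = 0.01
) -> Callable[..., Any]:
    """Retry fn on any exception, doubling the sleep after each failure.

    Assumption: a simplified stand-in for the backoff library.
    """

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        for attempt in range(max_tries):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_tries - 1:
                    raise  # out of attempts: surface the last error
                time.sleep(base_delay * 2 ** attempt)

    return wrapper


def batched(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive lists of at most `size` items (batch mode)."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch
```

In this sketch, a batch-mode run would pass the output of batched(paths, n) to DebugAwarePool.map, so each worker call receives a list of inputs instead of a single one. Note that the closure returned by retry_with_backoff cannot be pickled, so it only works as-is on the debug (serial) path; with real worker processes the retry logic would need to live inside a module-level function.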

soldni, May 23 '24 01:05