Introducing Task Run Locking for Enhanced Concurrency Control in Gokart

Open mski-iksm opened this issue 5 months ago • 0 comments

tl;dr

  • Introduces task run locking in Gokart for better concurrency control.
  • Prevents redundant task executions in distributed setups.
  • Updates and adds documentation for efficient multi-worker execution.
  • Implements backoff strategies for handling task lock exceptions.
  • Enhances efficiency and reliability of task execution in Gokart.

Summary

This pull request makes running tasks on multiple workers in a Gokart/Luigi pipeline more efficient and reliable. It adds documentation on efficient multi-worker execution, updates the task conflict prevention mechanism, adds a lock around each task's run method, and retries with backoff when a TaskLockException is raised. Together, these changes prevent redundant task executions and make task locking more robust in distributed environments.

Changes

  • Documentation Addition: Added a new documentation file efficient_run_on_multi_workers.rst that guides users on how to improve efficiency when running similar Gokart pipelines on multiple workers. This includes strategies to skip completed tasks and suppress the execution of tasks already being run by another worker.

  • Documentation Update: Updated the index.rst to include the new documentation in the User Guide section.

  • Task Conflict Prevention Lock: Renamed using_task_cache_collision_lock.rst to using_task_task_conflict_prevention_lock.rst to better reflect the mechanism's purpose. The documentation within has also been updated to align with the new naming convention and clarify the prevention of task cache conflicts.

  • Code Enhancements:

    • Modified gokart/build.py to include backoff strategies when encountering TaskLockException, allowing for automatic retrying with exponential backoff until a maximum number of tries or wait time is reached.
    • Updated task_lock.py and task_lock_wrappers.py to support the new locking mechanism during task execution (run method), ensuring that tasks are not executed redundantly across workers.
    • Added a new module wrap_run_with_lock.py to facilitate wrapping the task's run method with a lock, preventing simultaneous execution of the same task by multiple workers.
    • Adjusted gokart/task.py to automatically apply run locking based on task configuration, enhancing task execution efficiency in distributed environments.
  • Dependency Addition: Added the backoff library to pyproject.toml and updated poetry.lock accordingly. This library is used to implement the exponential backoff strategy when handling task lock exceptions.
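
The run locking and retry behavior described above can be illustrated with a minimal, dependency-free sketch. It is not the code from this pull request: InMemoryLockStore, with_run_lock, and retry_on_lock are hypothetical names, TaskLockException is redefined locally as a stand-in, and the retry loop is written by hand rather than with the backoff library so that the example runs on the standard library alone.

```python
import threading
import time
from functools import wraps


class TaskLockException(Exception):
    """Local stand-in for the exception raised when a task lock is already held."""


class InMemoryLockStore:
    """Hypothetical in-process stand-in for a lock backend shared by workers."""

    def __init__(self):
        self._held = set()
        self._guard = threading.Lock()

    def acquire(self, key):
        with self._guard:
            if key in self._held:
                raise TaskLockException(f"task {key!r} is already running")
            self._held.add(key)

    def release(self, key):
        with self._guard:
            self._held.discard(key)


def with_run_lock(run, store, key):
    """Return a version of `run` that holds the lock for `key` while it executes."""

    @wraps(run)
    def locked_run(*args, **kwargs):
        store.acquire(key)  # raises TaskLockException if the key is taken
        try:
            return run(*args, **kwargs)
        finally:
            store.release(key)  # always release, even if run() fails

    return locked_run


def retry_on_lock(func, max_tries=5, base_delay=0.1, max_time=10.0, sleep=time.sleep):
    """Call `func`, retrying on TaskLockException with doubling delays.

    Gives up (re-raises) once `max_tries` attempts have been made or the
    next wait would push the total elapsed time past `max_time` seconds.
    """
    start = time.monotonic()
    for attempt in range(max_tries):
        try:
            return func()
        except TaskLockException:
            delay = base_delay * (2 ** attempt)
            out_of_tries = attempt == max_tries - 1
            out_of_time = time.monotonic() - start + delay > max_time
            if out_of_tries or out_of_time:
                raise
            sleep(delay)


if __name__ == "__main__":
    store = InMemoryLockStore()
    run = with_run_lock(lambda: "done", store, "task-a")
    print(retry_on_lock(run))
```

In this sketch the lock is released in a finally clause so that a failing run does not leave the task locked, and the sleep function is a parameter so the delay schedule can be checked in tests without actually waiting.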

Impact

  • Efficiency: Reduces redundant task executions in distributed environments, because a task already running on one worker is not started again by another. This lowers wasted compute.
  • Reliability: Enhances the reliability of task execution in concurrent scenarios by preventing task cache conflicts and ensuring that tasks are not executed more than necessary.
  • Usability: The addition of documentation provides clear guidance to users on how to leverage these new features, improving the overall usability of Gokart for distributed task execution.

Testing

  • Updated existing tests to reflect changes in the task locking mechanism.
  • Added new tests to cover the functionality of retrying task execution with exponential backoff upon encountering lock exceptions.

Documentation

  • Added comprehensive documentation on efficient execution strategies on multiple workers.
  • Updated existing documentation to reflect the renaming and functionality changes in task conflict prevention.

mski-iksm, Feb 26 '24 02:02