
[BUG Fix] Launching dependent `LocalPipelineExecutor`s with `skip_completed=False` leads to waiting

Open silverriver opened this issue 1 year ago • 3 comments

When launching dependent `LocalPipelineExecutor`s, setting `skip_completed=False` on the upstream executor causes the downstream executor to wait forever.

For example:

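from datatrove.executor import LocalPipelineExecutor

# Upstream executor: skip_completed=False forces every task to be re-run.
executor1 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir="logs/tokz",
    skip_completed=False,
)

# Downstream executor: waits for executor1 before starting.
executor2 = LocalPipelineExecutor(
    pipeline=[
        ...
    ],
    tasks=10,
    logging_dir="logs/tokz",
    depends=executor1,
)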

if __name__ == "__main__":
    executor2.run()

The above code snippet will lead to

datatrove.executor.local:run:102 - Dependency job still has 10/10 tasks. Waiting...

even if executor1 has finished all its jobs.
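
To illustrate how this can happen, here is a minimal toy model that does not use datatrove at all. It assumes (this is my guess at the mechanism, not confirmed from the library source) that the dependency check reuses the same rank filter that decides which tasks to run, so `skip_completed=False` makes every rank look unfinished. `ToyExecutor`, `incomplete_ranks`, and `unfinished_ranks` are hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExecutor:
    """Hypothetical stand-in for an executor; not datatrove's real class."""

    tasks: int
    skip_completed: bool = True
    completed: set[int] = field(default_factory=set)

    def run(self) -> None:
        # Pretend every task finishes and records a completion marker.
        self.completed = set(range(self.tasks))

    def incomplete_ranks(self) -> list[int]:
        # ASSUMPTION: this filter doubles as the "which ranks should I run?" list,
        # so skip_completed=False reports every rank, finished or not.
        return [
            rank
            for rank in range(self.tasks)
            if not self.skip_completed or rank not in self.completed
        ]

    def unfinished_ranks(self) -> list[int]:
        # A dependency check that ignores skip_completed and looks only at markers.
        return [rank for rank in range(self.tasks) if rank not in self.completed]


dep = ToyExecutor(tasks=10, skip_completed=False)
dep.run()

stuck = len(dep.incomplete_ranks())
remaining = len(dep.unfinished_ranks())

print(f"Dependency job still has {stuck}/{dep.tasks} tasks. Waiting...")
print(f"Tasks without a completion marker: {remaining}")
```

In this model the first count never drops below the total no matter how long the downstream executor waits, while a check based only on completion markers reports nothing left to do. If the real cause is along these lines, separating "should this rank be re-run" from "has this rank finished" would be one way to avoid the endless wait.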

silverriver, Oct 30 '24 02:10